Journal utilization is too high - process buffer 100%

Our system is a single node handling about 1500 msg/s. It has been stable for the 35 days since we created it. However, 4 days ago the process buffer filled up, as did the disk journal. When this happens, the output rate drops to around 20-100 msg/s and stays that way until the server is rebooted. Once rebooted, the system outputs around 4000-15000 msg/s until the journal is empty, then holds steady until the system breaks again.
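For a rough sense of scale, the rates above can be turned into fill and drain times. This is only a sketch: the 5 GB journal size is assumed to match the `message_journal_max_size` value shipped in the stock server.conf, and the 1 KB average message size is a made-up placeholder, so substitute your own numbers.

```python
# Back-of-the-envelope journal arithmetic. The message rates are the ones
# quoted in the post; JOURNAL_BYTES and AVG_MSG_BYTES are assumptions.

JOURNAL_BYTES = 5 * 1024**3   # assumed message_journal_max_size of 5 GB
AVG_MSG_BYTES = 1024          # hypothetical average size of a journaled message


def seconds_until_full(in_rate, out_rate, journal_bytes=JOURNAL_BYTES,
                       avg_msg_bytes=AVG_MSG_BYTES):
    """Seconds until an empty journal fills, or None if output keeps up."""
    net = in_rate - out_rate
    if net <= 0:
        return None
    return (journal_bytes / avg_msg_bytes) / net


def seconds_to_drain(in_rate, out_rate, journal_bytes=JOURNAL_BYTES,
                     avg_msg_bytes=AVG_MSG_BYTES):
    """Seconds to empty a full journal, or None if output does not exceed input."""
    net = out_rate - in_rate
    if net <= 0:
        return None
    return (journal_bytes / avg_msg_bytes) / net


if __name__ == "__main__":
    for out_rate in (20, 100):          # degraded output rates from the post
        t = seconds_until_full(1500, out_rate)
        print(f"out={out_rate} msg/s: journal full in about {t / 60:.0f} min")
    for out_rate in (4000, 15000):      # catch-up rates seen after a reboot
        t = seconds_to_drain(1500, out_rate)
        print(f"out={out_rate} msg/s: journal drained in about {t / 60:.0f} min")
```

Under those assumptions the journal goes from empty to full in roughly an hour once output degrades, which would be consistent with the journal alerts arriving a few hours after the GC alert.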

While the system was stable for roughly the first 35 days, we now see the process buffer fill up almost every 24 hours. There have been no apparent changes in logging in our environment, nor changes to the server itself.

If we shut off all inputs, let the buffer catch up until it is empty, and then start the inputs again, the output rate stays low (20-100 msg/s), the input buffer fills up almost immediately, and then the disk journal fills up.

We’ve tried restarting the Graylog and Elasticsearch services to see if this would increase the output rate. However, so far nothing except a full reboot gets it back up to speed.

We have never been able to catch this as it happens, so some of our information about when it happens is based on the alerts in Graylog. The last time it failed we got the following alerts:

  1. Nodes with too long GC pauses (triggered 18 hours ago)
    There are Graylog nodes on which the garbage collector runs too long. Garbage collection runs should be as short as possible. Please check whether those nodes are healthy. (Node: 602a0297-afdf-49ce-83aa-7b5b141aee1d , GC duration: 1379 ms , GC threshold: 1000 ms )

  2. Journal utilization is too high (triggered 15 hours ago)
    Journal utilization is too high and may go over the limit soon. Please verify that your Elasticsearch cluster is healthy and fast enough. You may also want to review your Graylog journal settings and set a higher limit. (Node: 602a0297-afdf-49ce-83aa-7b5b141aee1d )

  3. Uncommited messages deleted from journal (triggered 15 hours ago)
    Some messages were deleted from the Graylog journal before they could be written to Elasticsearch. Please verify that your Elasticsearch cluster is healthy and fast enough. You may also want to review your Graylog journal settings and set a higher limit. (Node: 602a0297-afdf-49ce-83aa-7b5b141aee1d )

Our best guess is that some service, maybe on Elasticsearch, runs at a certain time and causes the system to not perform as it should. Again, this is just a guess, and even then we haven’t seen a consistent time at which this happens.

Ubuntu 20.04.3 LTS
AWS plugins 4.2.5
Collector 4.2.5
Elasticsearch 6 Support 4.2.5+59802bf
Elasticsearch 7 Support 4.2.5+59802bf
Enterprise Integrations 4.2.5
Graylog Enterprise 4.2.5
Graylog Enterprise (ES6 Support)
Graylog Enterprise (ES7 Support)
Integrations 4.2.5
Threat Intelligence Plugin 4.2.5

Hello @Chase

I see there have been some issues in the past with the journal.

I'll try to explain those messages above.

You may find the answer for that log message here.

It seems like you are having issues with Elasticsearch. I would check the status/health of your Elasticsearch.

curl -XGET http://localhost:9200/_cluster/health?pretty=true
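If you would rather script that check (for example to run it from cron while the problem is happening), a minimal sketch could look like the following. The function names are made up for illustration, and the base URL assumes Elasticsearch is listening on localhost:9200 without authentication.

```python
import json
from urllib.request import urlopen


def fetch_health(base_url="http://localhost:9200"):
    """GET /_cluster/health from Elasticsearch and return the parsed JSON."""
    with urlopen(f"{base_url}/_cluster/health", timeout=10) as resp:
        return json.load(resp)


def health_problems(health):
    """Return a list of findings; an empty list means nothing obvious is wrong."""
    problems = []
    if health.get("status") != "green":
        problems.append(f"status is {health.get('status')}")
    if health.get("timed_out"):
        problems.append("health request timed out")
    for key in ("unassigned_shards", "initializing_shards",
                "relocating_shards", "number_of_pending_tasks"):
        if health.get(key, 0) > 0:
            problems.append(f"{key} = {health[key]}")
    return problems


# Example use against a live node:
#   print(health_problems(fetch_health()) or "nothing obvious")
```

Keep in mind that a green status only says the shards are allocated. It does not tell you whether indexing is fast enough, so an empty result here does not rule Elasticsearch out.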

Knowing what your Graylog and Elasticsearch configurations look like, I might be able to help further.

When your journal gets too full this will happen, which suggests something is wrong with Elasticsearch. Since Graylog reads the messages from the journal and sends them to Elasticsearch for indexing, that would be the first place I would look, especially the logs. You may have been having a problem all this time, but it takes a few days to notice. There should be no need to reboot; all you're doing is restarting the services and perhaps clearing out the journal.
To be honest, I would go over all your logs in /var/log to find anything that could pertain to this issue. If you are running a load balancer (e.g. nginx/apache), I would also check those logs.
What versions are you running?

  • Elasticsearch
  • Graylog
  • MongoDB

This could also be a direct result of resources and how they are distributed.
It's possible the Graylog disk is getting full and Elasticsearch stops indexing the messages from the journal, so it keeps filling up until you reboot.

root # df -h
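As a rough companion to `df -h`, the sketch below compares disk usage against Elasticsearch's default disk watermarks (85% low, 90% high, 95% flood stage). Once the flood stage is reached, Elasticsearch puts a read-only block on the affected indices and indexing fails. The two paths in the comment are the usual package-install locations and are assumptions here; Elasticsearch also measures free space slightly differently than `df`, so treat the result as an approximation.

```python
import os
import shutil

# Elasticsearch default disk watermarks, highest first, in percent used.
WATERMARKS = ((95.0, "flood_stage"), (90.0, "high"), (85.0, "low"))


def percent_used(total, used):
    """Disk usage as a percentage of total capacity."""
    return 100.0 * used / total


def watermark_hit(pct):
    """Name of the highest default watermark reached, or None."""
    for threshold, name in WATERMARKS:
        if pct >= threshold:
            return name
    return None


def report(paths):
    """Return (path, percent used, watermark name or None) for each existing path."""
    results = []
    for path in paths:
        if not os.path.exists(path):
            continue
        usage = shutil.disk_usage(path)
        pct = percent_used(usage.total, usage.used)
        results.append((path, round(pct, 1), watermark_hit(pct)))
    return results


# Example (assumed default data locations):
#   report(["/var/lib/elasticsearch", "/var/lib/graylog-server/journal"])
```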

If you can, try restarting the Graylog service and tail its log file:

systemctl restart graylog-server

and

tail -f /var/log/graylog-server/server.log

Watch how Graylog starts up and check for issues. Just a thought; you may find something.
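Since the slowdown seems to start when nobody is watching, it may help to summarize the log after the fact instead of tailing it live. Here is a small sketch that counts WARN and ERROR lines per logger. It assumes each line looks like `<timestamp> LEVEL [Logger] message`; adjust the pattern if your log4j2 layout differs.

```python
import re
from collections import Counter

# Assumed line shape: "<timestamp> LEVEL [Logger] message".
LEVEL_RE = re.compile(r"^\S+\s+(WARN|ERROR)\s+\[([^\]]+)\]")


def count_problems(lines):
    """Count WARN/ERROR lines per (level, logger) pair."""
    counts = Counter()
    for line in lines:
        match = LEVEL_RE.match(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Example use (the path is whatever your server.log location is):
#   with open(path_to_server_log, errors="replace") as fh:
#       for (level, logger), n in count_problems(fh).most_common(10):
#           print(n, level, logger)
```

A logger that suddenly dominates the counts around the time of the alerts is a good place to start reading.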

The issues in the past were related to the output process being overloaded. This time it appears to be the process buffer. We ended up wiping the old system and reinstalling. As mentioned, this one had been working for 30+ days and then began having this issue.

Things that were changed around that time on Graylog system are:

Added 3 new inputs/streams/indices (not high usage)

Added UFW policies to NAT general syslog traffic (currently disabled)

Allowed wildcard searches in graylog system file

Changed search query time from 60s to 120s. (set back to default)

One thing I really don’t understand is that restarting Elasticsearch or Graylog, or both, does not fix the output rate, even if the process buffer is caught up. To resolve it I have to reboot the system.


(EVEN AFTER REBOOT)

curl -XGET http://127.0.0.1:9200/_cluster/health?pretty=true

{
  "cluster_name" : "graylog",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 421,
  "active_shards" : 421,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

graylog-admin@graylog:~$ curl -XGET http://127.0.0.1:9200/_cat/shards


server_index_53 1 p STARTED 18854739 8.5gb 127.0.0.1 graylog

server_index_53 4 p STARTED 18861036 8.5gb 127.0.0.1 graylog

server_index_53 3 p STARTED 18859414 8.5gb 127.0.0.1 graylog

server_index_53 5 p STARTED 18864089 8.5gb 127.0.0.1 graylog

server_index_53 2 p STARTED 18863774 8.5gb 127.0.0.1 graylog

server_index_53 0 p STARTED 18851798 8.5gb 127.0.0.1 graylog

desktops_index_52 4 p STARTED 2800638 1.8gb 127.0.0.1 graylog

desktops_index_52 3 p STARTED 2800671 1.8gb 127.0.0.1 graylog

desktops_index_52 2 p STARTED 2800856 1.8gb 127.0.0.1 graylog

desktops_index_52 1 p STARTED 2802087 1.8gb 127.0.0.1 graylog

desktops_index_52 0 p STARTED 2799963 1.8gb 127.0.0.1 graylog

printer-index_0 1 p STARTED 17198 2.8mb 127.0.0.1 graylog

printer-index_0 0 p STARTED 17272 2.8mb 127.0.0.1 graylog

server_index_40 1 p STARTED 18896734 7.7gb 127.0.0.1 graylog

server_index_40 4 p STARTED 18895320 7.7gb 127.0.0.1 graylog

server_index_40 3 p STARTED 18893776 7.7gb 127.0.0.1 graylog

server_index_40 2 p STARTED 18889795 7.7gb 127.0.0.1 graylog

server_index_40 5 p STARTED 18890643 7.7gb 127.0.0.1 graylog

server_index_40 0 p STARTED 18890726 7.7gb 127.0.0.1 graylog

gl-system-events_0 3 p STARTED 0 208b 127.0.0.1 graylog

gl-system-events_0 2 p STARTED 0 208b 127.0.0.1 graylog

gl-system-events_0 1 p STARTED 0 208b 127.0.0.1 graylog

gl-system-events_0 0 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_60 1 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_60 0 p STARTED 0 208b 127.0.0.1 graylog

server_index_35 1 p STARTED 19655025 8gb 127.0.0.1 graylog

server_index_35 4 p STARTED 19647694 8gb 127.0.0.1 graylog

server_index_35 3 p STARTED 19647647 8gb 127.0.0.1 graylog

server_index_35 2 p STARTED 19655138 8gb 127.0.0.1 graylog

server_index_35 5 p STARTED 19653215 8gb 127.0.0.1 graylog

server_index_35 0 p STARTED 19659304 8gb 127.0.0.1 graylog

gl-failures_56 1 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_56 0 p STARTED 0 208b 127.0.0.1 graylog

server_index_45 2 p STARTED 20269275 8.9gb 127.0.0.1 graylog

server_index_45 4 p STARTED 20279580 8.9gb 127.0.0.1 graylog

server_index_45 5 p STARTED 20272558 8.9gb 127.0.0.1 graylog

server_index_45 3 p STARTED 20269030 8.9gb 127.0.0.1 graylog

server_index_45 1 p STARTED 20280630 8.9gb 127.0.0.1 graylog

server_index_45 0 p STARTED 20271005 8.9gb 127.0.0.1 graylog

firewall_index_21 3 p STARTED 20148067 14.6gb 127.0.0.1 graylog

firewall_index_21 1 p STARTED 20141511 14.6gb 127.0.0.1 graylog

firewall_index_21 2 p STARTED 20147350 14.7gb 127.0.0.1 graylog

firewall_index_21 0 p STARTED 20177749 14.9gb 127.0.0.1 graylog

server_index_48 5 p STARTED 19227719 8.2gb 127.0.0.1 graylog

server_index_48 4 p STARTED 19235280 8.2gb 127.0.0.1 graylog

server_index_48 3 p STARTED 19229499 8.2gb 127.0.0.1 graylog

server_index_48 2 p STARTED 19231227 8.2gb 127.0.0.1 graylog

server_index_48 1 p STARTED 19231970 8.2gb 127.0.0.1 graylog

server_index_48 0 p STARTED 19230234 8.2gb 127.0.0.1 graylog

desktops_index_59 4 p STARTED 3398177 2.1gb 127.0.0.1 graylog

desktops_index_59 3 p STARTED 3396466 2.1gb 127.0.0.1 graylog

desktops_index_59 2 p STARTED 3398971 2.1gb 127.0.0.1 graylog

desktops_index_59 1 p STARTED 3399637 2.1gb 127.0.0.1 graylog

desktops_index_59 0 p STARTED 3400865 2.1gb 127.0.0.1 graylog

server_index_54 2 p STARTED 18873529 8gb 127.0.0.1 graylog

server_index_54 4 p STARTED 18861196 8gb 127.0.0.1 graylog

server_index_54 3 p STARTED 18866234 8gb 127.0.0.1 graylog

server_index_54 5 p STARTED 18864243 8gb 127.0.0.1 graylog

server_index_54 1 p STARTED 18862493 8gb 127.0.0.1 graylog

server_index_54 0 p STARTED 18869502 8gb 127.0.0.1 graylog

webproxy_index_8 1 p STARTED 1633623 879.3mb 127.0.0.1 graylog

webproxy_index_8 0 p STARTED 1630585 877.3mb 127.0.0.1 graylog

firewall_index_14 3 p STARTED 17642774 13.5gb 127.0.0.1 graylog

firewall_index_14 1 p STARTED 17642553 13.5gb 127.0.0.1 graylog

firewall_index_14 2 p STARTED 17647181 13.5gb 127.0.0.1 graylog

firewall_index_14 0 p STARTED 17646903 13.5gb 127.0.0.1 graylog

gl-events_0 3 p STARTED 0 208b 127.0.0.1 graylog

gl-events_0 1 p STARTED 0 208b 127.0.0.1 graylog

gl-events_0 2 p STARTED 0 208b 127.0.0.1 graylog

gl-events_0 0 p STARTED 0 208b 127.0.0.1 graylog

server_index_61 5 p STARTED 12863818 6.2gb 127.0.0.1 graylog

server_index_61 4 p STARTED 12868628 6.2gb 127.0.0.1 graylog

server_index_61 3 p STARTED 12865978 6.2gb 127.0.0.1 graylog

server_index_61 1 p STARTED 12868388 6.2gb 127.0.0.1 graylog

server_index_61 2 p STARTED 12859931 6.2gb 127.0.0.1 graylog

server_index_61 0 p STARTED 12866053 6.2gb 127.0.0.1 graylog

application_index_9 1 p STARTED 13703 3.5mb 127.0.0.1 graylog

application_index_9 0 p STARTED 13511 3.5mb 127.0.0.1 graylog

gl-failures_61 1 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_61 0 p STARTED 0 208b 127.0.0.1 graylog

server_index_56 5 p STARTED 18380370 8.3gb 127.0.0.1 graylog

server_index_56 4 p STARTED 18378547 8.3gb 127.0.0.1 graylog

server_index_56 3 p STARTED 18386266 8.3gb 127.0.0.1 graylog

server_index_56 2 p STARTED 18377275 8.3gb 127.0.0.1 graylog

server_index_56 1 p STARTED 18381791 8.3gb 127.0.0.1 graylog

server_index_56 0 p STARTED 18384882 8.3gb 127.0.0.1 graylog

server_index_62 2 p STARTED 19145224 8.2gb 127.0.0.1 graylog

server_index_62 4 p STARTED 19146856 8.2gb 127.0.0.1 graylog

server_index_62 3 p STARTED 19147674 8.2gb 127.0.0.1 graylog

server_index_62 5 p STARTED 19150775 8.2gb 127.0.0.1 graylog

server_index_62 1 p STARTED 19151848 8.2gb 127.0.0.1 graylog

server_index_62 0 p STARTED 19147598 8.2gb 127.0.0.1 graylog

gl-system-events_1 3 p STARTED 0 208b 127.0.0.1 graylog

gl-system-events_1 1 p STARTED 0 208b 127.0.0.1 graylog

gl-system-events_1 2 p STARTED 0 208b 127.0.0.1 graylog

gl-system-events_1 0 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_52 1 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_52 0 p STARTED 0 208b 127.0.0.1 graylog

index_15 3 p STARTED 25176846 19.8gb 127.0.0.1 graylog

index_15 2 p STARTED 25162591 19.8gb 127.0.0.1 graylog

index_15 1 p STARTED 25177777 19.8gb 127.0.0.1 graylog

index_15 0 p STARTED 25169511 19.8gb 127.0.0.1 graylog

general_syslog_events_index_1 3 p STARTED 455546 84.3mb 127.0.0.1 graylog

general_syslog_events_index_1 1 p STARTED 454948 84mb 127.0.0.1 graylog

general_syslog_events_index_1 2 p STARTED 455372 78.1mb 127.0.0.1 graylog

general_syslog_events_index_1 0 p STARTED 455008 84mb 127.0.0.1 graylog

server_index_42 3 p STARTED 10549809 5.1gb 127.0.0.1 graylog

server_index_42 4 p STARTED 10545536 5.1gb 127.0.0.1 graylog

server_index_42 5 p STARTED 10550011 5.1gb 127.0.0.1 graylog

server_index_42 2 p STARTED 10547427 5.1gb 127.0.0.1 graylog

server_index_42 1 p STARTED 10548280 5.1gb 127.0.0.1 graylog

server_index_42 0 p STARTED 10547487 5.1gb 127.0.0.1 graylog

webproxy_index_9 1 p STARTED 23410 20.7mb 127.0.0.1 graylog

webproxy_index_9 0 p STARTED 23430 21.2mb 127.0.0.1 graylog

gl-failures_54 1 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_54 0 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_62 1 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_62 0 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_48 1 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_48 0 p STARTED 0 208b 127.0.0.1 graylog

webproxy_index_7 1 p STARTED 1246243 693mb 127.0.0.1 graylog

webproxy_index_7 0 p STARTED 1244087 690.8mb 127.0.0.1 graylog

gl-failures_44 1 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_44 0 p STARTED 0 208b 127.0.0.1 graylog

server_index_41 2 p STARTED 11125742 5.4gb 127.0.0.1 graylog

server_index_41 4 p STARTED 11136775 5.4gb 127.0.0.1 graylog

server_index_41 5 p STARTED 11137893 5.4gb 127.0.0.1 graylog

server_index_41 3 p STARTED 11132023 5.4gb 127.0.0.1 graylog

server_index_41 1 p STARTED 11132801 5.4gb 127.0.0.1 graylog

server_index_41 0 p STARTED 11129062 5.4gb 127.0.0.1 graylog

firewall_index_11 3 p STARTED 14232936 11gb 127.0.0.1 graylog

firewall_index_11 2 p STARTED 14238642 11gb 127.0.0.1 graylog

firewall_index_11 1 p STARTED 14236912 11gb 127.0.0.1 graylog

firewall_index_11 0 p STARTED 14241914 11gb 127.0.0.1 graylog

fortiweb_index_6 1 p STARTED 1094307 603.7mb 127.0.0.1 graylog

webproxy_index_6 0 p STARTED 1095158 602.6mb 127.0.0.1 graylog

wireless_index_2 0 p STARTED 1364161 259.7mb 127.0.0.1 graylog

desktops_index_62 4 p STARTED 0 208b 127.0.0.1 graylog

desktops_index_62 3 p STARTED 0 208b 127.0.0.1 graylog

desktops_index_62 1 p STARTED 0 208b 127.0.0.1 graylog

desktops_index_62 2 p STARTED 0 208b 127.0.0.1 graylog

desktops_index_62 0 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_47 1 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_47 0 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_58 1 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_58 0 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_51 1 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_51 0 p STARTED 0 208b 127.0.0.1 graylog

printer-index_2 1 p STARTED 1087 366.9kb 127.0.0.1 graylog

printer-index_2 0 p STARTED 1116 462.7kb 127.0.0.1 graylog

desktops_index_63 4 p STARTED 0 208b 127.0.0.1 graylog

desktops_index_63 3 p STARTED 0 208b 127.0.0.1 graylog

desktops_index_63 1 p STARTED 0 208b 127.0.0.1 graylog

desktops_index_63 2 p STARTED 0 208b 127.0.0.1 graylog

desktops_index_63 0 p STARTED 0 208b 127.0.0.1 graylog

webproxy_index_5 1 p STARTED 875701 488.9mb 127.0.0.1 graylog

webproxy_index_5 0 p STARTED 873425 487.8mb 127.0.0.1 graylog

gl-failures_40 1 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_40 0 p STARTED 0 208b 127.0.0.1 graylog

firewall_index_16 3 p STARTED 26446517 19.8gb 127.0.0.1 graylog

firewall_index_16 2 p STARTED 26436035 19.8gb 127.0.0.1 graylog

firewall_index_16 1 p STARTED 26441637 19.8gb 127.0.0.1 graylog

firewall_index_16 0 p STARTED 26450160 19.8gb 127.0.0.1 graylog

server_index_58 2 p STARTED 13756260 6.4gb 127.0.0.1 graylog

server_index_58 4 p STARTED 13756101 6.4gb 127.0.0.1 graylog

server_index_58 5 p STARTED 13754999 6.4gb 127.0.0.1 graylog

server_index_58 1 p STARTED 13761062 6.4gb 127.0.0.1 graylog

server_index_58 3 p STARTED 13756053 6.4gb 127.0.0.1 graylog

server_index_58 0 p STARTED 13751953 6.4gb 127.0.0.1 graylog

desktops_index_60 4 p STARTED 709460 479.7mb 127.0.0.1 graylog

desktops_index_60 3 p STARTED 712229 481.2mb 127.0.0.1 graylog

desktops_index_60 1 p STARTED 709469 480.1mb 127.0.0.1 graylog

desktops_index_60 2 p STARTED 708762 479.3mb 127.0.0.1 graylog

desktops_index_60 0 p STARTED 710104 480.1mb 127.0.0.1 graylog

gl-failures_35 1 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_35 0 p STARTED 0 208b 127.0.0.1 graylog

server_index_39 5 p STARTED 19708828 8.5gb 127.0.0.1 graylog

server_index_39 4 p STARTED 19708181 8.5gb 127.0.0.1 graylog

server_index_39 3 p STARTED 19703749 8.5gb 127.0.0.1 graylog

server_index_39 2 p STARTED 19704454 8.5gb 127.0.0.1 graylog

server_index_39 1 p STARTED 19704350 8.5gb 127.0.0.1 graylog

server_index_39 0 p STARTED 19703354 8.5gb 127.0.0.1 graylog

server_index_36 2 p STARTED 21321430 9gb 127.0.0.1 graylog

server_index_36 4 p STARTED 21321165 9gb 127.0.0.1 graylog

server_index_36 3 p STARTED 21322237 9gb 127.0.0.1 graylog

server_index_36 1 p STARTED 21327735 9gb 127.0.0.1 graylog

server_index_36 5 p STARTED 21323422 9gb 127.0.0.1 graylog

server_index_36 0 p STARTED 21315183 9gb 127.0.0.1 graylog

firewall_index_12 3 p STARTED 17308079 13.7gb 127.0.0.1 graylog

firewall_index_12 1 p STARTED 17305899 13.7gb 127.0.0.1 graylog

firewall_index_12 2 p STARTED 17311078 13.7gb 127.0.0.1 graylog

firewall_index_12 0 p STARTED 17307329 13.7gb 127.0.0.1 graylog

gl-failures_53 1 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_53 0 p STARTED 0 208b 127.0.0.1 graylog

printer-index_1 1 p STARTED 22668 3.7mb 127.0.0.1 graylog

printer-index_1 0 p STARTED 22456 3.7mb 127.0.0.1 graylog

server_index_43 1 p STARTED 11556417 5.8gb 127.0.0.1 graylog

server_index_43 4 p STARTED 11557797 5.8gb 127.0.0.1 graylog

server_index_43 5 p STARTED 11554238 5.8gb 127.0.0.1 graylog

server_index_43 2 p STARTED 11561495 5.8gb 127.0.0.1 graylog

server_index_43 3 p STARTED 11554424 5.8gb 127.0.0.1 graylog

server_index_43 0 p STARTED 11553047 5.8gb 127.0.0.1 graylog

graylog_0 3 p STARTED 289265 178.8mb 127.0.0.1 graylog

graylog_0 2 p STARTED 289700 180mb 127.0.0.1 graylog

graylog_0 1 p STARTED 289527 179.2mb 127.0.0.1 graylog

graylog_0 0 p STARTED 290464 180mb 127.0.0.1 graylog

server_index_52 2 p STARTED 18471548 8.8gb 127.0.0.1 graylog

server_index_52 4 p STARTED 18476092 8.8gb 127.0.0.1 graylog

server_index_52 5 p STARTED 18483264 8.8gb 127.0.0.1 graylog

server_index_52 1 p STARTED 18478846 8.8gb 127.0.0.1 graylog

server_index_52 3 p STARTED 18479146 8.8gb 127.0.0.1 graylog

server_index_52 0 p STARTED 18470197 8.8gb 127.0.0.1 graylog

gl-events_2 3 p STARTED 0 208b 127.0.0.1 graylog

gl-events_2 2 p STARTED 0 208b 127.0.0.1 graylog

gl-events_2 1 p STARTED 0 208b 127.0.0.1 graylog

gl-events_2 0 p STARTED 0 208b 127.0.0.1 graylog

application_index_11 1 p STARTED 689 258.6kb 127.0.0.1 graylog

application_index_11 0 p STARTED 628 270.7kb 127.0.0.1 graylog

firewall_index_10 3 p STARTED 16754324 13.3gb 127.0.0.1 graylog

firewall_index_10 1 p STARTED 16760094 13.3gb 127.0.0.1 graylog

firewall_index_10 2 p STARTED 16752797 13.3gb 127.0.0.1 graylog

firewall_index_10 0 p STARTED 16752775 13.3gb 127.0.0.1 graylog

gl-failures_37 1 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_37 0 p STARTED 0 208b 127.0.0.1 graylog

desktops_index_54 4 p STARTED 3803102 2.5gb 127.0.0.1 graylog

desktops_index_54 3 p STARTED 3804900 2.5gb 127.0.0.1 graylog

desktops_index_54 1 p STARTED 3804676 2.5gb 127.0.0.1 graylog

desktops_index_54 2 p STARTED 3805024 2.5gb 127.0.0.1 graylog

desktops_index_54 0 p STARTED 3808254 2.5gb 127.0.0.1 graylog

gl-failures_34 1 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_34 0 p STARTED 0 208b 127.0.0.1 graylog

server_index_51 5 p STARTED 19673584 8.7gb 127.0.0.1 graylog

server_index_51 4 p STARTED 19669243 8.7gb 127.0.0.1 graylog

server_index_51 3 p STARTED 19673816 8.7gb 127.0.0.1 graylog

server_index_51 2 p STARTED 19671886 8.7gb 127.0.0.1 graylog

server_index_51 1 p STARTED 19675005 8.7gb 127.0.0.1 graylog

server_index_51 0 p STARTED 19674389 8.7gb 127.0.0.1 graylog

server_index_59 1 p STARTED 19543803 8.7gb 127.0.0.1 graylog

server_index_59 4 p STARTED 19538607 8.7gb 127.0.0.1 graylog

server_index_59 3 p STARTED 19544951 8.7gb 127.0.0.1 graylog

server_index_59 2 p STARTED 19548041 8.7gb 127.0.0.1 graylog

server_index_59 5 p STARTED 19533123 8.7gb 127.0.0.1 graylog

server_index_59 0 p STARTED 19540908 8.7gb 127.0.0.1 graylog

gl-failures_42 1 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_42 0 p STARTED 0 208b 127.0.0.1 graylog

server_index_46 3 p STARTED 20086049 8.9gb 127.0.0.1 graylog

server_index_46 4 p STARTED 20090774 8.9gb 127.0.0.1 graylog

server_index_46 5 p STARTED 20090128 8.9gb 127.0.0.1 graylog

server_index_46 1 p STARTED 20089418 8.9gb 127.0.0.1 graylog

server_index_46 2 p STARTED 20091282 8.9gb 127.0.0.1 graylog

server_index_46 0 p STARTED 20089176 8.9gb 127.0.0.1 graylog

gl-failures_50 1 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_50 0 p STARTED 0 208b 127.0.0.1 graylog

desktops_index_53 4 p STARTED 3048541 1.8gb 127.0.0.1 graylog

desktops_index_53 3 p STARTED 3050157 1.8gb 127.0.0.1 graylog

desktops_index_53 2 p STARTED 3049317 1.8gb 127.0.0.1 graylog

desktops_index_53 1 p STARTED 3052186 1.8gb 127.0.0.1 graylog

desktops_index_53 0 p STARTED 3049334 1.8gb 127.0.0.1 graylog

desktops_index_50 4 p STARTED 3043513 1.9gb 127.0.0.1 graylog

desktops_index_50 3 p STARTED 3045811 1.9gb 127.0.0.1 graylog

desktops_index_50 1 p STARTED 3046979 1.9gb 127.0.0.1 graylog

desktops_index_50 2 p STARTED 3046019 1.9gb 127.0.0.1 graylog

desktops_index_50 0 p STARTED 3046908 1.9gb 127.0.0.1 graylog

firewall_index_17 3 p STARTED 26505317 20.2gb 127.0.0.1 graylog

firewall_index_17 1 p STARTED 26500209 20.2gb 127.0.0.1 graylog

firewall_index_17 2 p STARTED 26511765 20.2gb 127.0.0.1 graylog

firewall_index_17 0 p STARTED 26500113 20.2gb 127.0.0.1 graylog

firewall_index_18 3 p STARTED 19877194 15.2gb 127.0.0.1 graylog

firewall_index_18 2 p STARTED 19880936 15.2gb 127.0.0.1 graylog

firewall_index_18 1 p STARTED 19880899 15.2gb 127.0.0.1 graylog

firewall_index_18 0 p STARTED 19889843 15.2gb 127.0.0.1 graylog

switches_index_2 1 p STARTED 103929 19.9mb 127.0.0.1 graylog

switches_index_2 0 p STARTED 103329 19.8mb 127.0.0.1 graylog

server_index_49 5 p STARTED 19468858 8.6gb 127.0.0.1 graylog

server_index_49 4 p STARTED 19471756 8.6gb 127.0.0.1 graylog

server_index_49 3 p STARTED 19474459 8.6gb 127.0.0.1 graylog

server_index_49 2 p STARTED 19468763 8.6gb 127.0.0.1 graylog

server_index_49 1 p STARTED 19472435 8.6gb 127.0.0.1 graylog

server_index_49 0 p STARTED 19463461 8.6gb 127.0.0.1 graylog

gl-failures_46 1 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_46 0 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_57 1 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_57 0 p STARTED 0 208b 127.0.0.1 graylog

desktops_index_57 4 p STARTED 3351964 2.1gb 127.0.0.1 graylog

desktops_index_57 3 p STARTED 3354207 2.1gb 127.0.0.1 graylog

desktops_index_57 1 p STARTED 3352583 2.1gb 127.0.0.1 graylog

desktops_index_57 2 p STARTED 3354291 2.1gb 127.0.0.1 graylog

desktops_index_57 0 p STARTED 3353639 2.1gb 127.0.0.1 graylog

server_index_47 3 p STARTED 20118921 8.5gb 127.0.0.1 graylog

server_index_47 4 p STARTED 20126078 8.5gb 127.0.0.1 graylog

server_index_47 5 p STARTED 20115298 8.5gb 127.0.0.1 graylog

server_index_47 2 p STARTED 20126097 8.5gb 127.0.0.1 graylog

server_index_47 1 p STARTED 20120746 8.5gb 127.0.0.1 graylog

server_index_47 0 p STARTED 20122660 8.5gb 127.0.0.1 graylog

gl-failures_36 1 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_36 0 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_41 1 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_41 0 p STARTED 0 208b 127.0.0.1 graylog

firewall_index_19 3 p STARTED 29081239 21.8gb 127.0.0.1 graylog

firewall_index_19 2 p STARTED 29076101 21.8gb 127.0.0.1 graylog

firewall_index_19 1 p STARTED 29081681 21.8gb 127.0.0.1 graylog

firewall_index_19 0 p STARTED 29088476 21.8gb 127.0.0.1 graylog

gl-failures_45 1 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_45 0 p STARTED 0 208b 127.0.0.1 graylog

desktops_index_55 4 p STARTED 2746074 1.7gb 127.0.0.1 graylog

desktops_index_55 3 p STARTED 2742473 1.7gb 127.0.0.1 graylog

desktops_index_55 2 p STARTED 2745658 1.7gb 127.0.0.1 graylog

desktops_index_55 1 p STARTED 2746723 1.7gb 127.0.0.1 graylog

desktops_index_55 0 p STARTED 2745554 1.7gb 127.0.0.1 graylog

server_index_57 3 p STARTED 14831753 7.3gb 127.0.0.1 graylog

server_index_57 4 p STARTED 14840029 7.4gb 127.0.0.1 graylog

server_index_57 5 p STARTED 14832295 7.4gb 127.0.0.1 graylog

server_index_57 1 p STARTED 14826666 7.3gb 127.0.0.1 graylog

server_index_57 2 p STARTED 14836551 7.4gb 127.0.0.1 graylog

server_index_57 0 p STARTED 14833131 7.4gb 127.0.0.1 graylog

desktops_index_56 4 p STARTED 2379000 1.5gb 127.0.0.1 graylog

desktops_index_56 3 p STARTED 2382255 1.5gb 127.0.0.1 graylog

desktops_index_56 1 p STARTED 2383663 1.5gb 127.0.0.1 graylog

desktops_index_56 2 p STARTED 2381287 1.5gb 127.0.0.1 graylog

desktops_index_56 0 p STARTED 2382192 1.5gb 127.0.0.1 graylog

server_index_44 3 p STARTED 11972807 6.1gb 127.0.0.1 graylog

server_index_44 4 p STARTED 11972560 6.1gb 127.0.0.1 graylog

server_index_44 5 p STARTED 11977316 6.1gb 127.0.0.1 graylog

server_index_44 2 p STARTED 11970387 6.1gb 127.0.0.1 graylog

server_index_44 1 p STARTED 11981659 6.1gb 127.0.0.1 graylog

server_index_44 0 p STARTED 11976977 6.1gb 127.0.0.1 graylog

firewall_index_13 3 p STARTED 20143329 15.9gb 127.0.0.1 graylog

firewall_index_13 1 p STARTED 20147140 15.9gb 127.0.0.1 graylog

firewall_index_13 2 p STARTED 20144515 15.9gb 127.0.0.1 graylog

firewall_index_13 0 p STARTED 20147502 15.9gb 127.0.0.1 graylog

gl-events_1 3 p STARTED 0 208b 127.0.0.1 graylog

gl-events_1 1 p STARTED 0 208b 127.0.0.1 graylog

gl-events_1 2 p STARTED 0 208b 127.0.0.1 graylog

gl-events_1 0 p STARTED 0 208b 127.0.0.1 graylog

firewall_index_20 3 p STARTED 23206149 18.2gb 127.0.0.1 graylog

firewall_index_20 1 p STARTED 23198215 18.2gb 127.0.0.1 graylog

firewall_index_20 2 p STARTED 23203197 18.2gb 127.0.0.1 graylog

firewall_index_20 0 p STARTED 23198658 18.2gb 127.0.0.1 graylog

server_index_55 5 p STARTED 18052373 7.7gb 127.0.0.1 graylog

server_index_55 4 p STARTED 18053413 7.7gb 127.0.0.1 graylog

server_index_55 3 p STARTED 18053357 7.7gb 127.0.0.1 graylog

server_index_55 1 p STARTED 18055742 7.7gb 127.0.0.1 graylog

server_index_55 2 p STARTED 18061959 7.7gb 127.0.0.1 graylog

server_index_55 0 p STARTED 18053170 7.7gb 127.0.0.1 graylog

desktops_index_51 4 p STARTED 3083708 1.9gb 127.0.0.1 graylog

desktops_index_51 3 p STARTED 3085780 1.9gb 127.0.0.1 graylog

desktops_index_51 2 p STARTED 3086304 1.9gb 127.0.0.1 graylog

desktops_index_51 1 p STARTED 3086158 1.9gb 127.0.0.1 graylog

desktops_index_51 0 p STARTED 3083736 1.9gb 127.0.0.1 graylog

server_index_63 1 p STARTED 3108015 1.7gb 127.0.0.1 graylog

server_index_63 4 p STARTED 3112114 1.7gb 127.0.0.1 graylog

server_index_63 5 p STARTED 3113361 1.7gb 127.0.0.1 graylog

server_index_63 2 p STARTED 3111143 1.9gb 127.0.0.1 graylog

server_index_63 3 p STARTED 3111539 1.8gb 127.0.0.1 graylog

server_index_63 0 p STARTED 3108993 1.7gb 127.0.0.1 graylog

server_index_50 2 p STARTED 20495344 9.1gb 127.0.0.1 graylog

server_index_50 4 p STARTED 20489925 9gb 127.0.0.1 graylog

server_index_50 3 p STARTED 20500297 9.1gb 127.0.0.1 graylog

server_index_50 5 p STARTED 20498283 9.1gb 127.0.0.1 graylog

server_index_50 1 p STARTED 20491343 9.1gb 127.0.0.1 graylog

server_index_50 0 p STARTED 20497389 9.1gb 127.0.0.1 graylog

gl-failures_39 1 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_39 0 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_59 1 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_59 0 p STARTED 0 208b 127.0.0.1 graylog

gl-system-events_2 3 p STARTED 0 208b 127.0.0.1 graylog

gl-system-events_2 2 p STARTED 0 208b 127.0.0.1 graylog

gl-system-events_2 1 p STARTED 0 208b 127.0.0.1 graylog

gl-system-events_2 0 p STARTED 0 208b 127.0.0.1 graylog

server_index_37 5 p STARTED 20008139 8.6gb 127.0.0.1 graylog

server_index_37 4 p STARTED 20013162 8.6gb 127.0.0.1 graylog

server_index_37 3 p STARTED 20009352 8.6gb 127.0.0.1 graylog

server_index_37 1 p STARTED 20008619 8.6gb 127.0.0.1 graylog

server_index_37 2 p STARTED 20015685 8.6gb 127.0.0.1 graylog

server_index_37 0 p STARTED 20005487 8.6gb 127.0.0.1 graylog

server_index_38 3 p STARTED 20012813 8.6gb 127.0.0.1 graylog

server_index_38 4 p STARTED 20010170 8.6gb 127.0.0.1 graylog

server_index_38 5 p STARTED 20004752 8.6gb 127.0.0.1 graylog

server_index_38 1 p STARTED 20006191 8.6gb 127.0.0.1 graylog

server_index_38 2 p STARTED 20009865 8.6gb 127.0.0.1 graylog

server_index_38 0 p STARTED 20005834 8.6gb 127.0.0.1 graylog

gl-failures_43 1 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_43 0 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_63 1 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_63 0 p STARTED 0 208b 127.0.0.1 graylog

application_index_10 1 p STARTED 14988 3.9mb 127.0.0.1 graylog

application_index_10 0 p STARTED 14818 3.8mb 127.0.0.1 graylog

server_index_34 1 p STARTED 18610446 7.1gb 127.0.0.1 graylog

server_index_34 4 p STARTED 18612725 7.1gb 127.0.0.1 graylog

server_index_34 5 p STARTED 18604441 7.1gb 127.0.0.1 graylog

server_index_34 3 p STARTED 18610833 7.1gb 127.0.0.1 graylog

server_index_34 2 p STARTED 18609696 7.1gb 127.0.0.1 graylog

server_index_34 0 p STARTED 18616734 7.1gb 127.0.0.1 graylog

gl-failures_55 1 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_55 0 p STARTED 0 208b 127.0.0.1 graylog

routers_index_2 1 p STARTED 36868 5.2mb 127.0.0.1 graylog

routers_index_2 0 p STARTED 36458 5.2mb 127.0.0.1 graylog

server_index_60 1 p STARTED 13087253 6.1gb 127.0.0.1 graylog

server_index_60 4 p STARTED 13081666 6.1gb 127.0.0.1 graylog

server_index_60 5 p STARTED 13081636 6.1gb 127.0.0.1 graylog

server_index_60 2 p STARTED 13084671 6.1gb 127.0.0.1 graylog

server_index_60 3 p STARTED 13085467 6.1gb 127.0.0.1 graylog

server_index_60 0 p STARTED 13090601 6.1gb 127.0.0.1 graylog

desktops_index_61 4 p STARTED 0 208b 127.0.0.1 graylog

desktops_index_61 3 p STARTED 0 208b 127.0.0.1 graylog

desktops_index_61 1 p STARTED 0 208b 127.0.0.1 graylog

desktops_index_61 2 p STARTED 0 208b 127.0.0.1 graylog

desktops_index_61 0 p STARTED 0 208b 127.0.0.1 graylog

general_syslog_events_index_0 3 p STARTED 0 208b 127.0.0.1 graylog

general_syslog_events_index_0 2 p STARTED 0 208b 127.0.0.1 graylog

general_syslog_events_index_0 1 p STARTED 0 208b 127.0.0.1 graylog

general_syslog_events_index_0 0 p STARTED 1 8.2kb 127.0.0.1 graylog

gl-failures_49 1 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_49 0 p STARTED 0 208b 127.0.0.1 graylog

desktops_index_58 4 p STARTED 2586767 1.6gb 127.0.0.1 graylog

desktops_index_58 3 p STARTED 2590485 1.6gb 127.0.0.1 graylog

desktops_index_58 2 p STARTED 2590136 1.6gb 127.0.0.1 graylog

desktops_index_58 1 p STARTED 2589038 1.6gb 127.0.0.1 graylog

desktops_index_58 0 p STARTED 2589123 1.6gb 127.0.0.1 graylog

gl-failures_38 1 p STARTED 0 208b 127.0.0.1 graylog

gl-failures_38 0 p STARTED 0 208b 127.0.0.1 graylog
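A listing this long is easier to reason about as a summary. The sketch below parses `_cat/shards` output in its default column order (index, shard, prirep, state, docs, store, ip, node) and counts shards, indices, and shards holding no documents. The function names are made up for illustration.

```python
from collections import Counter


def parse_cat_shards(text):
    """Parse `_cat/shards` text with the default columns into a list of dicts."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 8 or parts[3] != "STARTED":
            continue  # skip blank lines, shell prompts, and non-started shards
        rows.append({
            "index": parts[0],
            "shard": int(parts[1]),
            "prirep": parts[2],
            "docs": int(parts[4]),
            "store": parts[5],
        })
    return rows


def summarize(rows):
    """Count started shards, distinct indices, and shards with zero documents."""
    per_index = Counter(row["index"] for row in rows)
    empty = sum(1 for row in rows if row["docs"] == 0)
    return {"shards": len(rows), "indices": len(per_index), "empty_shards": empty}


# Example use: paste the listing into a file and run
#   print(summarize(parse_cat_shards(open("shards.txt").read())))
```

The health output reports 421 active shards on this single node, and many of them (the gl-failures and gl-events indices, for instance) hold no documents. Elastic's older sizing guidance suggests staying at or below roughly 20 shards per GB of Elasticsearch heap, so it may be worth comparing the shard count against the configured heap.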

curl -XGET http://127.0.0.1:9200/_cluster/allocation/explain?pretty

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "unable to find any unassigned shards to explain [ClusterAllocationExplainRequest[useAnyUnassignedShard=true,includeYesDecisions?=false]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "unable to find any unassigned shards to explain [ClusterAllocationExplainRequest[useAnyUnassignedShard=true,includeYesDecisions?=false]"
  },
  "status" : 400
}

Thanks,

Chase

We currently have a system with 12 cores (24 if you include hyperthreading). The node has 32GB of RAM.

The two main processes eating resources in htop:

1122 elasticse 20 0 1.0T 14.4G 2042M S 1298 45.9 **191h** /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless

1330 graylog 20 0 19.9G 7159M 21656 S 971. 22.3 **297h** /usr/bin/java -Xms5g -Xmx5g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:-OmitStackTraceInFastThrow -Djdk.tls.acknowledgeCloseNotify=true -Dlog4j2.formatMsgNoLookups=true -ja

All system resources in htop:

1330 graylog 20 0 19.9G 7159M 21848 S 710. 22.3 297h /usr/bin/java -Xms5g -Xmx5g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:-OmitStackTraceInFastThrow -Djdk.tls.acknowledgeCloseNotify=true -Dlog4j2.formatMsgNoLookups=true -ja

1122 elasticse 20 0 1.0T 14.4G 2075M S 664. 46.0 191h /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless

2426 elasticse 20 0 1.0T 14.4G 2075M R 92.9 46.0 2h05:07 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless

1948 graylog 20 0 19.9G 7159M 21848 R 91.5 22.3 31h19:47 /usr/bin/java -Xms5g -Xmx5g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:-OmitStackTraceInFastThrow -Djdk.tls.acknowledgeCloseNotify=true -Dlog4j2.formatMsgNoLookups=true -ja

1950 graylog 20 0 19.9G 7159M 21848 R 90.1 22.3 31h19:36 /usr/bin/java -Xms5g -Xmx5g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:-OmitStackTraceInFastThrow -Djdk.tls.acknowledgeCloseNotify=true -Dlog4j2.formatMsgNoLookups=true -ja

1953 graylog 20 0 19.9G 7159M 21848 R 90.1 22.3 31h19:20 /usr/bin/java -Xms5g -Xmx5g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:-OmitStackTraceInFastThrow -Djdk.tls.acknowledgeCloseNotify=true -Dlog4j2.formatMsgNoLookups=true -ja

1952 graylog 20 0 19.9G 7159M 21848 R 90.1 22.3 31h19:45 /usr/bin/java -Xms5g -Xmx5g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:-OmitStackTraceInFastThrow -Djdk.tls.acknowledgeCloseNotify=true -Dlog4j2.formatMsgNoLookups=true -ja

1951 graylog 20 0 19.9G 7159M 21848 R 90.1 22.3 31h20:00 /usr/bin/java -Xms5g -Xmx5g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:-OmitStackTraceInFastThrow -Djdk.tls.acknowledgeCloseNotify=true -Dlog4j2.formatMsgNoLookups=true -ja

1949 graylog 20 0 19.9G 7159M 21848 R 88.6 22.3 31h19:33 /usr/bin/java -Xms5g -Xmx5g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:-OmitStackTraceInFastThrow -Djdk.tls.acknowledgeCloseNotify=true -Dlog4j2.formatMsgNoLookups=true -ja

1686 elasticse 20 0 1.0T 14.4G 2075M S 51.5 46.0 4h29:51 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless

2319 elasticse 20 0 1.0T 14.4G 2075M R 38.6 46.0 3h51:00 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless

2302 elasticse 20 0 1.0T 14.4G 2075M R 37.2 46.0 3h49:36 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless

2320 elasticse 20 0 1.0T 14.4G 2075M R 37.2 46.0 3h50:41 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless

2299 elasticse 20 0 1.0T 14.4G 2075M R 35.7 46.0 3h50:19 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless

1968 graylog 20 0 19.9G 7159M 21848 S 31.4 22.3 17h05:12 /usr/bin/java -Xms5g -Xmx5g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:-OmitStackTraceInFastThrow -Djdk.tls.acknowledgeCloseNotify=true -Dlog4j2.formatMsgNoLookups=true -ja

2314 elasticse 20 0 1.0T 14.4G 2075M S 30.0 46.0 3h48:40 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless

2347 elasticse 20 0 1.0T 14.4G 2075M S 28.6 46.0 2h12:33 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless

2316 elasticse 20 0 1.0T 14.4G 2075M S 28.6 46.0 3h48:04 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless

25152 graylog-a 20 0 9920 6016 3300 R 27.2 0.0 0:14.22 htop

2317 elasticse 20 0 1.0T 14.4G 2075M S 25.7 46.0 3h58:52 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless

2307 elasticse 20 0 1.0T 14.4G 2075M S 24.3 46.0 3h52:12 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless

2310 elasticse 20 0 1.0T 14.4G 2075M S 24.3 46.0 3h51:00 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless

2306 elasticse 20 0 1.0T 14.4G 2075M S 22.9 46.0 3h55:57 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless

2300 elasticse 20 0 1.0T 14.4G 2075M S 22.9 46.0 3h51:25 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless

2301 elasticse 20 0 1.0T 14.4G 2075M S 22.9 46.0 3h51:09 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless

2308 elasticse 20 0 1.0T 14.4G 2075M S 22.9 46.0 3h50:15 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless

2322 elasticse 20 0 1.0T 14.4G 2075M S 20.0 46.0 3h50:08 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless

1337 graylog 20 0 19.9G 7159M 21848 S 18.6 22.3 1h29:18 /usr/bin/java -Xms5g -Xmx5g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:-OmitStackTraceInFastThrow -Djdk.tls.acknowledgeCloseNotify=true -Dlog4j2.formatMsgNoLookups=true -ja

2282 graylog 20 0 19.9G 7159M 21848 R 17.2 22.3 1h39:04 /usr/bin/java -Xms5g -Xmx5g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:-OmitStackTraceInFastThrow -Djdk.tls.acknowledgeCloseNotify=true -Dlog4j2.formatMsgNoLookups=true -ja

2305 elasticse 20 0 1.0T 14.4G 2075M S 15.7 46.0 3h54:16 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless

1767 graylog 20 0 19.9G 7159M 21848 S 14.3 22.3 9h44:44 /usr/bin/java -Xms5g -Xmx5g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:-OmitStackTraceInFastThrow -Djdk.tls.acknowledgeCloseNotify=true -Dlog4j2.formatMsgNoLookups=true -ja

2304 elasticse 20 0 1.0T 14.4G 2075M S 14.3 46.0 3h55:46 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless

2323 elasticse 20 0 1.0T 14.4G 2075M S 14.3 46.0 3h50:01 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless

2315 elasticse 20 0 1.0T 14.4G 2075M S 12.9 46.0 3h51:47 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless

2309 elasticse 20 0 1.0T 14.4G 2075M S 12.9 46.0 3h48:41 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless

2313 elasticse 20 0 1.0T 14.4G 2075M S 10.0 46.0 3h49:33 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless

1128 mongodb 20 0 1288M 250M 11568 S 10.0 0.8 2h56:58 /usr/bin/mongod --config /etc/mongod.conf

1934 graylog 20 0 19.9G 7159M 21848 S 10.0 22.3 2h25:53 /usr/bin/java -Xms5g -Xmx5g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:-OmitStackTraceInFastThrow -Djdk.tls.acknowledgeCloseNotify=true -Dlog4j2.formatMsgNoLookups=true -ja

2285 graylog 20 0 19.9G 7159M 21848 R 8.6 22.3 1h39:41 /usr/bin/java -Xms5g -Xmx5g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:-OmitStackTraceInFastThrow -Djdk.tls.acknowledgeCloseNotify=true -Dlog4j2.formatMsgNoLookups=true -ja

2312 elasticse 20 0 1.0T 14.4G 2075M S 7.1 46.0 3h47:31 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless

1935 graylog 20 0 19.9G 7159M 21848 S 7.1 22.3 2h25:57 /usr/bin/java -Xms5g -Xmx5g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:-OmitStackTraceInFastThrow -Djdk.tls.acknowledgeCloseNotify=true -Dlog4j2.formatMsgNoLookups=true -ja

1936 graylog 20 0 19.9G 7159M 21848 S 7.1 22.3 2h25:56 /usr/bin/java -Xms5g -Xmx5g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:-OmitStackTraceInFastThrow -Djdk.tls.acknowledgeCloseNotify=true -Dlog4j2.formatMsgNoLookups=true -ja

1766 graylog 20 0 19.9G 7159M 21848 S 5.7 22.3 4h00:43 /usr/bin/java -Xms5g -Xmx5g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:-OmitStackTraceInFastThrow -Djdk.tls.acknowledgeCloseNotify=true -Dlog4j2.formatMsgNoLookups=true -ja

2348 elasticse 20 0 1.0T 14.4G 2075M S 5.7 46.0 49:47.99 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless

1765 graylog 20 0 19.9G 7159M 21848 S 5.7 22.3 4h01:00 /usr/bin/java -Xms5g -Xmx5g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:-OmitStackTraceInFastThrow -Djdk.tls.acknowledgeCloseNotify=true -Dlog4j2.formatMsgNoLookups=true -ja

2318 elasticse 20 0 1.0T 14.4G 2075M S 4.3 46.0 3h50:55 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless

1731 elasticse 20 0 1.0T 14.4G 2075M S 4.3 46.0 58:25.11 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless

2292 graylog 20 0 19.9G 7159M 21848 S 4.3 22.3 1h39:48 /usr/bin/java -Xms5g -Xmx5g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:-OmitStackTraceInFastThrow -Djdk.tls.acknowledgeCloseNotify=true -Dlog4j2.formatMsgNoLookups=true -ja

2150 graylog 20 0 19.9G 7159M 21848 S 4.3 22.3 2h12:18 /usr/bin/java -Xms5g -Xmx5g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:-OmitStackTraceInFastThrow -Djdk.tls.acknowledgeCloseNotify=true -Dlog4j2.formatMsgNoLookups=true -ja

1884 elasticse 20 0 1.0T 14.4G 2075M S 4.3 46.0 14:26.01 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless

2164 graylog 20 0 19.9G 7159M 21848 S 4.3 22.3 1h22:27 /usr/bin/java -Xms5g -Xmx5g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:-OmitStackTraceInFastThrow -Djdk.tls.acknowledgeCloseNotify=true -Dlog4j2.formatMsgNoLookups=true -ja

1471 mongodb 20 0 1288M 250M 11568 S 4.3 0.8 43:44.47 /usr/bin/mongod --config /etc/mongod.conf

2221 graylog 20 0 19.9G 7159M 21848 S 4.3 22.3 26:26.48 /usr/bin/java -Xms5g -Xmx5g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:-OmitStackTraceInFastThrow -Djdk.tls.acknowledgeCloseNotify=true -Dlog4j2.formatMsgNoLookups=true -ja

2032 mongodb 20 0 1288M 250M 11568 S 4.3 0.8 10:08.87 /usr/bin/mongod --config /etc/mongod.conf

2284 graylog 20 0 19.9G 7159M 21848 S 2.9 22.3 1h38:00 /usr/bin/java -Xms5g -Xmx5g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:-OmitStackTraceInFastThrow -Djdk.tls.acknowledgeCloseNotify=true -Dlog4j2.formatMsgNoLookups=true -ja

2140 graylog 20 0 19.9G 7159M 21848 S 2.9 22.3 1h23:06 /usr/bin/java -Xms5g -Xmx5g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:-OmitStackTraceInFastThrow -Djdk.tls.acknowledgeCloseNotify=true -Dlog4j2.formatMsgNoLookups=true -ja

1893 elasticse 20 0 1.0T 14.4G 2075M S 2.9 46.0 57:41.40 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless

2451 elasticse 20 0 1.0T 14.4G 2075M S 2.9 46.0 2h06:11 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless
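
A side note on the listing above: the Graylog rows show the heap flags (`-Xms5g -Xmx5g`), but the Elasticsearch command line is cut off before any heap flags appear. A small sketch for reading them from the running processes instead; `heap_flags` is a hypothetical helper name, and the PIDs in the usage comments are the ones from the htop output, so they will differ after a restart:

```shell
#!/usr/bin/env bash
# Extract JVM heap flags (-Xms / -Xmx) from a command line read on stdin.
heap_flags() {
  grep -oE -- '-Xm[sx][0-9]+[kKmMgG]?'
}

# Usage on a live system:
#   tr '\0' ' ' < /proc/1122/cmdline | heap_flags   # Elasticsearch
#   tr '\0' ' ' < /proc/1330/cmdline | heap_flags   # Graylog
```

Comparing the two heaps against the 32GB of RAM on the node shows how much memory is left for the filesystem cache.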

Showing your GL and ES configuration files would help. This is starting to seem like a resource issue, or an issue with how the resources are distributed, but I'm not 100% sure.

EDIT: I hope you don't mind, but I edited your post so it was easier to read. In the future, please use the markdown formatting. Thanks.

EDIT2: I forgot to ask: are there extractors on your inputs? If so, what do you have configured? Do you by chance have any pipelines configured?

Attached are the graylog.conf and elastisearch.yml files

I’d like to point out that resources didn’t seem to be a problem for the first month that it was working. Traffic to the servers has gone down over time, as a whole. I suppose there could be a burst of traffic, but I haven’t seen anything that would indicate that’s the problem.

Thanks,

Chase

(Attachment graylog.conf - backup.txt is missing)

(Attachment elastisearch.yml -backup.txt is missing)

======================== Elasticsearch Configuration =========================

How best to upload the configs?

Hello,

Also for uploads perhaps

If that doesn’t work, then you can look here on HowTo.

Hello @Chase

Then it would be how the resources were distributed, or possibly something that wasn't configured. I can't say for sure since no configuration has been shown.

Do you want the entire config, or just parts?


Hello @Chase

Sometimes, one has to think,
“What could I show that would let this other person, 5000 miles away, help with my issue?” :thinking:

Let me give you some tips.
If you want to show Graylog's configuration file, I would execute this command and copy and paste the output here using the markdown “</>” button shown above the text box.

cat /etc/graylog/server/server.conf | egrep -v "^\s*(#|$)"

If you want to show the Elasticsearch configuration file, I would execute this command and copy and paste the output here using the markdown “</>” button shown above the text box.

cat /etc/elasticsearch/elasticsearch.yml | egrep -v "^\s*(#|$)"

What this does is remove all the extra lines that are not needed (comments and blank lines). Screenshots are pretty easy too.
I use ALT + PRINT SCREEN so that I capture just the window I want and not my whole monitor. Then I edit it with other software; just an FYI, Windows Paint is good for that.
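
To illustrate what the `egrep -v` commands above do: the pattern `^\s*(#|$)` matches any line that is empty, contains only whitespace, or starts with `#` after optional whitespace, and `-v` prints everything else. The sketch below uses `grep -E`, since newer GNU grep releases warn that `egrep` is obsolescent, and the portable `[[:space:]]` class in place of `\s`. The function name `strip_comments` and the sample file are made up for the demonstration:

```shell
#!/usr/bin/env bash
# Print a config file without comment lines and blank lines.
strip_comments() {
  grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Demonstration on a throwaway file.
demo=$(mktemp)
printf '# a comment\n\nis_master = true\n   # indented comment\nring_size = 65536\n' > "$demo"
strip_comments "$demo"   # prints only the two setting lines
rm -f "$demo"
```

Run against the real files, this would be `strip_comments /etc/graylog/server/server.conf` and `strip_comments /etc/elasticsearch/elasticsearch.yml`.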

Elasticsearch and Graylog log files would help too. I'm not sure how much, but when I need help I show as much about my environment as possible, so that whoever decides to help me can understand my setup. Just a thought.

Remember to remove private information before posting. On a personal note, take your time and make it look good. This shows other community members that you care and really do need help.

Here is the requested info. The snarkiness is not appreciated.

cat /etc/graylog/server/server.conf | egrep -v "^\s*(#|$)"

is_master = true
node_id_file = /etc/graylog/server/node-id
password_secret = ******************************
root_username = *********************
root_password_sha2 = ***********************************
root_timezone = America/New_York
bin_dir = /usr/share/graylog-server/bin
data_dir = /var/lib/graylog-server
plugin_dir = /usr/share/graylog-server/plugin
http_bind_address = ...:9000
rotation_strategy = count
elasticsearch_max_docs_per_index = 20000000
elasticsearch_max_number_of_indices = 20
retention_strategy = delete
elasticsearch_shards = 4
elasticsearch_replicas = 0
elasticsearch_index_prefix = graylog
allow_leading_wildcard_searches = true
allow_highlighting = false
elasticsearch_analyzer = standard
output_batch_size = 500
output_flush_interval = 1
output_fault_count_threshold = 5
output_fault_penalty_seconds = 30
processbuffer_processors = 6
outputbuffer_processors = 4
processor_wait_strategy = blocking
ring_size = 65536
inputbuffer_ring_size = 65536
inputbuffer_processors = 2
inputbuffer_wait_strategy = blocking
message_journal_enabled = true
message_journal_dir = /var/lib/graylog-server/journal
lb_recognition_period_seconds = 3
mongodb_uri = mongodb://localhost/graylog
mongodb_max_connections = 1000
mongodb_threads_allowed_to_block_multiplier = 5
proxied_requests_thread_pool_size = 32

sudo cat /etc/elasticsearch/elasticsearch.yml | egrep -v "^\s*(#|$)"

path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
cluster.name: graylog
action.auto_create_index: false

Thanks,

Chase

Since processbuffer_processors is your heavy hitter, have you by chance tried increasing just processbuffer_processors? It seems you have enough CPU to increase that to 8, and then you may need to wait.
I handle about the same number of messages per second, give or take 100.

processbuffer_processors = 8
outputbuffer_processors = 3
processor_wait_strategy = blocking
ring_size = 65536
inputbuffer_ring_size = 65536
inputbuffer_processors = 2
inputbuffer_wait_strategy = blocking

As for Elasticsearch, do you see any errors or warnings in your logs?
Since discovery.type: single-node is not configured, I assume this is the only node in the environment?
I believe the low output you are seeing is because ES can't process the messages in the journal fast enough, as stated above.

That will do it. I would suggest increasing the processors gradually and waiting.
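
Before raising any of these numbers, it can help to compare the total against the core count. A common rule of thumb in the Graylog community is to keep the sum of the three `*_processors` settings at or below the number of CPU cores, bearing in mind that Elasticsearch and MongoDB share this node. With the values from the server.conf posted earlier in the thread (6, 4 and 2) the sum is 12. A rough sketch, assuming the settings appear once each and uncommented; `total_processors` is a made-up helper name:

```shell
#!/usr/bin/env bash
# Sum the inputbuffer/processbuffer/outputbuffer processor counts in a server.conf-style file.
total_processors() {
  awk -F'=' '
    /^[[:space:]]*(inputbuffer|processbuffer|outputbuffer)_processors[[:space:]]*=/ { sum += $2 }
    END { print sum + 0 }
  ' "$1"
}

# Usage:
#   echo "configured: $(total_processors /etc/graylog/server/server.conf), cores: $(nproc)"
```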

Yes, we have tried increasing the process buffer processors, all the way up to 10 cores, with no benefit. Again, the weird thing is that once we reboot, all the messages that have built up are flushed out. I could understand it if there was a burst of data that filled the buffer, but even in that case the system should, given some time, continue to process data and eventually go back to normal. Instead, once we hit that mark, it never goes back to its normal performance without a reboot. What's even weirder to me is that I can shut down all 3 services (elasticsearch, graylog, and mongodb) and then start them again, and performance is still degraded. Only when the machine is rebooted does it seem to fix the issue.

As far as elasticsearch goes, we do have error messages in the logs. However, most of these are related to our desktop environment and say that there are over 1000 fields.

ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Limit of total fields [1000] has been exceeded]]

We've disabled that input before, to see if that was causing the issue, but ended up having the same problem. Other than that, we occasionally get errors like:

ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [Message1] of type [date] in document with id ‘6e76a783-a915-11ec-ae22-ce92f7854bb4’. Preview of field’s value: ‘Power.EnergyEstimationEngine.Wifi.ppkg’]]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=failed to parse date field [Power.EnergyEstimationEngine.Wifi.ppkg] with format [strict_date_optional_time||epoch_millis]]]; nested: ElasticsearchException[Elasticsearch exception [type=date_time_parse_exception, reason=Failed to parse with all enclosed parsers]];

Again, this mainly comes from logging our desktops, and as mentioned, disabling that input didn't resolve the issue.
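
To get a feel for how often these two failures occur, the log can be counted by error type. This is only a sketch: `count_index_failures` is a hypothetical helper, and `/var/log/graylog-server/server.log` in the usage comment is assumed to be the log location for a package install. The `index.mapping.total_fields.limit` setting named in the last comment is the Elasticsearch index setting behind the "Limit of total fields [1000]" message; raising it applies only to the existing index named in the request and does not address why so many fields are being created.

```shell
#!/usr/bin/env bash
# Count log lines for the two indexing failures quoted above.
count_index_failures() {
  # $1: path to a log file
  printf 'total_fields_limit %s\n' "$(grep -c 'Limit of total fields' "$1")"
  printf 'mapper_parsing %s\n' "$(grep -c 'type=mapper_parsing_exception' "$1")"
}

# Usage (log path is an assumption):
#   count_index_failures /var/log/graylog-server/server.log
#
# Raising the field limit on one existing index (index name taken from the shard listing):
#   curl -s -XPUT 'http://127.0.0.1:9200/desktops_index_61/_settings' \
#     -H 'Content-Type: application/json' \
#     -d '{"index.mapping.total_fields.limit": 2000}'
```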

Thanks,

Chase

Hello,

I assume you're talking about your journal? If so, I'm not sure why your server would do that. The journal was created for this purpose: to hold messages if Graylog crashes. By rebooting you mimic a crash.

I assume some wait time is given before the next reboot? In other words, you are not just rebooting the server at random, but giving Graylog time to catch up?

Is ulimit properly configured?

It seems that this problem has been around before.

I looked over your other posts and they all look similar: either too many fields, the output buffer fills up, the process buffer fills up, or the journal is filling up.

These all point to your Elasticsearch/configurations, but looking at your ES and GL configuration files and comparing them to mine, I really don't see anything sticking out that would identify the source of the issue.

It seems Elasticsearch is having a hard time with the messages coming in, and perhaps with indexing them. I would really look into this. It may be the source, or part, of the issue.

Some Ideas:

  • Perhaps look into GL garbage collection; this could have an impact on performance.
  • I'm not sure how you're ingesting or shipping logs. Perhaps adjust your log shippers to send the minimal amount of logs.
  • Ensure the correct input is used for each device sending logs to Graylog.
  • Ensure the Elasticsearch version is no greater than 7.10.

If none of that works, I would start small for example:

  • Input Syslog UDP for my Windows devices.
  • Input Raw/Plaintext for my network devices.

Double check the other logs on this system under /var/log; maybe you can get a clue or some direction on why this issue is happening.

I assume you're talking about your journal? If so, I'm not sure why your server would do that. The journal was created for this purpose: to hold messages if Graylog crashes. By rebooting you mimic a crash or reboot.

You are correct: once the buffer fills up, the journal does as well. Once the system is rebooted, we can see a huge increase in performance, i.e. up to 10,000 to 15,000 logs a second, as the system catches back up to its normal state.

I assume some wait time is given before the next reboot? In other words, you are not just rebooting the server at random, but giving Graylog time to catch up?

I'm not exactly sure what you mean by wait time. We have shut down services and rebooted. We have shut down services, let the journal catch up, and then rebooted. Regardless, anything we do seems to end with us rebooting the system for it to start working properly again.

Is ulimit properly configured?

We have not touched the ulimit, and were not aware that we needed to do anything with it. Is ulimit something that is set for a Docker deploy?
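
On the ulimit question: these are ordinary per-process Linux resource limits, so they apply on a package install as well as in Docker. A quick way to see what a running process actually got is to read its `/proc/<pid>/limits` file. The sketch below assumes the usual layout of that file, where the soft limit is the fourth whitespace-separated field on the "Max open files" row; `open_files_limit` is a made-up helper name and the PIDs in the usage comments are the ones from the htop output earlier in the thread:

```shell
#!/usr/bin/env bash
# Print the soft "Max open files" limit from a /proc/<pid>/limits file.
open_files_limit() {
  awk '/^Max open files/ { print $4 }' "$1"
}

# Usage (PIDs change after every restart):
#   open_files_limit /proc/1122/limits   # Elasticsearch
#   open_files_limit /proc/1330/limits   # Graylog
```

Elasticsearch's own documentation asks for at least 65535 open file descriptors for the Elasticsearch user, so a much lower number here would be worth following up.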

Some Ideas:

· Perhaps look into GL garbage collection; this could have an impact on performance.

Where do I look for garbage collection? Is it just /var/logs/graylog?

· I'm not sure how you're ingesting or shipping logs. Perhaps adjust your log shippers to send the minimal amount of logs.

All of our logs come in 1 of 2 ways: either they are set up as syslog, or they are imported via GELF. As of two days ago, I decided as a test to disable all of our desktop and server logs. This took our average down to about 500 msg/s. Even with that few logs the system still crashed.

· Ensure the correct input is used for each device sending logs to Graylog.

All inputs look like they are good.

· Ensure the Elasticsearch version is no greater than 7.10.

ElasticSearch version is 7.10.2. Can I downgrade this, and if so would there still be a problem with log4j? I don’t recall any upgrade in those 30 days in which it was working.

Double check the other logs on this system under */var/log*; maybe you can get a clue or some direction on why this issue is happening.

Have already checked, will check again. Is there anything specific that you think I should be looking for?

Thanks,

Chase

Hello,
Sorry for the delay.

To be honest, I wouldn't touch your version of ES; downgrading a prod environment would probably be a bad idea.

I'm going off the title of this post:

“Journal utilization is too high - process buffer 100%”

The reason the journal is filling up is that Graylog cannot process the messages in the journal fast enough. This could require resources to be added or reconfigured. I'm saying this because you stated that it does the same thing even with few logs.
When buffers fill up, either raise the resource settings in the configuration file or add more resources, and check permissions as well. These are the main reasons for that to happen with the small amount of logs you are ingesting. In another post I've been helping with, they were ingesting 3000-5000 MPS, about 300-500GB a day, and they resolved this by increasing the field type refresh interval from 5 seconds to 30 seconds. The smaller the number, the more resources it uses.

This would mean that, since you have a dynamic index template, Elasticsearch is creating a ton of fields.

Or you could adjust these settings.

message_journal_max_age
message_journal_max_size

These are in the Graylog server.conf file.
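
To see how close the journal is to its limit before changing either setting, the on-disk size can be compared with the configured maximum. A minimal sketch: `journal_utilization_pct` is a hypothetical helper, the journal path comes from the server.conf posted earlier, and the 5 GiB maximum in the usage comment is an assumption (the posted config does not set `message_journal_max_size`, so whatever default the installed version uses applies). Note that a larger journal only buys time; it does not make processing faster.

```shell
#!/usr/bin/env bash
# Print journal utilization as a percentage with one decimal place.
journal_utilization_pct() {
  # $1: current journal size in bytes, $2: message_journal_max_size in bytes
  awk -v used="$1" -v max="$2" 'BEGIN { printf "%.1f\n", used * 100 / max }'
}

# Usage (GNU du; the 5 GiB limit is an assumed value):
#   used=$(du -sb /var/lib/graylog-server/journal | cut -f1)
#   journal_utilization_pct "$used" $((5 * 1024 * 1024 * 1024))
```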

I'm not 100% sure, but it took 30 days for this issue to be noticed. You mentioned that you created new indices (depending on how this was configured) and enabled leading wildcard searches in the Graylog configuration file (these will use resources); I believe the documentation does state this.
You may want to look at this also.

The link above covers the following; perhaps check these.

The Elasticsearch heap should look something like this.

vi /etc/elasticsearch/jvm.options

# JVM configuration

################################################################
## IMPORTANT: JVM heap size
################################################################
##
## You should always set the min and max JVM heap
## size to the same value. For example, to set
## the heap to 4 GB, set:
##
## -Xms4g
## -Xmx4g
##
## See https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html
## for more information
##
################################################################

# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space

-Xms4g
-Xmx4g

The Graylog heap should look something like this.

# Path to the java executable.
JAVA=/usr/bin/java

# Default Java options for heap and garbage collection.
GRAYLOG_SERVER_JAVA_OPTS="-Xms3g -Xmx3g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSClassUnloadingEnabled -XX:-OmitStackTraceInFastThrow "

I went over all the basic stuff needed to keep buffers from filling up. I think you may have missed something, but I'm not sure what yet. You seem to have basic settings, and your ingest rate is a very small footprint on this server. It might just take a couple of different combinations of configuration.

To sum it up:

Perhaps set the buffers:

inputbuffer_processors = 2
processbuffer_processors = 6 <--- Heavy hitter (NOTE:  With your setup  you could go as far as 10)
outputbuffer_processors = 3 <-- Don't raise this setting unless you see Output buff filling up and not going down.

Graylog heap (NOTE: I see in your post above that you have this set to 5GB; perhaps you can lower that to 4?)

GRAYLOG_SERVER_JAVA_OPTS="-Xms4g -Xmx4g

Elasticsearch Heap

-Xms4g
-Xmx4g

Adjust your indices' field type refresh interval to 30 seconds.

If you can, disable wildcard searches for the time being, restart the services, and then wait. I hope you start Elasticsearch first and, once it is GREEN, then start Graylog. Just an idea.

If you have any extractors on your inputs I would look at those also; they could be an issue.

EDIT: I forgot to mention the process buffer dump.

The Nodes details page might give you some additional insights about the buffers and their current state.

In another post I've been helping with, they were ingesting 3000-5000 MPS, about 300-500GB a day, and they resolved this by increasing the field type refresh interval from 5 seconds to 30 seconds. The smaller the number, the more resources it uses.

If this value is set, will that affect us if we wish to do live searches, where we are watching traffic come in live? Do you set this per index, do you set this value on a template, or is this value set somewhere completely different?

Or you could adjust these settings.

Hello,

It's just what it states: how often the field type information for the active write index will be updated.

In the web UI, each index set should have this configuration.