What causes a Graylog instance to start backing up out of the blue?

Testing out new hardware on two servers; both have 48 threads and 128 GB of RAM.
The Graylog server has 4x SSDs in a RAID 10, and the Elasticsearch server has 6x spinning disks in a RAID 6. The two are connected by a 10GbE link.
Graylog has a 30 GB JVM heap and Elasticsearch has a 31 GB JVM heap. I did all the usual Elasticsearch tweaks, such as setting the index refresh interval to 60s. Indices are set to rotate at 28 GB with 5 shards.
In Graylog, I've adjusted the output buffers to 30 and the batch size to 8000.
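For reference, those tweaks presumably correspond to something like the following in Graylog's server.conf (I'm guessing at the exact knobs: the 60s setting would be index.refresh_interval on the Elasticsearch side, and the 28 GB / 5 shard rotation lives in Graylog's index set configuration rather than in this file):

```
# Graylog server.conf -- key names from Graylog 2.x; values mirror the numbers above.
# "output buffers to 30" is assumed to mean the number of output buffer processors.
outputbuffer_processors = 30
output_batch_size = 8000
```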

Running Graylog 2.4 and Elasticsearch 2.4.4. There are no streams, pipelines, or anything else set up in Graylog.

I'm able to get bursts of about 30k eps and sustain 10k eps without it touching the journal much. I ran it all weekend with sysloggen pushing 10k eps at it and things were fine; about 3 TB went in over the weekend. This morning I increased the rate to 15k eps. It handled that for several minutes, then the output to Elasticsearch dropped to 0 and the journal started backing up. I turned off sysloggen to let the journal drain, and after that it could barely handle 10k eps without filling the output buffer considerably, with the output to Elasticsearch frequently sitting at 0. Neither the Graylog nor the Elasticsearch logs showed any issue during this time.

Restarting Elasticsearch didn't make much of a difference. After restarting Graylog it holds steady for about 20 minutes, and then the output and process buffers start filling up again.

Did you record any metrics while doing all this?

That would shed some light on it!

I'll see what I can do to reproduce the scenario. Do you have a guide or guidelines for which metrics to collect, other than disk I/O, memory, CPU, and network stats?

You should also record Graylog-internal metrics.
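For example, here is a minimal sketch of what that could look like, assuming a Graylog 2.x REST API reachable at http://graylog.example.com:9000/api with admin credentials (adjust for your rest_listen_uri). It polls /system/metrics and keeps only the journal- and buffer-related entries from the Dropwizard-style dump:

```python
# Minimal sketch (not an official tool): poll Graylog's REST API for its internal
# metrics and keep the journal- and buffer-related ones so they can be graphed
# over time. URL and credentials below are assumptions -- adjust to your setup.
import time

import requests

API = "http://graylog.example.com:9000/api"  # assumed; match your rest_listen_uri
AUTH = ("admin", "password")                 # assumed admin credentials

def snapshot():
    """Return the subset of Graylog metrics whose names mention the journal or buffers."""
    r = requests.get(API + "/system/metrics", auth=AUTH,
                     headers={"Accept": "application/json"}, timeout=10)
    r.raise_for_status()
    data = r.json()
    interesting = {}
    # The dump is grouped Dropwizard-style into gauges/meters/counters/timers.
    for section in ("gauges", "meters", "counters", "timers"):
        for name, value in data.get(section, {}).items():
            if "journal" in name or "buffers" in name:
                interesting[name] = value
    return interesting

if __name__ == "__main__":
    while True:
        print(time.strftime("%H:%M:%S"), snapshot())
        time.sleep(10)
```

Plotting the journal size/utilization next to the process and output buffer usage over time should show which stage backs up first when the throughput drops.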
