TL;DR: We have some settings that work quite well, but we do not understand why.
We have been using Graylog for quite some time now in different environments and we really like it. In general we try to keep the setup simple and run all components on a single node. Our physical nodes have 20 cores (40 with hyper-threading), 256 GB RAM, and spinning HDDs, which gives us OK-ish performance. Although we do not need that much RAM and CPU, this is our standard server hardware for larger workloads. We see log message peaks of around 60k-120k messages per second, which can be handled by using the disk journal. Loads of around 40k messages per second can be processed without using the journal. Although we thought the performance was somewhat weak, and we tried to follow the usual tuning advice for Graylog and Elasticsearch, we decided that this performance would do. We run Graylog versions 2.2 and 2.3; both perform similarly.
output_batch_size = 4000
output_flush_interval = 1
processbuffer_processors = 16
outputbuffer_processors = 16
processor_wait_strategy = blocking
# optimize for spinning disks, rather than SSD
index.merge.scheduler.max_thread_count: 1
index.translog.flush_threshold_size: 1gb
index.refresh_interval: 60s
# we don't really care about search performance. So let the indexer work as
# fast as possible without throttling
indices.store.throttle.type: none
index.merge.scheduler.max_merge_count: 16
Recently we deployed some instances to a cloud provider with 16 vCPUs, 56 GB RAM, and SSDs. Obviously these machines could not match the performance we observed on our bare-metal deployments, but we could not manage to get more than 5k messages per second. Of course the setup was adjusted for the smaller RAM and CPU budget, but we needed more throughput and started to experiment with the settings. Increasing the output batch size negatively impacted performance, so we reduced the batch size to 100, well below the default. At the same time we increased processbuffer_processors and outputbuffer_processors quite a lot. So the current settings, which seem to be a sweet spot, are really the opposite of the best practices.
output_batch_size = 100
processbuffer_processors = 72
outputbuffer_processors = 128
elasticsearch_max_total_connections = 256
elasticsearch_max_total_connections_per_route = 256
# optimize for spinning disks, rather than SSD
# index.merge.scheduler.max_thread_count: 1
# index.translog.flush_threshold_size: 1gb
index.refresh_interval: 30s
# we don't really care about search performance. So let the indexer work as
# fast as possible without throttling
# indices.store.throttle.type: none
# index.merge.scheduler.max_merge_count: 16
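For anyone wanting to try the refresh setting without reindexing, it can also be applied to existing indices at runtime through the Elasticsearch index settings API. A sketch, assuming Elasticsearch listens on localhost:9200 and the indices use the default graylog_* prefix:

```shell
# Apply the 30s refresh interval to all existing Graylog indices at runtime.
# Assumes Elasticsearch on localhost:9200 and the default graylog_* index prefix.
curl -XPUT 'http://localhost:9200/graylog_*/_settings' \
  -H 'Content-Type: application/json' \
  -d '{"index": {"refresh_interval": "30s"}}'

# Verify the setting took effect:
curl -XGET 'http://localhost:9200/graylog_*/_settings/index.refresh_interval?pretty'
```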
This gives us a throughput of 50-55k messages per second. Of course all those additional processor threads have an impact on CPU usage, but since we can easily increase the VM size, I think we could reach 100k messages per second if needed.
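One way to look at why the small batches might still work: the shape of the indexing load changes from a few large bulk requests to many small concurrent ones. A back-of-envelope calculation with the numbers from this post (nothing Graylog-specific, just requests/s = messages/s divided by batch size):

```shell
# Old settings: 40k msg/s in batches of 4000 -> few large bulk requests.
echo "old: $((40000 / 4000)) bulk requests/s"

# New settings: 50k msg/s in batches of 100 -> many small bulk requests.
echo "new: $((50000 / 100)) bulk requests/s"
```

With 128 outputbuffer_processors and 256 Elasticsearch connections, those 500 small requests per second are spread across many parallel connections, so each individual request stays short; the parallelism, rather than batch size alone, may be what the SSD-backed cluster rewards.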
Out of curiosity we deployed the same settings to our bare-metal machines, and they seem to work fantastically there, too. Even with spinners instead of SSDs, throughput is between 90k and 100k messages per second.
So my question is: why do these settings perform so much better, and is it possible to increase the throughput even further before splitting Elasticsearch out to separate machines or building a Graylog cluster? Is anybody seeing similar effects?