Output batch_size question


I currently run a cluster of 5 Graylog nodes on AWS c5 EC2 instances, each with 16 CPUs and 32 GB RAM.
The same machines also run Elasticsearch coordinating-only nodes.
Heap size for Graylog: 12 GB
Heap size for the ES coordinating node: 8 GB
Elasticsearch http.max_content_length: 1024 MB
Index refresh interval: 15s
The Graylog nodes are configured to send messages to all 5 coordinating-only nodes.

The coordinating-only nodes are part of an Elasticsearch cluster that consists of 16 data nodes and 3 dedicated masters.
The data nodes have 3.5 TB NVMe SSDs, 16 cores and 122 GB RAM.

I set the output batch size to 1000
Refresh rate: 1s
Max Elasticsearch connections: 160
Max connections per route: 32

Our median log flow is 15,000 msg/sec.

Does it make sense to raise the batch size to 10,000, or could that hurt performance because of the very large bulk size?
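To put rough numbers on the question, here is a back-of-the-envelope sketch in Python. It is only an estimate under stated assumptions: it assumes the 15,000 msg/sec is spread evenly over the 5 Graylog nodes, treats each node as a single stream, and uses a made-up average message size (`AVG_MESSAGE_BYTES`) that you would have to replace with a measured value. The function name `bulk_estimate` is hypothetical and not part of Graylog or Elasticsearch.

```python
# Rough estimate of how batch size relates to fill time and bulk payload size.
# Assumptions (not from the thread): even distribution across Graylog nodes,
# and AVG_MESSAGE_BYTES is a placeholder, not a measured value.

TOTAL_MSG_PER_SEC = 15_000                # median log flow from the post
GRAYLOG_NODES = 5                         # from the post
MAX_CONTENT_BYTES = 1024 * 1024 * 1024    # http.max_content_length of 1024 MB
AVG_MESSAGE_BYTES = 1_000                 # hypothetical; measure your own


def bulk_estimate(batch_size, total_rate=TOTAL_MSG_PER_SEC,
                  nodes=GRAYLOG_NODES, avg_bytes=AVG_MESSAGE_BYTES):
    """Return rough per-node figures for a given output batch size."""
    per_node_rate = total_rate / nodes            # messages per second per node
    approx_bytes = batch_size * avg_bytes         # ignores bulk API overhead
    return {
        "batch_size": batch_size,
        "seconds_to_fill": batch_size / per_node_rate,
        "approx_bulk_bytes": approx_bytes,
        "fits_max_content": approx_bytes <= MAX_CONTENT_BYTES,
    }


if __name__ == "__main__":
    for size in (1_000, 2_048, 10_000):
        print(bulk_estimate(size))
```

Under these assumptions each node sees about 3,000 msg/sec, so a batch of 10,000 would take a little over 3 seconds to fill. If the 1s setting listed above is a flush interval (an assumption), the timer would flush the batch well before it ever reached 10,000 messages at the current rate.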

A large bulk size may also cause ES to reject the request. On our cluster I found that a batch size of about 2048 with more outputbuffer_processors can raise performance, up to a point. We run 3 Graylog nodes with 8 outputbuffer_processors and 16 processbuffer_processors (on 24-core machines).

Realistically, since everyone’s setup is different, the only advice I can give you is: experiment. Give it a shot and see what happens :) There’s currently no real “silver bullet” for larger setups.

You’re right. I was just wondering about the rule “set batch size to your median log rate”, but it looks like it is not going to work for extremely heavily loaded setups.

I think raising the number of connections per route will also help. We use 64 per route with Graylog pointed at 3 coordinating nodes, and a maximum of 3 * 64 total connections (because, well, math and random reasons).
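The connection math for both setups in this thread can be sketched as below. This assumes each Elasticsearch host that Graylog is pointed at counts as one route, which is how I understand the setting but is not confirmed here; the helper names `total_connections` and `per_route_budget` are made up for illustration.

```python
# Connection pool arithmetic for the two setups mentioned in the thread.
# Assumption: one "route" per Elasticsearch host Graylog connects to.


def total_connections(per_route, routes):
    """Total connections needed so every route can use its full per-route limit."""
    return per_route * routes


def per_route_budget(max_total, routes):
    """Largest per-route limit that still fits within the total limit."""
    return max_total // routes


if __name__ == "__main__":
    # Original poster: 32 per route, 5 coordinating nodes
    print(total_connections(32, 5))
    # This reply: 64 per route, 3 coordinating nodes
    print(total_connections(64, 3))
```

With 32 per route and 5 coordinating nodes the total comes to 160, which matches the total limit listed in the first post, so raising the per-route value there would presumably also mean raising the total.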

Actually, I am not complaining about performance: since I set http.max_content_length and the index refresh properly, everything works fantastically with an output batch size of 1000. But since our log traffic is growing, I want to be ready for higher throughput.

Hmm, I’d say see if you can tune the existing setup so you know where your “breaking point” is, and then start planning on more Graylog nodes :)
