Enhance Graylog search performance by adding new Elastic nodes?

Hi community

We are seeing some search performance issues, and before investing in new hardware I would like to be sure we are heading in the right direction, so I would like to ask the community for their experience.

We’re running an infrastructure with 3 Graylog and 3 Elasticsearch nodes, of which one Elasticsearch node is a master-eligible node only, without data. So there are only 2 data nodes in the Elasticsearch cluster. Unfortunately those data nodes, or at least one of them, are not equipped with very fast disks.

Input is around 4k messages per second; sustained maximum output is around 10k messages per second. We have one index per day each for audit and other messages, each configured with one primary and one replica shard. At the moment there are 225 indices, 450 shards and 17,977,430,475 documents in the Elasticsearch cluster, consuming 15.44 TB of disk space.
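For what it's worth, the per-shard numbers can be worked out from the figures above. This is a rough sketch that assumes decimal TB and divides evenly over all 450 shard copies (whether the document count includes replicas depends on how it was read from the cluster):

```python
# Rough per-shard arithmetic from the cluster figures quoted above.
# Assumption: 15.44 TB is decimal (1 TB = 1e12 bytes) and data is
# spread evenly across all 450 shard copies.
total_docs = 17_977_430_475
total_shards = 450
total_bytes = 15.44e12  # 15.44 TB

docs_per_shard = total_docs / total_shards
gb_per_shard = total_bytes / total_shards / 1e9

print(f"~{docs_per_shard / 1e6:.0f}M docs per shard")
print(f"~{gb_per_shard:.0f} GB per shard")
```

That works out to roughly 40M documents and ~34 GB per shard, which is within the shard sizes commonly considered healthy, so the shard layout itself does not look like the main problem here.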

A search over the last 5 minutes takes about 20 seconds to run across all messages, approximately 3 seconds less in streams. A search over the last day (24 hours) takes about 60 seconds, and during the search, after 20 seconds or so, the output stops sending messages to Elasticsearch. After the search has finished, messages are sent again and the journal is emptied.

During a search (no matter how far back in time) the Elasticsearch node holding the primary shards runs at >90% CPU. The average load of the system is around 8 and peaks at 12 during a search.

From my understanding of the Elasticsearch architecture, the load per cluster node during searches decreases as the number of nodes increases. Is my assumption correct? Does anybody have experience with such performance issues? The goal is to get search times under 3 seconds for short-range searches and under 10 seconds for long-range searches.
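The scaling intuition above can be sketched with a toy calculation. With one replica per primary, the total number of shard copies stays fixed, so adding data nodes reduces the shards (and hence the search fan-out work) each node handles. The node counts other than 2 are hypothetical:

```python
# Toy model: a search touches one copy of each shard, so per-node search
# work scales roughly with shards-per-node. Node counts > 2 are
# hypothetical what-if scenarios, not the current cluster.
total_shard_copies = 450  # 225 primaries + 225 replicas

for data_nodes in (2, 3, 4, 6):
    per_node = total_shard_copies / data_nodes
    print(f"{data_nodes} data nodes -> ~{per_node:.0f} shard copies per node")
```

This is only a first-order model (it ignores disk speed, heap pressure and query routing), but it shows why going from 2 to 4 data nodes roughly halves the per-node search load.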

Thank you very much in advance for any suggestions on this.

Best regards, Stefan


you probably have too little memory in the ES nodes. For 15 TB of data per node you probably could use a full 64 GB of RAM and a 32 GB JVM heap per node, but for performance it could be more efficient to have 4 servers with 32 GB RAM and a 16 GB JVM heap each.

This is just a gut feeling, but just something you could look at.


Thank you for your answer and your gut feeling about this. I forgot to mention that both data nodes already have 64 GB of RAM and a 31 GB JVM heap. I share your gut feeling :slight_smile:

best regards
