I did this, but nothing changed …
output_batch_size = 100
output_flush_interval = 1
output_fault_count_threshold = 5
output_fault_penalty_seconds = 30
processbuffer_processors = 5
outputbuffer_processors = 3
In fact, I see that some messages were processed, but most of them are deleted because the journal is rotating and processing is very, very slow.
What can I do to speed up processing?
You are sending 3*100 messages per second to Graylog. Just for reference, here is the text from the blog post mentioned above:
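To see where the 3*100 figure comes from, here is a back-of-the-envelope sketch. It assumes, as a simplification, that each output buffer processor flushes at most one full batch per flush interval; this is a rough ceiling, not Graylog's exact internal model.

```python
# Rough upper bound on the Graylog -> Elasticsearch output rate,
# assuming each output buffer processor sends at most one full batch
# per flush interval (a simplification for illustration).

def max_output_rate(batch_size: int, processors: int, flush_interval_s: float) -> float:
    """Approximate maximum messages per second sent to Elasticsearch."""
    return batch_size * processors / flush_interval_s

# The settings quoted above: output_batch_size = 100,
# outputbuffer_processors = 3, output_flush_interval = 1 second.
print(max_output_rate(100, 3, 1.0))  # → 300.0
```

With the blog post's values (batch size 5000, 3 processors) the same formula gives a ceiling of 15,000 messages per second.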
Next, in your Graylog server.conf, you can increase output_batch_size and adjust outputbuffer_processors to allow larger batches to be sent over fewer processors. The settings for output batch size and output buffer processors will vary depending on your environment. For our environment, we set output_batch_size to 5000 and outputbuffer_processors to 3 with a 31 GB heap memory for the Elasticsearch node.
So why not set your output_batch_size to 2000 and let the journal drain out?
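For reference, a minimal server.conf fragment with the suggested values (all other settings left unchanged; the file path may differ on your installation):

```
# /etc/graylog/server/server.conf (excerpt)
output_batch_size = 2000
outputbuffer_processors = 3
```

After changing these values, restart the Graylog server so they take effect, and then give the journal time to drain.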
I have run tests with several combinations:
with output_batch_size = 100 and output_batch_size = 600 to see the impact, but nothing changed.
I just tried with output_batch_size = 2000 and there is no change; the number is still increasing.
Further investigation is very time-consuming and would require digging into your log files. Remotely, we would only be able to look into that if you provide log files from Elasticsearch and Graylog.
Additionally, you would need to give Graylog some time to process your journal; this can take up to a few hours.
Everything that follows is just speculation and guessing.
You should look into your Graylog log file and read what the issue might be.
Thank you for the help you have given me so far.
I will not be able to go down the paid-support route … but would erasing the Elasticsearch indices restore the service?
I store my indices on an LVM logical volume, which I have grown several times; could the problem come from there?
I used your example of output_batch_size = 5000 and outputbuffer_processors = 3, and the Graylog nodes are sometimes limited to 15k messages per second.
Is there a guide somewhere on what batch sizes are recommended versus the Elasticsearch cluster settings? I.e., can I just double the 5000 to 10000 and be safe, or is there some other parameter I should consider?