Explanation of Graylog data processing and performance


(Severin Simko) #1

Hey,

I’m preparing an academic paper about Graylog and wanted to ask how exactly Graylog processes data (buffers) and how different configuration parameters influence its performance (throughput).

I have 1 server (4 CPU, 8 GB RAM) on which everything runs: Graylog, MongoDB, and ES. I recently tested the throughput of this node with different configuration parameters; unfortunately, it looks like nothing really influences the throughput that much. I’m able to process only around 1000 EPS. If I test with 1500+ EPS, the output buffer and the journal both fill up quickly.

I have read a lot of similar threads but was not able to find a clear and detailed explanation of how exactly Graylog processes data.

Is this how Graylog processes data/logs from one buffer to the next?

UPDATE: This picture is edited according to the comments below

Can you please tell me if this is the architecture of buffers that the Graylog uses when processing data?

My configuration is as follows:

ring_size= 65536 ( # of messages in each buffer)
inputbuffer_ring_size=65536

inputbuffer_processors = 4
processbuffer_processors = 5
outputbuffer_processors = 3

output_batch_size = 2000

SERVER RAM = 2gb
GRAYLOG RAM = 2gb
ELASTICSEARCH RAM =4gb

I tried different ring sizes, different batch sizes and different numbers of processors, but to no avail.

For me the best combination seemed to be:
output_batch_size = 10000
inputbuffer_processors = 4
processbuffer_processors = 5
outputbuffer_processors = 2
inputbuffer_ring_size=65536

By “the best” I don’t mean that the throughput was higher, but that with more logs (1500+/sec) the buffers seemed to fill up more slowly.
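As a rough illustration of why the buffers fill up at all, the time until a single ring buffer is full can be estimated from the gap between the incoming rate and the processing rate. This is a simplified sketch, not a description of Graylog's internals: it assumes constant rates and one buffer, and it ignores the journal and batching. The function name is hypothetical; the numbers are the ones from this post.

```python
def seconds_until_full(ring_size, incoming_eps, processed_eps, already_used=0):
    """Estimate how long a fixed-size buffer takes to fill.

    Simplified model (assumption): constant rates, one buffer,
    no journal and no batching effects.
    Returns None when the buffer never fills.
    """
    surplus = incoming_eps - processed_eps  # messages piling up per second
    if surplus <= 0:
        return None
    free_slots = ring_size - already_used
    return free_slots / surplus


# Values from the post: ring_size 65536, ~1000 EPS processed, 1500 EPS sent.
print(seconds_until_full(65536, 1500, 1000))  # 131.072
```

Under these assumptions a larger ring only delays the moment the buffer is full; it does not raise the sustained throughput.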

Is there any mathematical formula that calculates the max. throughput ?

I don’t have more computational resources available and I wanted to optimize my server to get the highest throughput, can you please give me advice about what to do better?


#2

You have 4 processors but quite a lot of threads, so it is expected that you will not have a very high throughput.
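To make this concrete, the processor settings from the first post can be added up and compared with the core count. The sketch below only counts the three configured buffer processor pools; Graylog, Elasticsearch, and MongoDB run further threads on the same machine, so the real contention is higher.

```python
# Buffer processor settings from the first post.
processors = {
    "inputbuffer_processors": 4,
    "processbuffer_processors": 5,
    "outputbuffer_processors": 3,
}
cpu_cores = 4

total_threads = sum(processors.values())
print(total_threads)              # 12
print(total_threads / cpu_cores)  # 3.0 buffer threads per core
```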

A very influential factor that you did not test is the way you extract data (in extractors or processing pipelines). The way you write your regexes can easily slow down or speed up your server by a factor of 20, or even more.

What you could do is build a test set of log lines and run that same set against your server, varying one parameter at a time. Note that you actually have an n-dimensional hypercube in which the parameters depend on each other (the number of processors limits how much CPU time your different threads can get).
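A test plan of that kind could be generated along these lines. This is only a sketch: the helper name `one_at_a_time` is hypothetical, the baseline values are taken from the first post, and the candidate values are illustrative assumptions, not recommendations.

```python
def one_at_a_time(baseline, candidates):
    """Yield (changed_parameter, config) pairs that differ from the
    baseline in exactly one parameter."""
    for name, values in candidates.items():
        for value in values:
            if value == baseline[name]:
                continue  # identical to the baseline run
            config = dict(baseline)
            config[name] = value
            yield name, config


baseline = {
    "processbuffer_processors": 5,
    "outputbuffer_processors": 3,
    "output_batch_size": 2000,
}
# Candidate values are illustrative assumptions, not recommendations.
candidates = {
    "processbuffer_processors": [2, 3, 5],
    "outputbuffer_processors": [1, 2, 3],
    "output_batch_size": [500, 2000, 10000],
}

for changed, config in one_at_a_time(baseline, candidates):
    print(changed, config)
```

Because the parameters interact, such a sweep only explores the axes that pass through the baseline, not the whole parameter space.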


(Severin Simko) #3

Thank you so much for the answer!

I disabled all extractors and pipelines in order to have the highest throughput.

I use the “loggen” application to create and send test logs to my application. Loggen: https://www.balabit.com/documents/syslog-ng-ose-latest-guides/en/syslog-ng-ose-guide-admin/html/loggen.1.html

So when I run it for a few minutes, I guess it always produces the same test output.

I’ll try different numbers of processors. Thank you so much!


(Jan Doberstein) #4

@severinsimko

What does that mean in your posting?

SERVER RAM = 2gb
GRAYLOG RAM = 2gb
ELASTICSEARCH RAM =4gb

Elasticsearch states (even in the config file) that you should give at most 50% of the available RAM to Elasticsearch, but as you have Graylog and Elasticsearch fighting for resources you should lower that.

Graylog HEAP 1GB
Elasticsearch HEAP 2GB

Would be my configuration for that. When you measured the throughput, did you collect all metrics? Only with those metrics will you see in detail what changes.

Just watching the in/out counter does not give you any idea of what changed when you change some numbers.


(Severin Simko) #5

Hi, @jan thank you for your answer.

In total, I have 8 GB RAM and 4 CPUs on my server.

I set 4 GB (50%) for Elasticsearch, 2 GB for Graylog and the remaining 2 GB for the OS.

I’ll try to lower that as you suggested.

I was actually checking these metrics:

  • org.graylog2.buffers.OutputBuffer.incomingMessages for total messages and 1 min avg.

  • org.graylog2.buffers.output.usage to watch how the buffer is filling

What do you think would be useful to check?

@jan could you please have a look at the diagram that I posted? Does it work like that?


(Jan Doberstein) #6

The input buffer and the journal in the diagram need to swap positions. Otherwise, from what I can see, it seems roughly accurate.

You should check the output metrics of every buffer and of the journal too, as this will show you what impact your configuration changes have on processing.
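One way to see that impact is to record a snapshot of the metrics before and after each configuration change and compare them. The sketch below assumes the snapshots are already available as plain dictionaries, however they were collected; the function name is hypothetical, the metric names are ones mentioned in this thread, and the values are made up for illustration.

```python
def metric_changes(before, after):
    """Return {metric: (before, after, difference)} for metrics in both snapshots."""
    changes = {}
    for name in sorted(before.keys() & after.keys()):
        changes[name] = (before[name], after[name], after[name] - before[name])
    return changes


# Made-up numbers for illustration only.
before = {
    "org.graylog2.buffers.output.usage": 65536,
    "org.graylog2.throughput.output": 1000,
}
after = {
    "org.graylog2.buffers.output.usage": 1200,
    "org.graylog2.throughput.output": 1400,
}

for name, (old, new, diff) in metric_changes(before, after).items():
    print(f"{name}: {old} -> {new} ({diff:+d})")
```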


(Severin Simko) #7

Thanks!

I actually got the opportunity to test it on a more powerful server (32 GB RAM and 12 CPUs), so I’ll try to check multiple metrics in order to see the impact.

I just have one question: what is the “best” or “most accurate” indicator of the throughput per second?

I know that there is an org.graylog2.throughput.output metric, but it shows only the current throughput.

I would like to get the info about #_events / sec. I was thinking about org.graylog2.buffers.processors.OutputBufferProcessor.executor-service.completed

Is the “Mean” value measured over the whole time the Graylog instance has been running?

But I don’t think it is the parameter that I need.

My testing scenario:

1. Send 1 million logs to Graylog from the loggen application.
2. Measure the time in which Graylog processed all of them.

Because if I have the time then I can calculate the throughput per sec.
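The calculation itself is simple. This sketch assumes the start and end times of the run are already known; how to detect that the last message has been processed is not covered here. The function name and the example run are hypothetical.

```python
def average_eps(message_count, start_seconds, end_seconds):
    """Average events per second over one test run."""
    elapsed = end_seconds - start_seconds
    if elapsed <= 0:
        raise ValueError("end time must be after start time")
    return message_count / elapsed


# Hypothetical run: 1,000,000 messages, processing finished after 800 seconds.
print(average_eps(1_000_000, 0.0, 800.0))  # 1250.0
```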

Thank you

