Help to optimize processing

I’m having performance issues with our Graylog installation. It can’t keep up with incoming messages, and I need some tips on improving throughput.

The version is 2.4.6, running in Docker on Ubuntu. I have a standard single-node setup with Graylog, Elasticsearch and MongoDB in separate Docker containers.

Processing shows 100% utilization, and memory consumption jumps up in three steps, one each second, from around 1 GB to 1.6 GB (of 1.8 GB), and then returns to 1 GB.

The process buffer is full (100%).

My processing averages 500-1,000 messages per second.

docker stats shows a CPU utilization of 90-140%.

I guess I might have to bring in more CPU power, or maybe tweak the number of processor buffers?
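For reference, the relevant knobs are the *_processors settings in server.conf; on the official Docker image they can also be passed as GRAYLOG_-prefixed environment variables. A minimal sketch of what I’d be tweaking, assuming a 4-core host (the values here are only an illustration, not a recommendation):

```
# server.conf — buffer processor counts (illustrative values for a 4-core host)
processbuffer_processors = 2
outputbuffer_processors = 1
inputbuffer_processors = 1

# Equivalent environment variables for the official Docker image:
#   GRAYLOG_PROCESSBUFFER_PROCESSORS=2
#   GRAYLOG_OUTPUTBUFFER_PROCESSORS=1
#   GRAYLOG_INPUTBUFFER_PROCESSORS=1
```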

I only have a single rule and one pipeline set up. Both are really simple and quite limited in what they do.

The AWS lookup and GeoIP resolver are disabled.

Any hints on what I can do to get better performance?

Best regards, Peter Meldgaard

You may want to consider moving your Elasticsearch to another machine so it can handle the influx from GL. I used elasticdump (https://www.npmjs.com/package/elasticdump) to do this since I don’t have the resources to start up an Elasticsearch cluster. There are also a lot of other posts on optimizing in the GL community if you search for them… here is one, though they have clusters rather than Docker instances: Status Green, all systems go. How to optimize?
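In case it helps, roughly what that migration looked like for me; the hostnames and index name (graylog_0) are placeholders, and you’d repeat (or script) it per index before pointing Graylog at the new machine:

```
# Copy one index's mapping and data to the new Elasticsearch machine
elasticdump --input=http://old-es:9200/graylog_0 --output=http://new-es:9200/graylog_0 --type=mapping
elasticdump --input=http://old-es:9200/graylog_0 --output=http://new-es:9200/graylog_0 --type=data

# Then update Graylog's server.conf to use the new machine:
#   elasticsearch_hosts = http://new-es:9200
```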

500-1k messages is not a lot…
Before you do anything (senseless…), analyze the problem.
Once you know where the problem is, solve it.
That’s simple, isn’t it? :smiley:

Check top on your machine. Is ES or GL causing the load?
Check server.conf and count the number of processors. It should be less than 3/4 of your CPU cores (count the other processes on your machine as well). But ideally, leave the original values… they work at over 15k messages/host too.
Check the Java heap size. E.g. lookup tables can eat a lot of memory.
If you don’t see any errors… and you see GL eating all of your CPUs, check your extractors and pipelines. GL exposes a lot of metrics you can use to find where the problem is. I suggest starting with your regexps. If you don’t have time for digging, do a binary search: temporarily disable your pipelines. (A rough command sketch for these checks follows below.)
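Roughly how you can run those checks on a Docker setup; the container names (graylog, elasticsearch), config path and metric name below are assumptions, so adjust them to your own compose file:

```
# 1. Which container is eating the CPU, GL or ES?
docker stats --no-stream

# 2. Host core count vs. the processor settings in server.conf
nproc
docker exec graylog grep -E '(process|output|input)buffer_processors' \
    /usr/share/graylog/data/config/graylog.conf

# 3. Java heap settings of the Graylog and Elasticsearch containers
docker exec graylog env | grep -i java_opts
docker exec elasticsearch env | grep -i java_opts

# 4. Pull a single metric from the Graylog REST API, e.g. process buffer usage
#    (credentials and metric name are placeholders)
curl -u admin:password \
  'http://localhost:9000/api/system/metrics/org.graylog2.buffers.process.usage'
```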

Do these things and share all the information you collect.

Thank you, macko003, for the suggested bullets.

I’ll certainly analyze it, but first I need to upgrade to 3.0, so this is causing a little delay.

I’ll be back with more findings as soon as we’re on 3.0 and have analyzed the performance problems.

BR Peter
