Performance advice. I'm missing something

charlie · January 30, 2019, 5:46am

Hi All,

Apologies for the long post, but I’ve attempted to put in as much information as possible.

Setup:

8x vCPU @ 2.6 GHz
50GB memory
500GB SAS disk (RAID 10 on an IBM V7000 SAN)

I’ve got everything on one box, and I’m aware that it’s not ideal, but this is just for a test. The only issue I have at the moment is that I can’t get more than 150(ish) msg/s going out. So far, I have:

Changed the graylog server.conf file to

output_batch_size:5000
processbuffer_processors: 75 (tried 6, 8 & 20 previously)
outputbuffer_processors: 75 (tried 6, 8 & 20 previously)

Changed ES:

To have 31GB of memory in the jvm.options
Disable swapping, which I verified with: curl -X GET "localhost:9200/_nodes?filter_path=**.mlockall"

The odd thing is that despite having millions of messages queued, I can’t get the VM to work hard at all.

The CPU (total across cores) barely peaks at 20% and the disk write isn’t very high either.

Anything else I should be looking at? Currently I’m shipping ASA Syslogs and IIS logs via NXLog, which I don’t think are too complex? Extractors attached below:

https://pastebin.com/GxrVXin5
https://pastebin.com/W2Tam7As

TLDR: Process buffer is full, output buffer looks like it’s doing next to nothing

jan · January 30, 2019, 7:07am

Having Graylog and ES fight for the same resources does not make it easier.

With your given ES heap, Elasticsearch will eat all available RAM: it takes 31GB of the available 50GB, which leaves 19GB. Then the OS file system cache used by Lucene tries to occupy another 31GB, bringing your memory balance to -12GB; adding the default 1GB heap of Graylog, we are at -13GB of RAM. The OS also needs some RAM. As I do not know how you attached the SAN (multipath?), handling that might eat some additional RAM too. Overall this will eat ~15GB more RAM than you have.
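The arithmetic above can be written down as a minimal Python sketch. The function name is hypothetical, and the assumption that the file system cache wants about as much RAM as the ES heap comes from the reasoning in this post, not from any Graylog or Elasticsearch tool.

```python
def remaining_ram_gb(total_gb, es_heap_gb, graylog_heap_gb, fs_cache_gb=None):
    """Return the RAM left for the OS after both heaps and the file system cache.

    Assumption from the post above: the Lucene file system cache wants
    roughly as much RAM as the Elasticsearch heap.
    """
    if fs_cache_gb is None:
        fs_cache_gb = es_heap_gb
    return total_gb - es_heap_gb - fs_cache_gb - graylog_heap_gb


# 50GB box, 31GB ES heap, default 1GB Graylog heap
print(remaining_ram_gb(50, 31, 1))  # -13
```

A negative result means the box is oversubscribed before the OS itself gets anything.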

Now to the cores: you have 8 vCPUs, with HT? For a Graylog-only system your configuration should be something like inputbuffer_processors 2, processbuffer_processors 4, outputbuffer_processors 2. But as ES is very particular about the resources it sees as available in the system, it will try to occupy them all for itself unless you restrict it with configuration settings. So you have GL and ES fighting for the available cores …

With your given processor configuration, you have configured Graylog to process with 75 threads and to push over 75 connections at the same time, in batches of up to 5000 messages, to Elasticsearch.

Where are the resources that Elasticsearch needs to eat the ingested messages?

macko003 · January 31, 2019, 12:32pm

One more thing.
If GL can’t write to ES, the output buffer should be full along with the process buffer.
But your picture shows only a full process buffer, so GL doesn’t have enough resources.
Use the defaults or the values @jan offered. You only need to do performance optimization in server.conf above about 5k msg/s; until then, use the default values.

Do you use pipelines? lookup tables? Streams with many rules? Extractors?
Under Metrics you can find 2500+ metrics about your system, and also under the individual features. Check where the message processing is too slow.

benvanstaveren · January 31, 2019, 4:33pm

Never set the processors so high. Ideally you want the sum of all processors to be equal to the number of cores in your machine. Given that you run everything on a single machine, you’re exhausting CPU right off the bat. Don’t forget that ES would like some too.

Try setting processbuffer_processors to 4, outputbuffer_processors to 2 and leaving it at that, with a batch size of 4096. If it doesn’t go any faster, then you’ve reached the limit of what that machine can actually do - don’t forget that indexing is a heavy operation on the ES side and takes CPU and disk IO, and I think you’re basically swamping it.
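A quick way to sanity-check such a setup is to add up the processor settings and compare the sum with the core count. The helper below is a hypothetical sketch, not part of Graylog; the inputbuffer value of 2 is taken from jan's suggestion earlier in the thread, and it ignores the cores ES needs on the same box.

```python
def processor_budget(cores, inputbuffer, processbuffer, outputbuffer):
    """Return (total processor threads, True if they fit within the cores)."""
    total = inputbuffer + processbuffer + outputbuffer
    return total, total <= cores


# Values suggested in this thread for the 8 vCPU box
print(processor_budget(8, 2, 4, 2))    # (8, True)
# The 75/75 setting from the first post
print(processor_budget(8, 2, 75, 75))  # (152, False)
```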

charlie · January 31, 2019, 9:44pm

I understand what you’re saying, but if the resources were being consumed all at once, wouldn’t I see high CPU consumption and no memory free?

Apologies for this bit, I missed a key piece of information. The host is a VM. I’ve upped the memory to 80GB for the machine so I shouldn’t have any conflict.

Basically, I’d be best to create two VMs, one for Graylog, one for Elastic for this test?

charlie · January 31, 2019, 9:49pm

I read something similar on all the forums about the CPU. I have to admit, I posted this just after I set the settings to 75. Interestingly, taking a look this morning, I can see that I’m now processing between 300 & 400 messages per second on the output with around 1200 on the input (peak).

So the setting did do something positive I guess? I saw a similar thing on reddit and on the forums here too:

https://www.reddit.com/r/sysadmin/comments/9khkke/graylog_processing_messages_super_slow/

The thing that bugs me is that I can’t get the CPU to peak anywhere near 100% (I’d be satisfied if I could see the CPU or memory pegged)

charlie · January 31, 2019, 10:03pm

I don’t have any pipelines or lookup tables yet. I posted my extractors further up in pastebin. I’ve looked in metrics, but it doesn’t mean too much to me (I’ll have to read the manual on that bit)

I’ve changed the cores back to Jan’s recommendation for now

Edit: Having changed the CPU down, it’s now at 44 msg/s in the output

charlie · January 31, 2019, 10:17pm

Ok, I think I may have found something. My ASA extractor regexes appear to be the problem. Those maximums look quite high?

Edit: Deleted those extractors, still stuck on the process buffer

charlie · February 1, 2019, 6:39am

Ok, it turns out that the application logs are quite complex and can’t run in parallel with the syslogs from the network devices for some reason. If I stop all app logs (low volume) and let all the network devices ship to the UDP input, everything works fine and there’s no queue. Turn them on at the same time and the process buffer fills up. It’s odd that it can’t process in parallel (do I need separate output buffers?)

Anyway, for now, we’ll just create a separate Graylog instance for network devices.

Totally_Not_A_Robot · February 1, 2019, 6:50am

That’s an odd situation! I think I read somewhere how each input would have its own thread on the CPU. Of course, after that the question is how things go with pipelines and processors vs threads. You’d expect some parallelism in there.

charlie · February 1, 2019, 6:58am

Yea, certainly not what I expected. However in saying that, I am using the latest 3.0 release. Could be something in their tickets already?

The extractors I put further up didn’t seem too bad, but I think I’m right. I basically sat with the node screen open and I could see it dump 9000 messages in one second, then drop back to 100. I believe that is the Cisco syslogs going through quickly, before it goes back to the slow application logs. Combine that with stopping the app inputs and seeing the syslogs go through smoothly too.

Two instances isn’t a bad result though!

jan · February 1, 2019, 8:17am

Keep in mind, @charlie, that you have two Java applications, each running in a Java Virtual Machine. You will not by default see high CPU usage or similar.

When it comes to performance it is always nice to know where you are and where you are coming from … your throughput indicates that something is wrong, because on my i3 NUC I can sustain a constant flow of 1k messages …

Please return everything to the defaults and, starting from there, tune the system carefully. What did you see using the defaults?

Which versions of Graylog and Elasticsearch are you using exactly, and what resources are available to them? What kind of storage does your ES have? It matters how fast that is.

charlie · February 4, 2019, 3:50am

I’ve currently got each of them at:

Process buffer: 6
Output buffer: 2
Input buffer: 2

Elastic memory: 31GB
Swapping disabled (mlockall)

Graylog: 10GB JVM

VM itself: 8x vCPU, 80GB memory, 500GB SAS

jan · February 4, 2019, 6:34am

Perfect tuning, if Elasticsearch does not need any CPU resources … but when you start to ingest, it needs some.

With the Graylog configuration you are good to go with those settings and that number of CPUs, but as Elasticsearch by default would like to have all available cores, this is not a good configuration.

Totally_Not_A_Robot · February 4, 2019, 7:31am

Well that, and between Graylog and Elastic’s heaps and caches, you’re not leaving much space for the OS in RAM: 2*31 + 2*10 > 80GB RAM.

charlie · February 5, 2019, 3:47am

I think I need to write better regular expressions or learn how to use pipelines and lookup tables etc. I have another node in another data center which I upgraded to 3.0 today. When I shipped a bunch of logs via NXLog into it, I could see good performance:

I think my busier node which this thread started about had a few complex operations that queued the buffer.

Either way, separating network from applications seems to have done the trick for my busier sites. I’ll attempt to build a multi node cluster now and see if I can bring everything back together

charlie · February 5, 2019, 3:48am

I’m not short on memory, so I can always bump these servers to 150GB if need be. May give it a go…

macko003 · February 5, 2019, 8:29am

Maybe a better idea is to monitor and understand the problem first; after that, if everything points to a memory problem, you can increase your memory. Until then I think it’s only a waste of resources.
Of course, just going ahead is the faster way.

benvanstaveren · February 5, 2019, 11:35am

You’re probably better off at the moment to set up some sort of metrics on memory usage, etc. etc. and seeing where the “pain point” lies - throwing more memory at it may not solve the initial issue

As far as regular expressions go, try to at least anchor them (with ^ and $) which greatly speeds up the matching. For example if you have the string “aaaa bbbb cccc dddd aaaa 1111” then doing a regular expression of, say, a{4}.* to get any string starting with aaaa will match that, but it will require the regex executor to go through the entire string. If you change it to ^a{4}.* it will still match, but it will stop as soon as that’s sorted out.
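Here is a small sketch of the anchoring point, using Python's re module as a stand-in (Graylog extractors run on Java's regex engine, so treat this as an illustration only). In Python, at least, the saving shows up mostly on lines that do not start with the wanted prefix: the unanchored pattern keeps scanning the rest of the line, while the anchored one is rejected at the first position.

```python
import re

unanchored = re.compile(r"a{4}.*")
anchored = re.compile(r"^a{4}.*")

line = "aaaa bbbb cccc dddd aaaa 1111"
# Both patterns match the example string from its start.
print(unanchored.search(line).start(), anchored.search(line).start())  # 0 0

# A line that does not start with "aaaa":
other = "bbbb cccc dddd aaaa 1111"
print(unanchored.search(other).group(0))  # aaaa 1111 (found mid-line after scanning)
print(anchored.search(other))             # None (rejected at the first position)
```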

If you take strings like Connection received from 1.2.3.4 on port 25 and you want to extract the IP with a regular expression (or a Grok pattern which in essence becomes a regular expression), the “best” performance can be had with ^Connection received from %{IP:src_ip} (as grok pattern).
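Roughly the same extraction can be sketched as a plain anchored regular expression in Python. The IPv4 sub-pattern below is a simplified stand-in for Grok's %{IP} (it does not validate octet ranges or handle IPv6), and the function name is made up for illustration.

```python
import re

# Simplified IPv4 stand-in for Grok's %{IP}; an assumption, not the real Grok definition.
IPV4 = r"(?P<src_ip>\d{1,3}(?:\.\d{1,3}){3})"
PATTERN = re.compile(r"^Connection received from " + IPV4)


def extract_src_ip(message):
    """Return the source IP if the message starts with the expected prefix, else None."""
    m = PATTERN.match(message)
    return m.group("src_ip") if m else None


print(extract_src_ip("Connection received from 1.2.3.4 on port 25"))  # 1.2.3.4
```

Messages that do not begin with the fixed prefix fail immediately instead of being scanned for an IP.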

Pipelines are similarly fiddly - you want to be very specific in the “when” clause for any function to very selectively decide what to work on, and not “waste” too many CPU cycles on trying to parse things that don’t want to be parsed

Totally_Not_A_Robot · February 7, 2019, 8:24am

I keep clicking on that like button, but all it does is make the little heart beat… *5.
