Performance Scaling

Scaling Up

We are finishing up a Graylog pilot and want to kick this into high gear with a full-blown production install. We are currently running Graylog and Elasticsearch on one VM:

  • 4 CPUs and 16 GiB of total DRAM
  • 4 GiB of RAM for Elasticsearch
  • 4 GiB of RAM for Graylog

We’re going to move it onto more powerful hardware where I can provision (almost) as many VMs, RAM and CPUs as I like. We’re currently doing < 1500 messages per second and the above hardware is handling that volume fine. It’s about 50 GB of logs per day. But I’d like to quadruple that volume with the new setup.

I’m assuming that the recommendation will be to put Graylog and Elasticsearch on separate VMs. But should I use two or more Elasticsearch VMs? How much RAM/CPU should I give the Elasticsearch VMs?

I know the actual answer is “it depends.” But does anyone have a starting point I could use?

Thanks!

  • Graylog 4.0.7
  • MongoDB 4.2.14
  • Elasticsearch 7.10.2

Here’s an example. I’ve read that you shouldn’t give Elasticsearch more than 32 GB of heap because it disables compressed object pointers (compressed oops) in the JVM and hurts performance. I’ve also read that you shouldn’t allocate more than 50% of the VM’s RAM to Elasticsearch because you need to leave room for the OS-level disk cache. Does that mean that a single “max” Elasticsearch VM would be something around 64 GB of DRAM, with 26 GB (32 GB minus some safety margin) for the Elasticsearch JVM heap?
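
Putting those two rules of thumb into numbers, here’s the back-of-the-envelope sketch I have in mind (the 26 GB cap is just my own safety margin, not an official figure):

```python
# Back-of-the-envelope Elasticsearch heap sizing using the two rules of thumb above:
# keep the heap at or below 50% of the VM's RAM, and keep it safely under 32 GB so
# the JVM can still use compressed object pointers. The 26 GB cap is my own
# conservative margin, not an official number.
def es_heap_gb(vm_ram_gb: float, oops_safe_cap_gb: float = 26.0) -> float:
    return min(vm_ram_gb / 2, oops_safe_cap_gb)

for ram in (16, 32, 64, 128):
    print(f"{ram:>4} GB VM -> ~{es_heap_gb(ram):.0f} GB heap")
# 64 GB lands right at the cap; beyond that, extra RAM mostly goes to the
# OS-level disk cache rather than the heap.
```
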
Thanks.

There is lots of information in the docs about planning your environment, other sections talk about multi-node setups, and there are lots of other posts in the community you can search, such as this one, that will give you more information about building out. A search on “scaling” will point you to a bunch more. For Elastic-specific information on sizing… well, that would be in the Elasticsearch documentation/community :slight_smile:

I would definitely look at what @tmacgbay suggested for starters. There are some really good ideas there.

Just to give you an idea of what we did for collecting logs from over 300 remote devices (trial and error), we came up with some steps for creating a large cluster.

  • How much log volume will be ingested per hour, day, etc.? That pertains to what kind of resources you will need. See if you can get a rough idea of how much log volume you have per day and work from that (there is a rough sizing sketch after this list). We have 30 GB per day over 60 remote devices with one Graylog server VM: 14 cores, 12 GB RAM, and 500 GB HDD.

  • What type of logs will be ingested? That pertains to what type of inputs and log shippers need to be acquired. Depending on what you decide to use, some inputs left unchecked, like GELF for Windows, will produce a lot of fields, so you will see your volume fill up quickly. This can also be controlled from the client side.

  • How long do you plan on retaining logs (weeks, months, years)? This, as you know, depends on storage resources.

  • What type of devices will be sending logs (Linux, Windows, switches, etc.)? That pertains to how many inputs you might want to use. If you decide to add extractors to inputs, this will increase the resources needed.
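
To make the volume and retention bullets concrete, here is a rough sizing sketch. Every number in it is an example (the 200 GB/day is just your current 50 GB/day quadrupled; the replica count and index overhead are assumptions you should adjust for your own setup):

```python
# Rough storage estimate combining daily ingest volume and retention.
# Every number below is an example -- plug in your own figures.
daily_ingest_gb = 50 * 4      # the original 50 GB/day, quadrupled per the plan above
retention_days = 30           # e.g. keep one month of searchable logs
replicas = 1                  # one replica copy of each shard
index_overhead = 1.2          # ~20% for index structures; varies with your mappings

total_gb = daily_ingest_gb * retention_days * (1 + replicas) * index_overhead
print(f"Total Elasticsearch storage: ~{total_gb:,.0f} GB")   # ~14,400 GB in this example
```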

Overall, when you finish setting up your cluster and before sending ALL logs to the Graylog server, I would advise starting slow, maybe with 1/3 of your remote devices, and working your way up. This will give you the opportunity to break in Graylog and see how it functions as a cluster. I have seen others just overwhelm the Graylog server (message storm) and then state “My server crashed” :laughing:

If you start to notice problems, stop, because you may need to increase your resources and/or reconfigure your server.conf file. Watch for things such as buffers filling up, the volume filling up, the heap filling up, etc. You just don’t know about the other little details until logs start to roll in. By going slow you can catch these issues and start to adjust before they become a big problem. Once you have cleared up any issues, start again by sending more logs until you’re finished.
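
One low-tech way to keep an eye on things while ramping up is to poll a node’s journal state over the REST API. This is only a sketch under assumptions: the hostname, credentials, and the /api/system/journal path should all be verified against the API browser on your own Graylog version.

```python
# Small helper sketch (not an official tool): poll one node's journal state over
# the Graylog REST API while ramping up log sources, so you notice the journal or
# buffers backing up before it becomes a big problem. The hostname, credentials,
# and the /api/system/journal path are assumptions -- verify them in the API
# browser for your Graylog version.
import requests

GRAYLOG = "http://graylog.example.com:9000"   # hypothetical node address
AUTH = ("admin", "password")                  # use a proper token in practice

resp = requests.get(f"{GRAYLOG}/api/system/journal",
                    auth=AUTH,
                    headers={"Accept": "application/json"})
resp.raise_for_status()

# Field names differ between versions, so just print everything and eyeball
# the utilization / uncommitted-entries numbers.
for key, value in resp.json().items():
    print(f"{key}: {value}")
```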

If you’re increasing your volume by 4x, then having 3 Elasticsearch nodes separated from your 3 Graylog/MongoDB nodes would be a good start. You can always expand your cluster/volumes if need be.
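
As a sanity check, here is how your numbers split across that kind of layout. This is only arithmetic on the example figures above, not a statement of what each node can actually sustain:

```python
# Splitting the example numbers across a hypothetical 3 Graylog + 3 Elasticsearch
# layout. Purely arithmetic, not a capacity guarantee.
target_msgs_per_sec = 1500 * 4   # quadrupled ingest rate from the original post
total_storage_gb = 14_400        # total from the retention sketch above
graylog_nodes = 3
es_nodes = 3

per_graylog = target_msgs_per_sec / graylog_nodes
per_es = total_storage_gb / es_nodes
print(f"~{per_graylog:.0f} msg/s per Graylog node (if inputs are spread evenly)")
print(f"~{per_es:,.0f} GB of index data per Elasticsearch node, plus free-space headroom")
```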

Since you’re using virtual machines, that’s ideal, and as you know it’s very easy to add resources to a VM.

Hope that helps


We’re doing 58M messages per day. Our department is Network Engineering, so all of our messages are syslog.
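
For reference, a quick conversion of that daily total into an average rate:

```python
# Converting the daily total into an average rate; peaks will be higher
# (compare with the < 1500 msg/s figure earlier in the thread).
msgs_per_day = 58_000_000
print(f"~{msgs_per_day / 86_400:.0f} msg/s on average")   # ~671 msg/s
```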

I found a couple of people in our department who have some ELK stack experience. I’ll combine some of the advice above with their expertise and see where we land. Thank you for your suggestions!
