I’m running a Graylog cluster in AWS and I was hoping for some sizing/optimization advice.
We’re ingesting about 140 million messages a day (roughly 130 GB), and that’s only going to grow.
I have 3 ES 2.3.2 hosts running on m3.xlarge instances. ES has a 9 GB heap on each host, and indices are configured with 5 shards and 1 replica each.
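For concreteness, the heap is set via the standard ES 2.x package variable; the path below is the Debian/Ubuntu location (RPM systems use /etc/sysconfig/elasticsearch instead):

```
# /etc/default/elasticsearch -- heap size for the ES 2.x service
ES_HEAP_SIZE=9g
```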
I have 2 Graylog 2.2.0 hosts running on c3.xlarge instances. Graylog has a 2 GB heap on each host, and each host also runs MongoDB and td-agent to receive secure logging connections (we use the secure_forward plugin to log over SSL).
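For context, here’s roughly what the relevant pieces look like on the Graylog hosts. Hostnames, keys, and cert paths are placeholders, and the GC flags in GRAYLOG_SERVER_JAVA_OPTS are omitted:

```
# /etc/default/graylog-server -- Graylog heap (GC flags omitted)
GRAYLOG_SERVER_JAVA_OPTS="-Xms2g -Xmx2g"

# /etc/td-agent/td-agent.conf -- secure_forward input (excerpt)
<source>
  @type secure_forward
  self_hostname          graylog-1.example.com
  shared_key             CHANGEME
  secure                 true
  cert_path              /etc/td-agent/certs/server.crt
  private_key_path       /etc/td-agent/certs/server.key
  private_key_passphrase CHANGEME
</source>
# (the matching output that hands events to a local Graylog input is omitted)
```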
The main issue is that CPU and memory are under heavy pressure on the Elasticsearch hosts. I suspect we’re simply undersized for this volume and exhausting our resources.
I have two questions:

1. Are there any Graylog-specific optimizations or configuration changes I could make to wring more performance out of this setup before sizing up? Aside from not analyzing every field in log messages and disabling index throttling (mapping template sketched below), I’m basically running ES out of the box.
2. What would be a more ideal spec for this farm, other than just “bigger”?
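On the first question, here’s roughly the shape of the custom mapping template we load so that new string fields aren’t analyzed (ES 2.x syntax; the template name and the `graylog_*` pattern follow the default index prefix and may differ from yours):

```
curl -XPUT 'http://localhost:9200/_template/graylog-custom-mapping' -d '
{
  "template": "graylog_*",
  "mappings": {
    "message": {
      "dynamic_templates": [{
        "strings_not_analyzed": {
          "match_mapping_type": "string",
          "mapping": { "type": "string", "index": "not_analyzed" }
        }
      }]
    }
  }
}'
```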
I found the Graylog Sizing Guidelines document, and while I know Jochen described them as “bad”, they also seem to indicate I’m running at about half the recommended size, especially on the Elasticsearch side.
I’m happy to provide any other info about my setup that would help the discussion.