Cost of Graylog setup for production (10 GB/day)

Hi, we are planning to use Graylog with an expected load of 10 GB/day. We think we should install it following the “Multi-node setup” production guide, with a retention period of 6 months; anything older should be archived to secondary storage. However, we are not sure about the cost of running this installation in AWS 24/7. We are also wondering how big the instances should be (medium, large, xlarge) in order to support our expected load. I would appreciate it if somebody could shed some light on this.

Thanks.

Please contact sales for an authoritative quote: https://www.graylog.org/contact-sales

Definitely have a call with a Graylog sales engineer to talk through your specifics.

I’m not sure what you had in mind by “multi-node”, but separating the Elasticsearch server from the Graylog server does not constitute a multi-node deployment; it is more of a distributed deployment. I recommend it anyway, because I understand that separating the two down the road is much more difficult than simply adding another Graylog node and/or another Elasticsearch node. So if you’re thinking of separating the two for performance purposes, simply follow the single-node installation: install Java, MongoDB, and Graylog on your front-end/ingest server, then install Java and Elasticsearch on the Elasticsearch server. Then modify the server.conf file to tell Graylog where the Elasticsearch server is located (a sketch of that last step is below).
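As a rough sketch of that last step, assuming a recent Graylog install where the config lives at /etc/graylog/server/server.conf (the host name es01.example.com is just a placeholder, not something from this thread), the relevant settings would look something like this:

```
# /etc/graylog/server/server.conf (illustrative values only)

# MongoDB runs locally on the Graylog front-end/ingest node
mongodb_uri = mongodb://localhost:27017/graylog

# Point Graylog at the separate Elasticsearch server
# (es01.example.com is a placeholder for your own host)
elasticsearch_hosts = http://es01.example.com:9200
```

After editing the file, restart the graylog-server service so the new Elasticsearch address is picked up.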

10 GB/day of ingest is a boundary where the recommended server specs increase per Graylog’s guidelines (sales provided me the spec sheet and I’m sure they’ll provide it to you as well). As a ballpark, for the Graylog front end think about 8 cores and 8-16 GB of RAM, and for Elasticsearch 8-16 cores and as much memory as you can spare; 16-32 GB is about right.
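One related detail on the Elasticsearch memory: only part of that RAM should go to the JVM heap (the usual rule of thumb is roughly half of the machine’s RAM, and no more than about 31 GB), with the rest left to the OS file system cache. Assuming a stock package install where the heap is set in /etc/elasticsearch/jvm.options, a 32 GB Elasticsearch box might be configured something like this (values are illustrative, not figures from the spec sheet):

```
# /etc/elasticsearch/jvm.options (illustrative)
# Set min and max heap to the same value, roughly half of RAM,
# leaving the remainder for the OS page cache that Lucene relies on.
-Xms16g
-Xmx16g
```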

As far as the hard drive: if you are ingesting 10 GB/day and want to retain it for 6 months, a rough calculation gives about 1.8 TB of storage (180 days * 10 GB/day), so round up to 2 TB. SSD would be awesome, but at 10 GB/day, 10k or 15k RPM HDDs will work fine, assuming they are part of a SAN or RAID. The archiving piece will allow you longer-term storage and can be compressed as well. It can also just be a standard network file share that resides on another server.
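If it helps to play with the numbers, here is a minimal sketch of that back-of-the-envelope calculation in Python. The replica and index-overhead factors are my own assumptions for illustration, not figures from Graylog or from this thread:

```python
# Rough storage estimate for log retention (back-of-the-envelope only).

ingest_gb_per_day = 10   # from the question: 10 GB/day
retention_days = 180     # 6 months of online retention
replicas = 0             # assumption: no Elasticsearch replicas on a single ES node
index_overhead = 1.1     # assumption: ~10% extra for index structures

raw_gb = ingest_gb_per_day * retention_days           # 1800 GB, i.e. ~1.8 TB
total_gb = raw_gb * (1 + replicas) * index_overhead   # padded estimate

print(f"raw data:      {raw_gb} GB")
print(f"with overhead: {total_gb:.0f} GB -> round up to ~2 TB of usable disk")
```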

I started looking into having this on AWS/Azure, but there were a whole bunch of aspects I didn’t have the time to iron out, so I built mine locally and can always migrate down the road.

Hope that helps. Graylog sales is your best bet, and they will be able to give you specific guidance.

You should always include shard sizing in your calculations. The following tips are from Elastic’s guidance on shard counts and sizes:

TIP: Small shards result in small segments, which increases overhead. Aim to keep the average shard size between a few GB and a few tens of GB. For use-cases with time-based data, it is common to see shards between 20GB and 40GB in size.

TIP: As the overhead per shard depends on the segment count and size, forcing smaller segments to merge into larger ones through a forcemerge operation can reduce overhead and improve query performance. This should ideally be done once no more data is written to the index. Be aware that this is an expensive operation that should ideally be performed during off-peak hours.

TIP: The number of shards you can hold on a node will be proportional to the amount of heap you have available, but there is no fixed limit enforced by Elasticsearch. A good rule-of-thumb is to ensure you keep the number of shards per node below 20 to 25 per GB heap it has configured. A node with a 30GB heap should therefore have a maximum of 600-750 shards, but the further below this limit you can keep it the better. This will generally help the cluster stay in good health.
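To make those tips concrete for this workload, here is a minimal sketch of the arithmetic (the daily index rotation and single-shard index are my assumptions for illustration, not something stated in the thread):

```python
# How the shard-sizing tips map onto 10 GB/day with 6 months of retention.

ingest_gb_per_day = 10
retention_days = 180
shards_per_index = 1    # assumption: one index per day with a single primary shard
heap_gb = 30            # example heap size taken from the tip above

shard_size_gb = ingest_gb_per_day / shards_per_index   # ~10 GB per shard
total_shards = retention_days * shards_per_index        # ~180 shards kept online
max_shards_per_node = 20 * heap_gb                      # rule of thumb: ~20 shards per GB of heap

print(f"shard size:       ~{shard_size_gb:.0f} GB")
print(f"shards retained:   {total_shards}")
print(f"rough node limit:  {max_shards_per_node} shards for a {heap_gb} GB heap")
```

With 10 GB/day, a daily rotation gives roughly 10 GB shards, a bit under the 20-40 GB sweet spot mentioned above; rotating by size instead (e.g. every 20-40 GB) would line up better with that tip, and either way ~180 shards is comfortably below the per-node rule of thumb.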

It all depends on your ingest - whether it comes in bursts or is delivered evenly over the day - as well as how many people use the search and how well the data is separated into different fields.
