We’ve been using Graylog for a while for various things and have found it to be a great product. However, we’re now trying to build a data archive for one of our systems, which will keep 14 days of data (read off Kafka topics) for a total of about 5TB (1100 million messages), and I feel like I’m a bit out of my depth with the sizing and optimisation of my indices.
Currently what I have (due to budget more than anything) is a single machine with a quad-core i5, 14GB RAM, and 3 SSDs in RAID0 for a total of about 600GB of storage. I’ve given Elasticsearch a 6GB heap and Graylog 2GB, and everything is working nicely at about 1000 messages a second (but not much retention, obviously).
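To sanity-check my own numbers, here’s the back-of-envelope arithmetic I did from the figures above (14 days, ~5TB, ~1100 million messages) — these are averages, not peak rates:

```python
# Back-of-envelope check of the target ingest rate and average message size,
# based on the figures quoted above: 14 days, ~5 TB, ~1,100 million messages.
messages = 1_100_000_000
days = 14
total_bytes = 5 * 10**12  # ~5 TB, decimal

msgs_per_sec = messages / (days * 24 * 3600)
bytes_per_msg = total_bytes / messages

print(f"average ingest rate: {msgs_per_sec:.0f} msg/s")        # ~909 msg/s
print(f"average raw message size: {bytes_per_msg:.0f} bytes")  # ~4545 bytes
```

So the ~1000 messages/second the box is already handling is roughly in line with the average rate I need, which is somewhat reassuring on the ingest side at least.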
I’ve got 3 index sets and have configured each index to roll over at 1GB (which works out at about 100,000 messages) and to delete once I reach 100 indices (per set). Right now I’ve got 1 shard per index and no replicas.
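If I’ve understood rollover and retention correctly, the current configuration caps out well below the 14-day target — a rough calculation (assuming the 1GB / ~100,000-messages-per-index figures hold):

```python
# Rough retention capacity of the current index configuration:
# 3 index sets, rollover at ~1 GB (~100,000 messages), keep 100 indices per set.
index_sets = 3
indices_per_set = 100
gb_per_index = 1
msgs_per_index = 100_000

max_disk_gb = index_sets * indices_per_set * gb_per_index     # 300 GB on disk
max_messages = index_sets * indices_per_set * msgs_per_index  # 30 million messages

target_messages = 1_100_000_000  # the 14-day goal
print(f"max on disk: {max_disk_gb} GB")
print(f"max messages retained: {max_messages:,}")
print(f"fraction of 14-day target: {max_messages / target_messages:.1%}")
```

So as configured I’d only ever hold about 30 million messages (~300GB) — a few percent of what I’m aiming for — which is partly why I’m wondering about the questions below.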
I guess my first question is: do I have any hope of just throwing more SSDs into this machine (and increasing the number of old indices I keep) and it being able to cope with 14 days of data?
My second question is whether I’m even close to the right parameters with 1 shard per index and 1GB index size?
I’ve done a lot of reading on Elasticsearch and Graylog but haven’t found any real answers, unfortunately, other than coming to the conclusion that I might be a bit optimistic about a single machine coping with the amount of data I want to keep.
Oh, and I know I’m running a massive risk with a single machine on RAID0, but for the moment I have a very limited budget.
Any guidance (or telling me I’m deluded) is much appreciated.