Service behavior

Greetings everyone!

I started an EC2 instance on AWS (16 GB of storage initially) with the full stack installed: MongoDB, Elasticsearch, and the Graylog server.
I added the AWS CloudTrail input plugin with a configured SQS queue, and everything came up fine.

However, storage was consumed very quickly; within a few minutes I was already running into problems because of it.

With all that said, my questions are:

  • Is this the expected behavior?

  • If not, how should it be?

  • How do I size the storage correctly?

  • Is older data recycled, or does storage consumption only grow over time?

Help would be greatly appreciated.

@wexllen

I see that you have 16 GB for storage?

Have you seen this?

https://docs.graylog.org/en/4.0/pages/getting_started/planning.html

2 Likes

Basically what @gsmith said. To me, this reads like you were surprised by the volume of logs that CloudTrail generated and didn’t quite prepare for it. So let’s try and walk through your questions.

It’s not clear what your problems were, aside from the storage being consumed quickly. The title you gave the post is “Service behavior,” so I’m assuming you saw Graylog/Elasticsearch dying, or not processing messages, because your disk filled up. Let me know if that’s not an accurate read. Assuming that it is, then yes: Graylog and Elasticsearch are wholly reliant on having large swaths of disk available to them, so in the absence of a place to store messages, it’s expected that Graylog won’t be able to process them.
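If you want to catch this before the disk actually fills, a cron’d disk check is cheap insurance; Elasticsearch stops accepting writes once its flood-stage disk watermark trips (95% used by default), which is usually what makes the whole pipeline seize up. Here’s a minimal sketch; the data-directory paths are assumptions, so point them at wherever Elasticsearch and Graylog actually keep their data on your instance:

```python
import shutil

# Hypothetical data directories; adjust to wherever Elasticsearch and
# Graylog actually keep their data on your instance.
PATHS = ["/var/lib/elasticsearch", "/var/lib/graylog-server"]

for path in PATHS:
    total, used, free = shutil.disk_usage(path)
    pct_used = used / total * 100
    print(f"{path}: {pct_used:.1f}% used, {free / 1e9:.1f} GB free")
    # Alert well before Elasticsearch's flood-stage watermark (95% by
    # default) blocks writes and the pipeline stops processing messages.
    if pct_used > 85:
        print(f"WARNING: {path} is above 85% used")
```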

That’s honestly something you’re going to have to figure out for yourself. I’m under the assumption that, going into deploying Graylog, you didn’t have a solid understanding of how much data you’d be ingesting, and that’s totally fine. I’m not sure if you arrived at this part of the doc that @gsmith linked, but there’s this calculation for getting a close estimate:

A simple rule of thumb for planning storage is to take your average daily ingestion rate, multiply it by the number of days you need to retain the data online, and then multiply that number by 1.3 to account for metadata overhead. (GB/day x Ret. Days x 1.3 = storage req.).
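To make that formula concrete, here’s a worked example; the 5 GB/day figure is made up, so substitute whatever ingest rate you actually measure:

```python
# Worked example of the rule of thumb above.
gb_per_day = 5        # average daily ingest (hypothetical; measure your own)
retention_days = 30   # how long the data must stay searchable
overhead = 1.3        # metadata/index overhead factor from the docs

storage_gb = gb_per_day * retention_days * overhead
print(f"Estimated storage requirement: {storage_gb:.0f} GB")  # -> 195 GB
```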

So you’re going to have to let Graylog run for several days to get a solid idea of the volume of logs that something like CloudTrail generates. From there, you can decide how much storage you’ll actually need, as well as determine whether you have any business requirement to keep everything it generates. What I mean is: do you need to keep all the fields? Can you drop parts that are superfluous? If so, you can reduce your storage needs after the fact by figuring out just what you need to keep.
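In Graylog itself you’d drop fields with a pipeline rule or an extractor, but you can get a feel for the payoff offline first. A rough sketch; the event and the fields being dropped are illustrative rather than real CloudTrail guidance, so inspect your own messages to decide what’s actually superfluous:

```python
import json

# Hypothetical trimming of a CloudTrail-style event. Field names here are
# illustrative; inspect your own messages to decide what is superfluous.
event = {
    "eventTime": "2021-06-01T12:00:00Z",
    "eventName": "DescribeInstances",
    "awsRegion": "us-east-1",
    "userAgent": "aws-sdk-java/1.11.900 Linux/5.4 ...",
    "requestParameters": {"instancesSet": {}, "filterSet": {}},
    "responseElements": None,
}

DROP = {"userAgent", "requestParameters", "responseElements"}
trimmed = {k: v for k, v in event.items() if k not in DROP}

before = len(json.dumps(event))
after = len(json.dumps(trimmed))
print(f"{before} bytes -> {after} bytes "
      f"({100 * (1 - after / before):.0f}% smaller)")
```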

Graylog will rotate an index once it reaches your configured criteria (e.g., number of messages, index size, or age), and your retention strategy then decides what happens to the old indices: they can be deleted, closed, or archived, and you can elect to move archives off into cold storage where they’re not needed immediately.

All that to say, it can be a situation where storage grows over time, and it’s up to you as the system owner to figure out how you’ll handle an ever-increasing storage need. It could also be the case that, through managing your retention periods, you end up with a solid feel for how much data you actually need; provided you’re not onboarding a ton of new services or applications, or continually adding sources, your storage requirements could be fairly modest and stop growing. But again, it’s up to you to figure out how you want to handle your retention strategy and all that comes with it.
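To illustrate that last point: with a steady ingest rate and a delete-after-retention policy, disk usage grows linearly until the oldest index ages out and then plateaus. A toy model, with both inputs assumed:

```python
# Toy model of disk usage under a delete-after-N-days retention policy.
# Both inputs are assumptions; plug in your own measurements.
gb_per_day = 5
retention_days = 30
overhead = 1.3  # same metadata factor as the sizing formula above

for day in (1, 10, 30, 60, 90):
    # Usage grows until the oldest index ages out, then holds steady.
    stored = gb_per_day * min(day, retention_days) * overhead
    print(f"day {day:>3}: ~{stored:.0f} GB on disk")
```

If ingest keeps climbing, or retention is open-ended, there is no plateau, which is why the retention decision is really a storage-budget decision.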

Hopefully this helps.

3 Likes

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.