We had an old instance of Graylog where everything ran on a single node with the following main specs: 32 GB of RAM and 1.1 TB of disk for the Elasticsearch data.
That old node was capable of handling a lot of load just fine. The best part: the data disk used by ES always stayed under control, and we never needed to extend it to accommodate more data:
/dev/dm-1 1.1T 613G 414G 60% /var/graylog
I deployed a new Graylog instance some time ago consisting of 2x Graylog servers + 3x OpenSearch servers.
They also handle the load pretty well. My only issue is that I have to extend the data disks for OpenSearch on a regular basis. They started at 1.5 TB and nowadays they are:
You are pushing in up to 600 GB a day, 15 TB in the last 30 days. I can't see the sizes of those indexes, but the bottom two only rotate once a month and keep 12, so if those are large you are keeping that data for at least a year. From just those data points, yeah, I would guess you are burning through a ton of disk space.
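A quick back-of-the-envelope sketch of the numbers above (the 15 TB / 30 days and monthly-rotate-keep-12 figures are from this thread; everything else is just arithmetic, nothing queries a real cluster):

```python
# Rough disk-burn arithmetic for the figures quoted above.
ingest_last_30_days_tb = 15                       # "15TB in the last 30 days"
avg_daily_ingest_gb = ingest_last_30_days_tb * 1000 / 30
print(f"average ingest: {avg_daily_ingest_gb:.0f} GB/day")

# An index set that rotates monthly and keeps 12 indices retains ~12 months.
rotation_period_days = 30
indices_kept = 12
retention_days = rotation_period_days * indices_kept
print(f"retention: ~{retention_days} days (about 12 months)")

# If that index set received the whole ingest, it would need on the order of:
disk_needed_tb = avg_daily_ingest_gb * retention_days / 1000
print(f"rough disk for one year at that rate: ~{disk_needed_tb:.0f} TB")
```

That is why the disks keep filling: at ~500 GB/day average, a year of retention is a three-digit number of terabytes.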
Can you tell me how to configure Graylog to keep data for 1 year?
Can I keep Graylog data “forever”? Does it only depend on the disk size?
Can you help me set up an encryption configuration?
The length of time data is stored for is governed by the index settings. An index (where the data is written in OpenSearch) has two key settings: how often it is rotated (switched from being actively written to being read-only) and its retention (how many of those old indices to keep around).
Rotation can be based on several things, but for this use case time is easiest. So if we set the rotation to every 1 day and retain the last 90, we now have 90 days of data. There are a million advanced topics on this, but that is the basics of it.
You can read all about it here: Index model.
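The "rotate daily, keep 90" idea above can be sketched as an index-set payload for the Graylog REST API. The strategy class names and field layout below follow what recent Graylog versions expose, but treat them as an assumption and verify against your own server's API browser; the title and prefix are placeholders:

```python
import json

# Hypothetical index-set definition: rotate every day, keep the last 90
# rotated indices (~90 days of data). For the "keep data 1 year" question,
# the same shape works with "P1M" and 12, or "P1D" and 365.
index_set = {
    "title": "app-logs",                 # placeholder
    "index_prefix": "app-logs",          # placeholder
    "rotation_strategy_class":
        "org.graylog2.indexer.rotation.strategies.TimeBasedRotationStrategy",
    "rotation_strategy": {
        "type": "org.graylog2.indexer.rotation.strategies."
                "TimeBasedRotationStrategyConfig",
        "rotation_period": "P1D",        # ISO-8601 duration: rotate daily
    },
    "retention_strategy_class":
        "org.graylog2.indexer.retention.strategies.DeletionRetentionStrategy",
    "retention_strategy": {
        "type": "org.graylog2.indexer.retention.strategies."
                "DeletionRetentionStrategyConfig",
        "max_number_of_indices": 90,     # keep the last 90 rotated indices
    },
}
print(json.dumps(index_set, indent=2))
```

In practice you would usually set this through the web UI under System → Indices rather than posting JSON by hand.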
For encryption you would either want to look at OS-level disk encryption, or see if OpenSearch has encryption options; Graylog doesn't have encryption of data as a built-in feature.
When I asked about encryption, I meant encryption of the communications between the agents and the Graylog server, not encryption of the data on disk.
I have a lot of indices documentation to read.
Thanks, mister. If you have anything else to tell me, please don't hesitate to do it.
Different applications use different indices and streams, and therefore also different rotation/retention times; some of them must comply with rules and regulations for auditors and all that jazz…
My idea was to send rotated indices to some cold storage, but I read that this option is only available in the Enterprise version…
My poor-man's approach was to take full snapshots from the master node and send them to a cold-storage disk, reduce the rotation times for the indices, and use the cold storage as a sort of point-in-time restore… Would that work?
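For reference, the poor-man's approach above maps onto two calls to the standard OpenSearch `_snapshot` API: register a filesystem repository on the cold-storage mount, then snapshot the indices. The repository name, mount path, snapshot name and index pattern below are placeholders, and the cold-storage path must also be whitelisted via `path.repo` in opensearch.yml:

```python
import json

# The two _snapshot API calls the approach needs; names/paths are placeholders.
register_repo = (
    "PUT /_snapshot/cold_storage",
    {"type": "fs", "settings": {"location": "/mnt/cold-storage"}},
)
take_snapshot = (
    "PUT /_snapshot/cold_storage/snap-001?wait_for_completion=false",
    {"indices": "graylog_*", "include_global_state": False},
)
for method_path, body in (register_repo, take_snapshot):
    print(method_path)
    print(json.dumps(body, indent=2))
```

Restoring one of those snapshots later does give you a point-in-time view of the data; just note that a snapshot can only be restored into a cluster version that is compatible with the one that wrote it.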
I have been thinking about the same thing today (and all of last week).
I am working through all the Graylog documentation (and the OpenSearch docs too). One thing to know is that OpenSearch snapshots are incremental: each new snapshot only copies the data that is not already in the repository and references the segments stored by earlier snapshots, which saves a lot of disk space. Because of that sharing, always delete old snapshots through the snapshot API rather than by removing files from the repository by hand: the API only removes files that no remaining snapshot still references, while deleting repository files manually can break the restore of newer snapshots.
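A toy model of that incremental behaviour, just to make the sharing concrete (this illustrates the idea of segment reuse, it is not OpenSearch's actual on-disk format):

```python
# Toy model: each snapshot references segment files, a new snapshot only
# copies segments not already in the repository, and deleting a snapshot
# only removes segments that no other snapshot still uses.
repository: dict[str, set[str]] = {}      # snapshot name -> segment files

def take_snapshot(name: str, segments: set[str]) -> set[str]:
    """Record a snapshot; return only the segments newly copied."""
    already_stored = set().union(*repository.values()) if repository else set()
    repository[name] = set(segments)
    return segments - already_stored       # the incremental part

def delete_snapshot(name: str) -> set[str]:
    """Delete a snapshot; return the segment files actually removed."""
    gone = repository.pop(name)
    still_referenced = set().union(*repository.values()) if repository else set()
    return gone - still_referenced         # shared segments survive

copied1 = take_snapshot("snap-1", {"seg_a", "seg_b"})
copied2 = take_snapshot("snap-2", {"seg_a", "seg_b", "seg_c"})
removed = delete_snapshot("snap-1")        # seg_a/seg_b still needed by snap-2
print(copied2, removed)
```

Here snap-2 only copies `seg_c`, and deleting snap-1 removes nothing, because its segments are still referenced; that is why API-driven deletion is safe while hand-deleting repository files is not.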