Message retention


(Jiri Kolb) #1

Hello,
I would like to store logs for one year and then delete them. Are the following values correct?

According to ISO 8601, one day is P1D, so the rotation period is set to: P1D
Retention (max number of indices) set to: 365
Retention strategy set to: Delete

Can you please also explain the difference between the Close and Delete retention strategies?

Thank you!

Jiri


(Philipp Ruland) #2

Hello @colbik,

The configuration you provided means that Graylog will create a new index every day and keep the indexes for 365 days. To be more explicit: since you create one new index per day, you will hit the retention point after 365 indexes.
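To make the count-based retention concrete, here is a toy Python model (not Graylog code) under the assumption that retention simply deletes the oldest index whenever the number of indexes exceeds the configured maximum. The function `simulate_retention` and the index names `graylog_N` are hypothetical and only mimic a daily rotation (P1D) with a maximum of 365 indexes.

```python
from __future__ import annotations

from collections import deque


def simulate_retention(days: int, max_indices: int) -> tuple[list[str], list[str]]:
    """Toy model of daily rotation plus count-based retention.

    Returns (kept, deleted) index names. Names are hypothetical.
    """
    kept: deque[str] = deque()
    deleted: list[str] = []
    for day in range(days):
        # Rotation period P1D: one new index per day.
        kept.append(f"graylog_{day}")
        # Retention: once the count exceeds the maximum, drop the oldest index.
        while len(kept) > max_indices:
            deleted.append(kept.popleft())
    return list(kept), deleted


kept, deleted = simulate_retention(days=400, max_indices=365)
print(len(kept), kept[0], kept[-1])  # 365 graylog_35 graylog_399
print(len(deleted))                  # 35
```

After 400 simulated days only the newest 365 indexes remain, so the oldest data is roughly one year old. Whether the currently active index counts toward the limit in Graylog itself is not covered by this sketch.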

The difference between Delete and Close is explained here in the Graylog documentation:

The following index retention settings are available:
Delete: Delete indices in Elasticsearch to minimize resource consumption.
Close: Close indices in Elasticsearch to reduce resource consumption.
Do nothing
Archive: Commercial feature, see Archiving.

Delete means that the data is removed from disk, while a closed index stays on disk but is not accessible and therefore does not need any resources to be kept in memory.

The disk space you need is hard to estimate, since it depends on the data you ingest: the larger and more numerous your messages, the more space you will need. For reference, my private Graylog (logging speed tests and a PiHole DNS server) uses about 40 MB for 140.000 messages/day, while my test setup at work (firewalls, switches, routers, load balancers) uses about 20 - 25 GB for 16.000.000 - 25.000.000 messages/day. (These values are estimates; I'll update them with the correct ones tomorrow :slight_smile:)
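Dividing these figures gives a rough per-message size that can be plugged into a disk estimate. This is only a sketch: it assumes decimal units (1 MB = 1000² bytes) and that the totals are on-disk sizes per day, and it is not stated whether they include replicas or index overhead. Because the pairing of the two work ranges is unknown, the sketch takes the widest bounds.

```python
MB = 1000**2  # assumption: decimal megabytes
GB = 1000**3  # assumption: decimal gigabytes


def avg_message_size(total_bytes: float, messages: float) -> float:
    """Average stored size per message, in bytes."""
    return total_bytes / messages


# Private setup: about 40 MB for 140,000 messages/day.
home = avg_message_size(40 * MB, 140_000)

# Work setup: 20-25 GB for 16-25 million messages/day (widest bounds).
work_low = avg_message_size(20 * GB, 25_000_000)
work_high = avg_message_size(25 * GB, 16_000_000)

print(f"home: ~{home:.0f} B/msg")                        # home: ~286 B/msg
print(f"work: ~{work_low:.0f}-{work_high:.1f} B/msg")    # work: ~800-1562.5 B/msg
```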

Greetings - Phil


(Jiri Kolb) #3

Hello,
We have a very simple setup, just one Fortigate firewall with an estimated 200 EPS. That is 288000 events per day, so the maximum size is approximately 100 MB per day? It would be great if you could provide values from your environment. Regarding index deletion, that is pretty straightforward: the index is deleted from the database. I read the links, but could you explain in a more practical way what a closed index is? Is the information in a closed index available for search? Are closed indexes not kept in RAM (and does that have any relation to the RAM estimation)?

Thank you very much!

Jiri


(Philipp Ruland) #4

Disk Usage

Well, as @jochen already said in your other topic:

I looked at our Fortigate input and the messages were about 380 - 450 bytes in size, so let's assume 415 bytes per message for the calculation. (Keep in mind that you will probably need more if you parse your messages into many fields.)

  365   * (200 * 60 * 60 * 24) *        415 byte        *           (1 + 0)          *       1.5      = 3926232000000  bytes / 1099511627776 =  3,5709 TiB ( 3,9262 TB) per year // Just one Elasticsearch node
  365   * (200 * 60 * 60 * 24) *        415 byte        *           (1 + 1)          *       1.5      = 7852464000000  bytes / 1099511627776 =  7,1418 TiB ( 7,8525 TB) per year // Standard 2-node cluster
  365   * (200 * 60 * 60 * 24) *        415 byte        *           (1 + 2)          *       1.5      = 11778696000000 bytes / 1099511627776 = 10,7127 TiB (11,7787 TB) per year // 3-node cluster
  365   * (200 * 60 * 60 * 24) *        415 byte        *           (1 + 3)          *       1.5      = 15704928000000 bytes / 1099511627776 = 14,2835 TiB (15,7049 TB) per year // And so on ...
 [days] *  [messages per day]  * [average message size] * [primary + replica shards] * [magic number] = result

(1099511627776 = 1024^4 is the number of bytes in a tebibyte; the values in brackets are actual terabytes, calculated with 1000 instead of 1024.) As you can see, the result simply scales with the number of copies of each shard (primary + replicas).

Breaking this down for a single node, you would need 3,5709 TiB (3,9262 TB) / 365 ≈ 10,0181 GiB (10,7568 GB) per day.
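The estimate can be written as a small Python function that mirrors the formula term by term. The 1.5 overhead factor is the "magic number" from the post, not a documented Elasticsearch constant, and the function name `yearly_bytes` is just illustrative.

```python
TIB = 1024**4  # 1099511627776 bytes
TB = 1000**4
GIB = 1024**3
GB = 1000**3


def yearly_bytes(eps: float, msg_bytes: float, replicas: int,
                 overhead: float = 1.5, days: int = 365) -> float:
    """[days] * [messages per day] * [avg size] * [primary + replicas] * [magic number]."""
    messages_per_day = eps * 60 * 60 * 24
    return days * messages_per_day * msg_bytes * (1 + replicas) * overhead


for replicas in range(4):
    total = yearly_bytes(eps=200, msg_bytes=415, replicas=replicas)
    print(f"{replicas} replica(s): {total:.0f} B = "
          f"{total / TIB:.4f} TiB ({total / TB:.4f} TB) per year")

# Per-day figure for a single node (no replicas).
single = yearly_bytes(eps=200, msg_bytes=415, replicas=0)
print(f"per day: {single / 365 / GIB:.4f} GiB ({single / 365 / GB:.4f} GB)")
```

For one node this gives 3926232000000 bytes, about 3.5709 TiB (3.9262 TB) per year, or roughly 10.0181 GiB (10.7568 GB) per day; the other rows should match the calculation above up to rounding.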

Closed Indexes

In short, closed Elasticsearch indexes just reside on disk. They are not available for search, since they are not loaded into memory for search operations, so they do not add to the memory needed for your searchable data. Think of them as a step between active data you use and inactive data you have archived on another system: a closed index is not accessible like active data, but it can be re-opened with an API call instead of having to copy the data back from an archive system.
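For illustration, the Elasticsearch index APIs behind Close, re-open and Delete are `POST /<index>/_close`, `POST /<index>/_open` and `DELETE /<index>`. The sketch below only builds these requests with Python's standard library and does not send them. The host `http://localhost:9200`, the index name `graylog_42` and the helper `index_request` are assumptions. Graylog tracks its indexes itself, so in practice you would likely manage them through Graylog rather than calling Elasticsearch directly.

```python
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: Elasticsearch on its default port


def index_request(index: str, action: str) -> urllib.request.Request:
    """Build (but do not send) an Elasticsearch index API request."""
    if action == "close":
        # Closed index: data stays on disk but is not searchable.
        return urllib.request.Request(f"{ES_URL}/{index}/_close", method="POST")
    if action == "open":
        # Re-open a closed index so it becomes searchable again.
        return urllib.request.Request(f"{ES_URL}/{index}/_open", method="POST")
    if action == "delete":
        # Delete: the index data is removed from disk.
        return urllib.request.Request(f"{ES_URL}/{index}", method="DELETE")
    raise ValueError(f"unknown action: {action}")


req = index_request("graylog_42", "open")
print(req.get_method(), req.full_url)  # POST http://localhost:9200/graylog_42/_open
# urllib.request.urlopen(req) would actually send the request.
```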

Greetings - Phil

PS: I think you miscalculated. 200 EPS over a whole day is 17280000 events, not 288000. You calculated 200 * 60 * 24, which only covers 24 minutes. :slight_smile:
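A quick check of the arithmetic: a day has 60 * 60 * 24 = 86400 seconds, so the missing factor of 60 explains the difference. The last line applies the 415-byte average assumed earlier in the thread to show the raw daily volume before replicas and overhead; the helper name `events_per_day` is illustrative.

```python
SECONDS_PER_DAY = 60 * 60 * 24  # 86400


def events_per_day(eps: int) -> int:
    """Events per day for a constant rate given in events per second."""
    return eps * SECONDS_PER_DAY


print(events_per_day(200))  # 17280000
print(200 * 60 * 24)        # 288000, i.e. only 24 minutes' worth

# Raw volume at the assumed 415 bytes/message, before replicas and overhead.
raw_gb = events_per_day(200) * 415 / 1000**3
print(f"{raw_gb:.2f} GB/day")  # 7.17 GB/day
```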


(Jiri Kolb) #5

Thank you, yes, I miscalculated :slight_smile: