This might seem like an odd question, but is it possible to find out the size of the raw log data that has been ingested so far (i.e. its size before it was indexed into the Elasticsearch database)? With the help of the elasticsearch-hq or kopf plugin I can see that currently approx. 772 GB are in the Elasticsearch database. There seems to be a 1 GB offset, because when I run `df -h` on the Elasticsearch filesystem I can see that 773 GB are currently used.
```
Summary
Node Name: Gazer
IP Address: x.x.x.x:9300
ES Uptime: 9.04 days
File System
Store Size: 850.6GB
Documents: 1,725,085,132
Documents Deleted: 0%
Merge Size: 1.4TB
Merge Time: 23:34:52
Merge Rate: 5.8 MB/s
File Descriptors: 806
Disk space used: 60.8%
Disk space free: 548.1GB
Index Activity
Indexing - Index: 0.18ms
Indexing - Delete: 0ms
Search - Query: 21.99ms
Search - Fetch: 0.4ms
Get - Total: 0ms
Get - Exists: 0ms
Get - Missing: 0ms
Refresh: 52.14ms
Flush: 212.57ms
Cache Activity
Field Size: 1.9GB
Field Evictions: 0
Filter Cache Size: 0.0
Filter Evictions: 0 per query
ID Cache Size:
% ID Cache: 0%
Memory
Total Memory: 0 gb
Heap Size: 5.9 gb
Heap % of RAM: 0%
% Heap Used: 64%
GC MarkSweep Frequency: 0 s
GC MarkSweep Duration: 0ms
GC ParNew Frequency: 0 s
GC ParNew Duration: 0ms
G1 GC Young Generation Freq: 0 s
G1 GC Young Generation Duration: 0ms
G1 GC Old Generation Freq: 0 s
G1 GC Old Generation Duration: 0ms
Swap Space: 0.0000 MB
Network
HTTP Connection Rate: 0 /second
```
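For reference, the store-size figures above are what the indices stats API reports, which is the on-disk size after indexing, not the size of the raw logs. A minimal sketch of pulling that number directly, assuming the cluster is reachable at http://localhost:9200 without authentication and the `requests` library is installed:

```python
# Sketch: sum the on-disk store size across all indices via the _stats API.
# Assumes the cluster runs at http://localhost:9200 with no authentication.
import requests

resp = requests.get("http://localhost:9200/_stats/store")
resp.raise_for_status()
stats = resp.json()

total_bytes = stats["_all"]["total"]["store"]["size_in_bytes"]
print(f"Total store size: {total_bytes / 1024**3:.1f} GiB")

# Per-index breakdown, largest first.
for name, idx in sorted(
    stats["indices"].items(),
    key=lambda item: item[1]["total"]["store"]["size_in_bytes"],
    reverse=True,
):
    size_gb = idx["total"]["store"]["size_in_bytes"] / 1024**3
    print(f"{name}: {size_gb:.1f} GiB")
```

Note that this only confirms the indexed footprint; it says nothing about how large the logs were before they were shipped.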
I currently have 68 shards and 17 indices (the goal is a 90-day retention time).
The machine has the following specs: 10 vCores, 12 GB of memory, and 1.5 TB of storage.
Ah sorry, then I misunderstood you. I've been receiving logs since May 8th… so exactly one month as of today.
I don't think I'll be able to cover the 90-day retention period then.
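As a rough sanity check, assuming ingest continues at the same average rate as this first month (a back-of-the-envelope sketch, not an exact forecast, since daily volume and merge overhead vary):

```python
# Back-of-the-envelope retention estimate.
# Assumes ~773 GB accumulated over ~31 days and a constant ingest rate.
used_gb = 773
days_so_far = 31
retention_days = 90
disk_gb = 1500  # 1.5 TB total storage

daily_gb = used_gb / days_so_far
needed_gb = daily_gb * retention_days

print(f"Average ingest: {daily_gb:.1f} GB/day")
print(f"Projected for {retention_days} days: {needed_gb:.0f} GB "
      f"(available: {disk_gb} GB)")
```

At roughly 25 GB/day that projects to well over 2 TB for 90 days, which is more than the 1.5 TB available.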