Index size on Elasticsearch


(Alfredo) #1

Hi all.

We have 2 Graylog nodes with 2 Elasticsearch nodes for indexing, search, etc. We would like to keep all the indices on Elasticsearch for at least a month.
How can I calculate that from the Graylog perspective? Any clues or tips will be appreciated.

Also can I share an external NFS filesystem between the nodes?

Cheers


(Pedro Miguel Pereira Serrano Martins) #2

As far as I know, I don’t see why you wouldn’t be able to save data for more than a month. Have you read something that indicates otherwise?

How can I calculate that from the Graylog perspective? Any clues or tips will be appreciated.

Do you mean searching data that is 1 month old? If so, have a look at the search documentation:

http://docs.graylog.org/en/2.4/pages/queries.html

What is your objective with this?
From reading the architecture docs, I understand Graylog uses MongoDB and Elasticsearch to save and query data respectively.
NFS is not mentioned anywhere.


(Alfredo) #3

Hi there.

Yes…of course we are able to save more data but I wonder how to calculate that so we can size the disk for the index properly.

So yes, you're right: I need to search indices going back more than a month, so I'll read the doc you sent me.

I was thinking of NFS because our logs are quite big. The rate at the moment is about 10 MB of data every couple of minutes, so the file systems on the nodes fill up.
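
For the sizing question, a rough back-of-the-envelope sketch follows. Only the ingest rate (10 MB every 2 minutes) and the one-month target come from this thread; the replica count and the `overhead` factor are assumptions for illustration, and the real on-disk index size can differ from the raw message size depending on mappings and fields.

```python
# Rough disk sizing for Elasticsearch indices.
# Only the ingest rate and retention come from the thread; the replica
# count and overhead factor are assumptions to adjust for your setup.

def required_disk_gb(mb_per_interval, interval_minutes, retention_days,
                     replicas=1, overhead=1.3):
    """Estimate total cluster disk in GB (1 GB = 1000 MB)."""
    mb_per_day = mb_per_interval * (24 * 60 / interval_minutes)
    primary_mb = mb_per_day * retention_days
    # Each replica is a full extra copy of the primary data.
    total_mb = primary_mb * (1 + replicas) * overhead
    return total_mb / 1000


raw_per_day_gb = 10 * (24 * 60 / 2) / 1000
total_gb = required_disk_gb(10, 2, 30)
print(f"{raw_per_day_gb:.1f} GB/day raw, {total_gb:.0f} GB total for 30 days")
```

With these assumed factors the estimate is about 7.2 GB per day of raw data and roughly 560 GB across the cluster for 30 days. Measuring the actual size of a day's index on disk and plugging that in would give a better number than the raw rate.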

Cheers


(Pedro Miguel Pereira Serrano Martins) #4

Yes…of course we are able to save more data but I wonder how to calculate that so we can size the disk for the index properly.

This would be your MongoDB cluster. Here you would store all the DATA you want without affecting anything.
Your system would redirect the data to the MongoDB cluster where it would be saved, not affecting your other nodes.

As for searching fast, your ElasticSearch machines should be strong in memory and your Graylog nodes strong in CPU power. A read of the overall architectural considerations should help you:

http://docs.graylog.org/en/2.4/pages/architecture.html

If you still think NFS is the solution, then I can’t really help you more. You will have to wait for someone else to answer.


(Alfredo) #5

Thanks Pedro. Appreciated. I have also read this:

Also keep in mind that ingested messages are only stored in Elasticsearch. If you have data loss in the Elasticsearch cluster, the messages are gone - except if you have created backups of the indices.

So there is no problem with Graylog CPU or Elasticsearch memory. The only issue is the index data, as both Elasticsearch nodes keep filling up and I don’t have a big volume to attach just yet.

That’s why I was thinking of an external NFS to store a month's worth of data.

Thx


#6

If you’re planning on running a cluster of 2 Elasticsearch nodes, you should read up on the split-brain scenario and go with a minimum of 3 Elasticsearch nodes in the cluster. What is your expected EPS (events per second), what types of events, and how much parsing are you planning on doing?
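
To illustrate why 3 nodes are suggested, here is a small sketch of the majority (quorum) arithmetic. It assumes an Elasticsearch version that still uses Zen discovery (before 7.x), where the `discovery.zen.minimum_master_nodes` setting is expected to be set to this quorum. The function names are made up for the example.

```python
def master_quorum(master_eligible_nodes):
    """Majority of master-eligible nodes: floor(n / 2) + 1."""
    return master_eligible_nodes // 2 + 1


def tolerated_failures(master_eligible_nodes):
    """How many master-eligible nodes can be lost while keeping a quorum."""
    return master_eligible_nodes - master_quorum(master_eligible_nodes)


for n in (2, 3):
    print(f"{n} nodes: quorum={master_quorum(n)}, "
          f"can lose {tolerated_failures(n)}")
```

With 2 nodes the quorum is 2, so the cluster cannot lose either node without losing its master quorum (and lowering the quorum to 1 is what opens the door to split brain). With 3 nodes the quorum is still 2, so one node can fail safely.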

Also, if you’re planning on using NFS as the writable space for Elasticsearch, don’t bother. Elasticsearch will not be able to start up because it can’t get the proper locks on the file system. Instead, use something like iSCSI or Fibre Channel. If you want to use NFS for Elasticsearch snapshots or Graylog Enterprise archiving, that will work.
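
For the snapshot use case, a minimal sketch of registering a shared file system repository through the Elasticsearch snapshot API could look like the following. The host, the repository name `nfs_backup`, and the mount path are placeholders, not values from this thread. It also assumes the NFS share is mounted at the same path on every Elasticsearch node and that the path is listed under `path.repo` in `elasticsearch.yml`.

```python
import json
import urllib.request


def snapshot_repo_request(es_url, repo_name, location):
    """Build (but do not send) a PUT request that registers an 'fs' repository."""
    body = {"type": "fs", "settings": {"location": location, "compress": True}}
    return urllib.request.Request(
        url=f"{es_url}/_snapshot/{repo_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Placeholder values; replace with your own host and NFS mount point.
req = snapshot_repo_request("http://localhost:9200", "nfs_backup",
                            "/mnt/nfs/es_snapshots")
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req)  # uncomment to send it to a real cluster
```

The request is only constructed here, so the sketch can be run without a cluster; sending it is left commented out.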


(Alfredo) #7

thanks.

We ended up with 2 iSCSI disks, so it should be OK now. As for the amount to log, we have settled for now on keeping 4 weeks' worth of logs.
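
For anyone wondering how a 4-week target could map onto Graylog index settings, here is a small sketch. It assumes time-based index rotation combined with a "maximum number of indices" retention limit. The daily rotation period is an assumption for illustration, not the configuration actually used here.

```python
import math
from datetime import timedelta


def indices_to_keep(retention, rotation_period):
    """Number of rotated indices needed to cover the retention window."""
    return math.ceil(retention / rotation_period)


# Assumed example: rotate once a day, keep 4 weeks.
daily = indices_to_keep(timedelta(weeks=4), timedelta(days=1))
weekly = indices_to_keep(timedelta(weeks=4), timedelta(weeks=1))
print(daily, weekly)
```

Shorter rotation periods give finer-grained deletion at the cost of more indices (and shards) for Elasticsearch to manage.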

Thanks for your help.

