I’ve searched through the documentation pages as well as here in the forum, but couldn’t find any indication of how many active shards are recommended.
From what I understand, though, the higher the number of active shards, the more resources are consumed (and presumably the slower search queries become?).
Currently I only have 1 Graylog server (1 node) and 1 Elasticsearch server… the following settings are configured in my Graylog server.conf.
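These are essentially the stock Graylog 2.x rotation/retention defaults; I’m reproducing them from memory here, so treat the exact values as approximate rather than an exact copy of my file:

```
# server.conf (Graylog 2.x defaults, approximate)
rotation_strategy = count                     # rotate the active index by document count
elasticsearch_max_docs_per_index = 20000000   # ~20M documents per index before rotation
elasticsearch_max_number_of_indices = 20      # keep at most 20 indices
retention_strategy = delete                   # delete the oldest index once the limit is hit
elasticsearch_shards = 4                      # 4 shards per index (6 indices x 4 = 24 active shards)
elasticsearch_replicas = 0                    # no replicas on a single-node cluster
```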
However, Graylog says I currently have 24 active shards. My index set shows 6 indices with a total of 1,107,432,833 messages under management; the current write-active index is graylog_5.
So far I’ve been using the pre-configured settings in server.conf… is that recommended, or am I going to run into a bottleneck in terms of performance and cluster health any time soon?
Speaking from limited experience - if your indexes are staying online and you are happy with your search performance, I would leave it as is. Your settings say how much data to keep, not how long to keep it, so once you are “full” Graylog will begin deleting old indices.
How long did it take you to get to 1,107,432,833?
Given you are at graylog_5 currently, you should be able to store roughly three times the amount of data you currently have. As time passes it will create more indexes until you reach graylog_20, at which point it will start deleting the oldest indexes to keep your index count at 20.
With your current settings your ES data should grow until you have 20 indices on disk, with approximately 20,000,000 documents per index. The size of each index will vary depending on the kind of data you are sending in (larger log records vs. smaller log records make a difference).
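If you want to verify how that is playing out on disk, the Elasticsearch cat API will show the per-index document counts and sizes (adjust the host/port to wherever your Elasticsearch listens):

```
# List the Graylog indices with shard counts, document counts and on-disk size
curl -s 'http://localhost:9200/_cat/indices/graylog_*?v&h=index,pri,rep,docs.count,store.size'
```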
Thanks for getting back to me and for taking the time to explain this.
It took roughly 3 weeks; the earliest log entry is from May 8th.
I just changed the rotation period to P1D and the max number of indices to 91, because we want to keep the logs for 90 days.
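For what it’s worth, assuming this is still done in server.conf (newer Graylog releases manage rotation from the web interface instead), the change looks roughly like this:

```
# server.conf: rotate the active index daily, keep 91 indices (~90 days + the write-active one)
rotation_strategy = time
elasticsearch_max_time_per_index = P1D
retention_strategy = delete
elasticsearch_max_number_of_indices = 91
```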
Not really sure if this was a good idea, since we currently have only 1 node (due to limited resources). That said, the Graylog and Elasticsearch servers are pretty beefy.
What worries me is that we currently have no replicas, since we only have 1 node.
At this point I cannot say how raising the number of indices to 91 (for 90 days of retention) will affect search performance and the stability of the whole environment.
The elasticsearch server has 1.4TB of storage, whilst 542GB have been used so far.
Obviously search performance will decrease once you go further back in time, as queries need to search across more indices (but that isn’t my biggest worry at the moment; stability is).
Elasticsearch uses disk watermarks to protect itself from a full disk: by default it stops allocating new shards to a node at 85% disk usage and tries to move shards off the node at 90%. That is configurable, but out of the box it wants to prevent the issues that occur when your disks get full. When you get to around 1.2T of used space you may start seeing Elasticsearch log messages about disk usage, and newly rotated indices will no longer be allocated, which effectively stops log data being written.
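If you ever need to tune that behaviour, the relevant settings are the disk allocation watermarks in elasticsearch.yml (the values shown below are the usual defaults):

```
# elasticsearch.yml -- disk-based shard allocation thresholds (defaults shown)
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: 85%    # stop allocating new shards to the node
cluster.routing.allocation.disk.watermark.high: 90%   # start relocating shards away from the node
```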
Search performance may degrade a little as you get more data, but ES is really good at the job it does. We have almost a year of data in the can, and can still query at fair speeds.
Replicas - it is helpful to have replicas for various reasons, but they add a bit of complexity to your configuration as well. Each open index (including replicas) consumes a small amount of CPU/memory on your Elasticsearch servers.
With regard to stability - ES wants memory available to work and is sometimes not the best judge of when it will run out. We use a plugin for Elasticsearch called “hq” that does a good job of showing you what is happening in Elasticsearch. You will want to look at the “% Heap Used:” field to see how your ES server is doing.
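If you prefer not to rely on the plugin, the same figure is also available from the cat API, for example:

```
# Show JVM heap usage per node; heap.percent is the value to keep an eye on
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.max,ram.percent'
```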
Also, the elasticsearch-hq plugin indicated that the Elasticsearch server is swapping.
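From what I’ve read, the usual mitigation is to lock the heap in memory so the OS cannot swap it out (the setting name below is for Elasticsearch 2.x; on 5.x and later it is bootstrap.memory_lock):

```
# elasticsearch.yml (ES 2.x; use bootstrap.memory_lock: true on 5.x and later)
bootstrap.mlockall: true
```

Lowering vm.swappiness on the host (or disabling swap entirely) is the usual companion to that.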
In the plugin I also see that the refresh rate under Index Activity is 13.98ms.
I assume this value is not ideal and probably has to do with the documents being stored on NAS storage rather than on local disks. Does the refresh rate correspond directly to search performance (or to displaying dashboards)?
I have noticed through the elasticsearch-hq plugin that my refresh rate is currently around 78.18ms (definitely above the warning threshold).
I’ve tried setting the refresh_interval to 15s in elasticsearch.yml on the Elasticsearch server, but that only increased the reported refresh time. Am I on the safe side leaving it at the default value of 1s?
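In case it matters, I understand the refresh interval can also be applied per index at runtime instead of via elasticsearch.yml; something along these lines (the 15s value is just what I experimented with):

```
# Apply a 15s refresh interval to all existing graylog_* indices at runtime
curl -XPUT 'http://localhost:9200/graylog_*/_settings' -H 'Content-Type: application/json' -d '
{
  "index": { "refresh_interval": "15s" }
}'
```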
Apart from that my node looks to be in a healthy state.