I’ve just noticed (back from vacation) that our Graylog installation is not showing any messages after a certain date and time. Thinking that maybe something was stuck (even if there were no clues) I reloaded the Graylog process and, since that didn’t solve anything, rebooted the whole server. But the problem persists, and I’ve noticed that log messages are still being received by Graylog. It stopped showing messages after the main index was rotated:
SystemJob <35019d70-1509-11ec-a31d-000c29d977e7> [org.graylog2.indexer.indices.jobs.SetIndexReadOnlyAndCalculateRangeJob] finished in 4039ms.
SystemJob <476fc900-1509-11ec-a31d-000c29d977e7> [org.graylog2.indexer.indices.jobs.OptimizeIndexJob] finished in 1051251ms.
Now I can indeed see that the default index set has a new active index that “Contains messages up to a few seconds ago”, but this data is not appearing in any dashboard or search. What can I do? Any advice?
I’ve also noticed that both the Process and the Output buffers are full, and I have millions of unprocessed messages in the disk journal. I can’t really understand what happened and, most importantly, how to get it working again.
I’ve just increased the number of CPUs from 4 to 8 “just in case”, but it seems this didn’t make any difference.
Yes, that usually happens when your Elasticsearch or Graylog server doesn’t have enough resources.
Unfortunately, this is not the case. It’s using less than 20% of the total CPU available, and less than 50% of the RAM. I was also checking the extractor metrics but I can’t find anything interesting.
Please consider that I’m the only one who manages this server. I left it working fine three weeks ago and today I discovered this issue, which has been ongoing for a week, so I can safely say that no one has created or changed anything in between.
I’ve just checked that the last message I can see is indeed inside the most recent index of the index set, so maybe it’s not related to the index rotation.
What can I do to understand what’s stuck in the buffers?
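For reference, Graylog’s REST API exposes the journal and buffer state directly. A sketch of what I’ve been poking at (host, port, credentials, and the `<node-id>` placeholder are all specific to my setup; endpoint paths are from Graylog 3.x/4.x and may differ in other versions):

```shell
# Journal state: uncommitted entries, append/read rates, journal size on disk
curl -s -u admin:password 'http://127.0.0.1:9000/api/system/journal?pretty=true'

# Process-buffer dump: shows which message each processor thread is currently
# working on -- useful for spotting a pathological extractor or pipeline rule
curl -s -u admin:password 'http://127.0.0.1:9000/api/cluster/<node-id>/processbufferdump?pretty=true'
```

If the same messages show up in repeated dumps, the processors are genuinely stuck rather than just slow.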
CPU is usually not the problem; disk I/O on the Elasticsearch side is. How many logs per second (or per day) do you store in Graylog, and what is your disk configuration (SSD?)
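Disk usage and per-index volume are easy to check with Elasticsearch’s `_cat` APIs (host and port here are assumptions, adjust to your cluster):

```shell
# Disk usage per Elasticsearch node (watch the disk.percent column)
curl -s 'http://localhost:9200/_cat/allocation?v'

# Per-index sizes and doc counts, largest first -- rough estimate of daily volume
curl -s 'http://localhost:9200/_cat/indices?v&s=store.size:desc'
```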
I’ve solved the issue thanks to this: Index rotation failure - #5 by tobiasreckhard
TL;DR: “ES put all indices into read-only mode when the system’s disk space fell below the low watermark”. So, even if Graylog had not complained about disk space, Elasticsearch did. After increasing the available space it’s now working fine again.
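For anyone landing here later: in Elasticsearch 6.x and newer it is the flood-stage disk watermark that sets the `read_only_allow_delete` block on indices. A sketch of how to confirm and clear it (host/port are assumptions; in ES 7.4+ the block is released automatically once disk usage drops, in earlier versions you must clear it yourself):

```shell
# List indices that Elasticsearch has flipped to read-only
curl -s 'http://localhost:9200/_all/_settings?filter_path=*.settings.index.blocks'

# After freeing disk space (or raising the watermarks), clear the block
curl -s -XPUT 'http://localhost:9200/_all/_settings' \
  -H 'Content-Type: application/json' \
  -d '{"index.blocks.read_only_allow_delete": null}'
```

Clearing the block without actually freeing disk space will only work until the watermark is hit again.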
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.