Hi All,
I am facing an issue in my Graylog cluster (1 master, 2 slaves, Graylog version 2.3).
I noticed that journal utilization was full and the journal had a lot of unprocessed messages (10 million). I stopped all Graylog instances and deleted the journal directory. When I started them again, the journal still shows unprocessed messages (3 million).
Also, the inputs are running but I am not seeing messages in Graylog.
Can you please suggest how to fix this?
Hi @araj2
The journal is like a “swap area” that Graylog uses to buffer messages while the system is unable to write them to Elasticsearch, which is where they have to end up at the end of the process.
If your journal is full, it means that Graylog can’t write messages to Elasticsearch for some reason.
You need to troubleshoot your ES cluster and fix it in order to get Graylog back in the game.
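To watch how far behind the journal actually is (rather than only the UI graph), you can ask the Graylog REST API directly. This is a minimal sketch, assuming a Graylog 2.x node with its API at http://127.0.0.1:9000/api and basic-auth credentials; the URL, credentials, and exact field names may differ in your setup, so verify them against your version:

```python
# journal_status.py - minimal sketch; adjust URL, credentials and field names for your setup.
import base64
import json
import urllib.request

GRAYLOG_API = "http://127.0.0.1:9000/api"   # assumption: local node, default API port
USER, PASSWORD = "admin", "admin"           # assumption: replace with real credentials

def api_get(path):
    req = urllib.request.Request(GRAYLOG_API + path)
    token = base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    req.add_header("Accept", "application/json")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # /system/journal reports the on-disk journal state of the node you query.
    journal = api_get("/system/journal")
    print("uncommitted entries:", journal.get("uncommitted_journal_entries"))
    print("journal size (bytes):", journal.get("journal_size"))
    print("size limit (bytes):", journal.get("journal_size_limit"))
```

If the uncommitted entry count keeps growing while the inputs are running, the node is accepting messages but not managing to flush them to Elasticsearch, which points at the ES side or the connection to it.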
@reimlima I am using the AWS Elasticsearch Service (AWS managed service); it’s a 6-node cluster.
The ES cluster has been green for the last 2-3 months, and this issue has been happening since 17th April.
Can you please help me with what I should check in the ES cluster?
Hi @araj2
A green ES cluster means that your service is up and running properly, but it doesn’t tell you everything.
From your Graylog server, test whether you can reach your AWS ES cluster:
- A basic ping command
- A telnet test to make sure your server can connect to the ES port
- In your Graylog UI, take a look at System > Overview to see if you have any notifications related to “watermark”
The first two are about the communication between your server and your ES cluster; the third may indicate that your ES cluster is running out of space somewhere, in which case you have to free some space so Graylog can write messages. A scripted version of these checks is sketched below.
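Here is a rough, script-level version of those checks: it first confirms a TCP connection to the ES endpoint (roughly what the telnet test does), then asks Elasticsearch itself about cluster health and per-node disk usage. It assumes an AWS ES domain reachable over HTTPS on port 443 without request signing (i.e. an open or IP-restricted access policy); replace the hostname with your own domain endpoint:

```python
# es_reachability.py - minimal sketch, not a full diagnostic.
import json
import socket
import urllib.request

ES_HOST = "your-domain.us-east-1.es.amazonaws.com"  # assumption: replace with your endpoint
ES_PORT = 443                                       # AWS ES is normally HTTPS on 443
ES_URL = f"https://{ES_HOST}:{ES_PORT}"

# 1) TCP-level check: roughly equivalent to the telnet test.
with socket.create_connection((ES_HOST, ES_PORT), timeout=5):
    print(f"TCP connection to {ES_HOST}:{ES_PORT} OK")

def get(path):
    with urllib.request.urlopen(ES_URL + path, timeout=10) as resp:
        return resp.read().decode()

# 2) Cluster health: status, pending tasks, unassigned shards.
print(json.dumps(json.loads(get("/_cluster/health")), indent=2))

# 3) Disk usage per node: compare disk.percent against the default
#    watermark thresholds (roughly 85% low / 90% high).
print(get("/_cat/allocation?v"))
```

If the TCP connection works but the HTTP calls time out or return 403, the problem is more likely the domain’s access policy than raw network reachability.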
Hi @reimlima,
I did a telnet/ping from the Graylog server to ES and it worked.
In ES, each node has 300-340 GB of space left. (It’s a 6-node ES cluster.)
Ok, how about your Graylog UI, any notifications?
Another thing that can cause this is network latency. If Graylog has to wait too long to write data, the communication between your Graylog server and your ES cluster may be the bottleneck.
Try to find errors related to that in your server log, and run a tcpdump between both ends to see how the communication between them goes.
Maybe consider migrating your ES cluster to a closer AWS region.
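Before reaching for tcpdump, a crude round-trip measurement from the Graylog server can already tell you whether latency is in a sane range. This is only a sketch: it times a lightweight health request rather than real bulk indexing, and it assumes the same unsigned HTTPS endpoint as in the previous example:

```python
# es_latency.py - crude round-trip timing, not a substitute for tcpdump.
import time
import urllib.request

ES_URL = "https://your-domain.us-east-1.es.amazonaws.com"  # assumption: replace with your endpoint
SAMPLES = 20

timings = []
for _ in range(SAMPLES):
    start = time.monotonic()
    with urllib.request.urlopen(ES_URL + "/_cluster/health", timeout=10):
        pass
    timings.append((time.monotonic() - start) * 1000)  # milliseconds

timings.sort()
print(f"min {timings[0]:.1f} ms, "
      f"median {timings[len(timings) // 2]:.1f} ms, "
      f"max {timings[-1]:.1f} ms")
```

Low tens of milliseconds within the same region is normal; consistently high or wildly varying numbers support the latency theory and would justify the tcpdump and the region move.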