Yesterday I had some minor issues in my Graylog cluster: users started reporting that there had been no logs since 3 pm. I investigated but couldn't find anything interesting in the logs, so I restarted both Graylog and Elasticsearch (ES), and afterwards I found 7M+ unprocessed messages. This morning processing has finished and everything looks fine (ES is GREEN and logs are coming in at about 2K msg/s). The one exception is the node stats page, which shows the message "Couldn't get journal information".
My version is
This cluster has two nodes (one running Graylog + ES, the other ES only), each with 8 GB RAM, 4 cores and 1 TB of not-too-fast storage.
As I wrote earlier, I have a sustained ingest rate of 2K to 4K msg/s, and the indices total about half a terabyte.
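For a rough sense of scale, that ingest rate can be turned into a daily message count, and multiplying by an average stored size per message gives approximate daily index growth. This is only a back-of-envelope sketch: the 1 KB per message figure is a hypothetical placeholder, not a measurement from this cluster (the real value can be estimated by dividing an index's store size by its document count), and replica copies are not counted.

```python
SECONDS_PER_DAY = 86_400

def messages_per_day(rate_per_sec: float) -> float:
    """Messages ingested per day at a sustained rate."""
    return rate_per_sec * SECONDS_PER_DAY

def index_growth_gb_per_day(rate_per_sec: float, avg_bytes_per_msg: float) -> float:
    """Approximate daily primary index growth in GB (10**9 bytes), replicas excluded."""
    return messages_per_day(rate_per_sec) * avg_bytes_per_msg / 1e9

# Hypothetical average stored size per message; measure your own instead.
AVG_BYTES = 1_000

for rate in (2_000, 4_000):
    print(f"{rate} msg/s -> {messages_per_day(rate):,.0f} msg/day, "
          f"~{index_growth_gb_per_day(rate, AVG_BYTES):.1f} GB/day")
```

Comparing the resulting GB/day with the available disk (and doubling it if indices have one replica) gives a first estimate of how many days of retention the storage can hold.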
So, my questions are two-fold:
- How can I fix this journal metrics issue? (I already increased the journal size a bit.)
- Does my cluster have enough resources to serve this traffic? Is there a sizing guide?
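On the journal side, two back-of-envelope numbers help when choosing its size (set via `message_journal_max_size` in Graylog's server.conf): how much disk the journal needs to ride out an ES stall of a given length, and how long a backlog such as the 7M+ messages above takes to drain once processing resumes. A minimal sketch, where the 1 KB average journal entry size and the 5K msg/s processing rate are hypothetical placeholders rather than values measured on this cluster:

```python
def journal_bytes_needed(ingest_rate: float, outage_seconds: float,
                         avg_entry_bytes: float) -> float:
    """Journal capacity needed to buffer every message arriving during an outage."""
    return ingest_rate * outage_seconds * avg_entry_bytes

def drain_seconds(backlog: float, process_rate: float, ingest_rate: float) -> float:
    """Time to empty a backlog while new messages keep arriving."""
    headroom = process_rate - ingest_rate
    if headroom <= 0:
        return float("inf")  # processing can't keep up, so the backlog never shrinks
    return backlog / headroom

# Buffering a 1-hour ES stall at 4K msg/s, assuming ~1 KB per journal entry:
need = journal_bytes_needed(4_000, 3_600, 1_000)
print(f"journal needed: ~{need / 1e9:.1f} GB")

# Draining 7M messages at 2K msg/s ingest, assuming 5K msg/s processing capacity:
print(f"drain time: ~{drain_seconds(7_000_000, 5_000, 2_000) / 60:.0f} min")
```

The key point of the drain calculation is that only the headroom between processing and ingest rates shrinks the backlog, so a node that processes barely faster than it ingests can take many hours to catch up.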