Journal issues / sizing

hi,

Yesterday I had some minor issues in my Graylog cluster: users started to report that there had been no logs since 3pm. I started to investigate but couldn’t find anything interesting in the logs, so I restarted both Graylog and ES, and found there were 7M+ unprocessed messages. This morning processing had finished and everything looked fine, but when I check the node stats I see the message “Couldn’t get journal information”. Other than that, everything looks fine: ES is GREEN and logs are coming in at a ~2K msg/s rate.

My version is
graylog-server-3.2.2-1.noarch

This cluster has two nodes (Graylog+ES and ES-only), each with 8 GB RAM, 4 cores, and 1 TB of not-too-fast storage.

As I wrote earlier, I have a sustained ingress rate of 2K to 4K messages per second, and the total size of the indices is about half a terabyte.
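
For a rough feel of the volume, here is a back-of-envelope sketch (the ~500-byte average message size is an assumption, not something I measured):

    # Back-of-envelope daily ingest estimate.
    msgs_per_sec = 3_000      # midpoint of the sustained 2K-4K rate
    avg_msg_bytes = 500       # ASSUMPTION: average raw message size
    seconds_per_day = 86_400

    daily_gb = msgs_per_sec * avg_msg_bytes * seconds_per_day / 1e9
    print(f"~{daily_gb:.0f} GB/day raw")  # ~130 GB/day before index overhead and replicas

At that rate, half a terabyte of indices corresponds to only a few days of raw ingest (actual retention depends on compression and extracted fields).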

So, my question is two-fold:

  • how can I fix this journal metrics issue? (I already increased the journal size a bit; see the config sketch after this list)
  • does my cluster have enough resources to serve this traffic? Is there a sizing guide?
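
For reference, the journal size is controlled by these graylog.conf options; the option names are from the stock config, and the values here are just examples of the kind of bump I made, not recommendations:

    # graylog.conf -- journal settings
    message_journal_enabled = true
    message_journal_dir = /var/lib/graylog-server/journal
    message_journal_max_size = 10gb    # default is 5gb; I increased it a bit
    message_journal_max_age = 12h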

Thank you
Laszlo

Hey @vladx,

This is a known issue. It will be fixed in 3.2.3.

Your monitoring system should check this, not you. Monitor your environment.
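
For example, a simple check can poll each node’s journal endpoint and alert on a backlog. A minimal sketch in Python, where the host, credentials, and threshold are placeholders and the response field name may differ between versions:

    import requests

    # Placeholders -- adjust host, credentials, and threshold to your environment.
    API = "http://graylog.example.com:9000/api"
    AUTH = ("admin", "password")   # prefer an access token in production
    THRESHOLD = 1_000_000          # alert when this many messages are queued

    # GET /system/journal returns the journal state of the node you query.
    r = requests.get(f"{API}/system/journal", auth=AUTH,
                     headers={"Accept": "application/json"})
    r.raise_for_status()
    backlog = r.json().get("uncommitted_journal_entries", 0)

    if backlog > THRESHOLD:
        print(f"ALERT: {backlog} uncommitted journal entries")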

I think this is something else.
  • there is no reverse proxy; I’m accessing Graylog on port 9000 using a non-localhost address
  • earlier (before the upgrade and/or the issue) it was OK, so I was able to check journal utilisation from the Web UI

Any other ideas?

Thanks
Laszlo

Hey @vladx,

Sorry that I did not point directly to the PR that fixes the issue. The issue linked above was the “starter” that helped identify the problem:

The PR above is the fix.
