Graylog Crashing

Hi,

Recently I started getting errors with Graylog.

The server runs for a while (3/4 hours), busily collecting data, and then all of a sudden it crashes and gives the following error:

Error message
cannot GET https://graylog.xxx.xxx.xx/api/system/cluster/node (504)

I then have to restart the whole server with “sudo graylog-ctl restart”.

I’m not really sure how to go about troubleshooting the issue. I have doubled the server RAM, but that doesn’t seem to have helped much.

When looking at “sudo graylog-ctl tail server” I get a number of warnings like this:
“WARN : org.graylog2.shared.rest.resources.ProxiedResource - Unable to call http://x.x.x.x:12900/system/metrics/multiple on node <22dcf9ff-fc50-4f83-a69e-94259b862541>, caught exception: timeout (class java.net.SocketTimeoutException)”

Any suggestions on how to debug this issue would be great.

Did you check the available disk space of your OVA installation?

I’m running 3 servers:
The front end has 77% free disk space.
The 2 Elasticsearch servers have 20% free disk space.

You might want to check the Elasticsearch server log files. I guess that they are sending you low-watermark warnings.

The logfile location can be found in: http://docs.graylog.org/en/2.4/pages/configuration/file_location.html#omnibus-package
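To put the disk numbers in context: Elasticsearch's default disk watermarks are 85% used (low) and 90% used (high), unless your configuration overrides them, so a node with 20% free (80% used) is only just under the low watermark. Below is a minimal sketch of that comparison. The function names and the path `/` are assumptions for illustration; point it at the filesystem that holds your Elasticsearch data.

```python
import shutil

# Elasticsearch default watermarks, as percent of disk used.
# Your cluster settings may override these values.
LOW_WATERMARK = 85.0
HIGH_WATERMARK = 90.0


def watermark_status(used_percent, low=LOW_WATERMARK, high=HIGH_WATERMARK):
    """Classify a disk usage percentage against the low/high watermarks."""
    if used_percent >= high:
        return "above high watermark"
    if used_percent >= low:
        return "above low watermark"
    return "ok"


def disk_used_percent(path):
    """Return the approximate percentage of the filesystem at `path` in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    # "/" is a placeholder; use the mount point of the Elasticsearch data directory.
    pct = disk_used_percent("/")
    print(f"{pct:.1f}% used: {watermark_status(pct)}")
```

Run it on each Elasticsearch node. A result close to the low watermark means a burst of incoming messages could push the node over it again.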

Hi Jan,

Yup. Previous log files show the low-watermark warning, but it has not appeared in the 10 days or so since, and the server still manages to crash.

I suppose I’m having a hard time working out which of the services is the root cause:
- nginx
- Graylog server
- MongoDB
- Elasticsearch

You should describe exactly what happens, not only “it crashes”. The more detail you give about what you do, what happens next, and what error you get, the easier it is for someone to help you.

Hi Jan,

At the top of this thread I tried to give as much detail as I could about what was happening, e.g.:

  • Our web interface shows the error (attached in the screenshot)
  • The message says “we are experiencing problems connecting to the graylog server running on . please verify that the server is healthy and working correctly”
  • When I look at the tail of the logs, the main warning, which occurs several times, is: “WARN : org.graylog2.shared.rest.resources.ProxiedResource - Unable to call http://x.x.x.x:12900/system/metrics/multiple on node <22dcf9ff-fc50-4f83-a69e-94259b862541>, caught exception: timeout (class java.net.SocketTimeoutException)”

I also worked out that restarting just the “graylog-server” with
sudo graylog-ctl restart graylog-server
brings things back to life.

I don’t really know what other information would help in diagnosing the issue.
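One more piece of information that could help is how often the timeout warning quoted above occurs, and for which node. Here is a rough sketch that counts those warnings per node ID in a set of log lines. The regular expression is written against the single warning line shown in this thread, so treat it as an assumption about the log format; `count_timeouts` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern based on the ProxiedResource warning quoted in this thread.
# Other Graylog versions may format the line differently.
TIMEOUT_RE = re.compile(
    r"ProxiedResource - Unable to call (?P<url>\S+) "
    r"on node <(?P<node>[0-9a-f-]+)>, caught exception: timeout"
)


def count_timeouts(lines):
    """Count timeout warnings per node ID in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group("node")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "WARN : org.graylog2.shared.rest.resources.ProxiedResource - "
        "Unable to call http://x.x.x.x:12900/system/metrics/multiple on node "
        "<22dcf9ff-fc50-4f83-a69e-94259b862541>, caught exception: timeout "
        "(class java.net.SocketTimeoutException)",
        "INFO : some unrelated log line",
    ]
    for node, hits in count_timeouts(sample).items():
        print(node, hits)
```

Passing an open log file to `count_timeouts` instead of the sample list gives a quick view of whether the timeouts build up gradually before the 504 or start all at once.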

  • Which Graylog version are you using?
  • Did you check whether the system is overloaded?

The version is 2.0.
How do I check if it’s overloaded?
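A common first check is the system load average relative to the number of CPU cores, which `uptime` and `top` also report. The sketch below does the same comparison in Python on a Unix-like host. The threshold of 1.0 per core is only a rule of thumb, not a Graylog recommendation, and the function names are hypothetical.

```python
import os


def load_per_cpu(load_avg, cpu_count):
    """Return the load average divided by the number of CPU cores."""
    return load_avg / max(cpu_count, 1)


def is_overloaded(load_avg, cpu_count, threshold=1.0):
    """Rule of thumb: sustained load above `threshold` per core suggests overload."""
    return load_per_cpu(load_avg, cpu_count) > threshold


if __name__ == "__main__":
    cpus = os.cpu_count() or 1
    try:
        # 1-, 5-, and 15-minute load averages (Unix only).
        one, five, fifteen = os.getloadavg()
    except OSError:
        print("load average is not available on this system")
    else:
        print(f"cores: {cpus}, load: {one:.2f} {five:.2f} {fifteen:.2f}")
        print("overloaded (15 min):", is_overloaded(fifteen, cpus))
```

If the 15-minute value stays above the core count in the hours before the web interface stops responding, the host is probably short on CPU or waiting on disk I/O. It is worth running this on the Elasticsearch nodes as well as the Graylog server.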
