Graylog UI stops responding after a while

Hi,

We have been running a single Graylog server for about a year, but recently the UI stopped responding. The server runs Ubuntu 18 with Graylog version 3.1.2 and has two network interfaces. The environment also has a proxy.

What happens is that the Graylog server works normally for at most a few minutes after the service is started, and then the UI stops responding. Nothing special appears in the logs before or after the UI hangs. Log messages are still collected without problems the whole time. Restarting the graylog-server service makes the UI respond again for a few minutes, which should rule out the proxy or some other external factor as the cause.

After the UI stops responding, existing connections end up in the CLOSE_WAIT state when viewed with netstat. Once the UI hangs, wget from localhost shows:

wget http://x.x.x.x:9000

Connecting to x.x.x.x:9000… connected.
HTTP request sent, awaiting response…
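To track how the CLOSE_WAIT connections build up, something like the following could be used. It is only a rough sketch: the helper name count_close_wait is made up for illustration, and it assumes the usual netstat -tan column layout (local address in column 4, state in column 6). CLOSE_WAIT means the remote side has closed the connection but the local application has not yet closed its socket, so a steadily growing count would point at the Graylog process itself rather than at the network.

```shell
# Hypothetical helper: count sockets in CLOSE_WAIT on a given local port.
# Reads `netstat -tan` style output on stdin, so it can also be run
# against a saved capture.
# Usage: netstat -tan | count_close_wait 9000
count_close_wait() {
    port="$1"
    # Column 4 is the local address, column 6 is the TCP state.
    awk -v port="$port" '
        $6 == "CLOSE_WAIT" && $4 ~ (":" port "$") { n++ }
        END { print n + 0 }
    '
}
```

Running it every few seconds after a service restart (for example under watch) should show whether the count starts climbing at the moment the UI hangs.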

Any tips on what could cause the problem or how to debug it further?

We know this situation from an older version, where it seemed to be running out of memory. Check it with top or htop.
Memory for the server can be set here: /etc/sysconfig/graylog-server (on Debian/Ubuntu: /etc/default/graylog-server)

Another possibility is the ARP cache, which is flushed by default after 300 seconds.

Thanks for the tips, Arie. I increased the max memory from 1 GB to 4 GB in /etc/default/graylog-server (-Xmx4g), but it still has the same effect.
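For anyone wanting to double-check that the heap setting is really in the file, a small sketch like this could help. The function name show_max_heap is invented for illustration, and the default path is the Ubuntu one mentioned above; it simply prints the last -Xmx option found in the file.

```shell
# Hypothetical helper: print the last -Xmx option found in the defaults file.
# The default path assumes a Debian/Ubuntu package install.
show_max_heap() {
    file="${1:-/etc/default/graylog-server}"
    # "--" stops grep from treating the pattern as an option.
    grep -o -- '-Xmx[0-9]\{1,\}[kKmMgG]\{0,1\}' "$file" | tail -n 1
}
```

The new value only applies after graylog-server has been restarted, so it is also worth comparing it with the -Xmx shown in the command line of the running java process (for example in ps or htop).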

Does Graylog have its own ARP cache? I checked the Ubuntu ARP cache and it seems to contain valid entries.

It does not have its own cache, and Graylog should not need 4 GB. Here it runs with 2 GB.

Strange behavior. Your problem seems to be network related. Anything in syslog or the Graylog log?

Try clearing your DNS cache:
sudo systemd-resolve --flush-caches
and look at the stats:
sudo systemd-resolve --statistics

and try to connect

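Try restarting the name service cache daemon (note that this clears the nscd name cache, not the ARP cache):
sudo service nscd restart
To clear the ARP cache itself:
sudo ip neigh flush all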

and try to connect

Can you connect locally to the site with curl http://X.X.X.X:9000?
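To make that local check a bit more telling, a sketch along these lines may be useful. The names probe_ui and explain_curl_exit are made up for illustration; the curl options used (--max-time, --noproxy, --silent, --output) are standard ones. It bypasses any proxy settings and separates a refused connection (curl exit code 7) from a connection that is accepted but never answered (exit code 28 after the timeout).

```shell
# Hypothetical helper: translate a curl exit code into a short diagnosis.
explain_curl_exit() {
    case "$1" in
        0)  echo "response received" ;;
        7)  echo "connection refused" ;;
        28) echo "timed out" ;;
        *)  echo "curl failed with exit code $1" ;;
    esac
}

# Hypothetical helper: probe a URL directly (no proxy) with a time limit.
# Usage: probe_ui http://X.X.X.X:9000 10
probe_ui() {
    curl --silent --output /dev/null --noproxy '*' --max-time "${2:-10}" "$1"
    explain_curl_exit "$?"
}
```

A "timed out" result would mean the port still accepts connections while the application behind it does not respond.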

Flushing the caches did not help, and the stats show that they really were flushed. The logs have no relevant errors or warnings. This really is a strange problem, as we also have other systems running on the same server (on different ports) and they are working fine. So the problem is somehow related to Graylog, but we haven’t been able to figure out how.

@rnh
Hello,

By chance were there any updates applied to your GL Server before this happened?

Are you running Apache or Nginx?
Sometimes restarting the proxy service seems to clean things up. Just an idea.

Hello

By chance were there any updates applied to your GL Server before this happened?

We have only installed standard Ubuntu server updates. The Graylog server itself has not been updated. We needed to remove IPv6 support from the host (in /etc/sysctl.conf), but it has since been restored. One thing we noticed was that some update had messed up the /etc/hosts file so that 127.0.0.1 was mapped to the server name, not to localhost, and there was no entry for localhost at all. We changed it back (127.0.0.1 localhost, and the other IP addresses to the actual server name). The caches have been flushed since, so name resolution should be fine.
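In case a later update changes the file again, the /etc/hosts mapping can be checked with a small sketch like the one below. The function names loopback_names and has_localhost are invented for illustration; the script only reads the file and lists which names 127.0.0.1 currently maps to.

```shell
# Hypothetical helper: print every name that the hosts file maps to 127.0.0.1.
loopback_names() {
    file="${1:-/etc/hosts}"
    awk '$1 == "127.0.0.1" {
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^#/) break   # ignore trailing comments
            print $i
        }
    }' "$file"
}

# Hypothetical helper: succeed only if "localhost" is one of those names.
has_localhost() {
    loopback_names "$1" | grep -qx 'localhost'
}
```

For example, has_localhost || echo "localhost entry missing" would flag the broken state described above, and getent hosts localhost shows what the resolver actually returns.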

Are you running apache or Nginx?

Actually, the proxy is not on this server; it is Squid on another host. On the server we have set no_proxy for local names and addresses. The same symptoms occur when we use curl or wget from localhost, and we have confirmed that those requests do not go through the proxy server. Therefore the proxy should not be the problem. And again, other services are running fine and are reachable both via the proxy and from localhost.
