We have been using the open source edition of Graylog for a couple of years and are totally happy with it.
Our setup: 2 load balancers → 2 Graylog servers → 4 Elasticsearch nodes
Graylog version: 3.0.1+de74b68
Unfortunately, we were hit by a strange problem today:
Access to the GUI and everything involving API access on the default port 9000 is extremely slow. We also see socket timeouts in server.log.
Incoming log messages seem to be processed fine as far as we can tell. The sheer volume of log messages also doesn’t seem to be the cause, because we see the same problem when no messages are coming into the system.
The Elasticsearch cluster seems to be OK, MongoDB seems to be OK, DNS resolution is working, there is enough disk space and memory, and the load on the system is minimal. We checked these points because other people in the forum mentioned them.
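To narrow down whether the slowness comes from the load balancers or from the Graylog nodes themselves, a small script along these lines could time the API on each node directly. This is only a minimal sketch: the host names in NODES are placeholders, the helper names time_request and probe are made up for illustration, and it assumes the REST API is reachable under /api/ on port 9000 and that the load balancer status endpoint answers without authentication.

```python
import statistics
import time
import urllib.error
import urllib.request

# Placeholder addresses (assumption); replace with the real Graylog hosts.
NODES = ["http://graylog1.example.org:9000", "http://graylog2.example.org:9000"]
# Load balancer status endpoint of the Graylog REST API.
PATH = "/api/system/lbstatus"


def time_request(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return (seconds, outcome) for one GET request.

    outcome is the HTTP status code, or the exception class name
    if the request failed before a response arrived (e.g. a timeout).
    """
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            response.read()
            outcome = response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        outcome = exc.code
    except (urllib.error.URLError, OSError) as exc:
        # Connection refused, DNS failure, socket timeout, ...
        outcome = type(exc).__name__
    return time.monotonic() - start, outcome


def probe(base_url, attempts=5, **kwargs):
    """Time several requests against one node and summarize them."""
    samples = [time_request(base_url + PATH, **kwargs) for _ in range(attempts)]
    durations = [seconds for seconds, _ in samples]
    return {
        "median_s": statistics.median(durations),
        "max_s": max(durations),
        "outcomes": [outcome for _, outcome in samples],
    }
```

Calling probe(node) for each entry in NODES, and once more against the load balancer address, would show whether both nodes are equally slow or only the path through the load balancers.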
We are running out of ideas…
Does anyone have a hint for how we could work towards a solution? We have no clue what’s going on.
Thanks in advance,