API-access extremely slow / timeouts

pstiffel · February 17, 2021, 4:42pm

Hi there,

We have been using the open source Graylog edition for a couple of years and are totally happy with it.
Our setup:
2 load balancers → 2 Graylog servers → 4 Elasticsearch nodes
Graylog version: 3.0.1+de74b68
Unfortunately, we were hit by a strange problem today:

Our GUI access and everything else that goes through the API on the default port 9000 is extremely slow. We also get socket timeouts in the server.log.
The incoming log messages seem to be processed OK as far as we can tell. Also, the sheer volume of log messages doesn’t seem to be what slows it down, because we experience the same problem when no messages are pouring into the system.
The Elasticsearch cluster seems to be OK, MongoDB seems to be OK, DNS resolution is working, there is enough disk space and memory, and the load on the system is minimal. These are the things other people in the forum mentioned.
We are running out of ideas…
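
To put numbers on "extremely slow", a few timed requests against port 9000 can help. The following is only a minimal sketch: the host name is hypothetical, and the `/api/system/lbstatus` path is an assumption about how the API is laid out on this installation, so substitute any endpoint that is known to answer.

```python
import time
import urllib.request
from statistics import median


def time_request(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return the seconds one GET took, or None on a timeout or connection error."""
    start = time.perf_counter()
    try:
        with opener(url, timeout=timeout) as resp:
            resp.read()
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return None
    return time.perf_counter() - start


def summarize(samples):
    """Condense a list of timings (None = failed request) into a small report."""
    ok = [s for s in samples if s is not None]
    return {
        "requests": len(samples),
        "failures": len(samples) - len(ok),
        "median_s": median(ok) if ok else None,
        "max_s": max(ok) if ok else None,
    }


if __name__ == "__main__":
    # Hypothetical host; the path is an assumption, not taken from this thread.
    url = "http://graylog.example.org:9000/api/system/lbstatus"
    print(summarize([time_request(url) for _ in range(5)]))
```

Running this against each Graylog server directly and then through the load balancers would show whether the delay sits in front of or behind the balancers.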

Does anyone have a hint for us on how to work towards a solution? We have no clue what’s going on.

Thanks in advance,
Patric

jan · February 19, 2021, 9:21am

Hey @pstiffel

If that setup has been running for a long time, you might want to take this blog post into account and check the amount of data you have:

pstiffel · February 19, 2021, 10:00am

Hi Jan,

thanks for that hint, I will look into this.
It’s kinda strange, but when we stopped our efforts to find the problem, the cluster sort of healed itself: after 2 hours without our intervention, everything worked as smoothly as before and the UI was responsive again.

Patric

jan · February 19, 2021, 1:17pm

Hey @pstiffel

That sounds like you have a cluster that can be pushed over the limit just by creating new indices or similar, which is usually a sign that the meta data is over what should be done.
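
One rough way to check this is to look at how many shards each Elasticsearch data node has to carry, since every index and shard adds to what the cluster has to keep track of. The following is only a minimal sketch using the `_cluster/health` endpoint; the host name is hypothetical, and what counts as "too many" depends on the hardware, so no threshold is built in.

```python
import json
import urllib.request


def fetch_cluster_health(base_url, timeout=10.0):
    """Fetch the cluster health document from one Elasticsearch node."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=timeout) as resp:
        return json.load(resp)


def shards_per_data_node(health):
    """Average number of active shards each data node is holding."""
    nodes = health["number_of_data_nodes"]
    if not nodes:
        return float("inf")  # no data nodes reported at all
    return health["active_shards"] / nodes


if __name__ == "__main__":
    # Hypothetical host name, not taken from this thread.
    health = fetch_cluster_health("http://es-node-1.example.org:9200")
    print("status:", health["status"])
    print("shards per data node:", round(shards_per_data_node(health), 1))
```

If that number keeps growing over the years, fewer or larger indices (or a shorter retention) would be the first thing to look at.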

pstiffel · February 20, 2021, 12:27am

Hi @jan,

What do you mean by “the meta data is over what should be done”?

Patric

jan · February 22, 2021, 6:57am

Any given hardware can only take data up to certain limits. What you describe sounds like you are a little over what your resources can handle, because the information you gave matches the first symptoms of an overwhelmed system.

system · March 8, 2021, 6:57am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
