1. Describe your incident:
Periodically a Graylog pod starts hammering our Elasticsearch cluster with GET requests to /_all/_alias. This causes Elasticsearch to consume a large number of threads and leaves it in an unstable state.
2. Describe your environment:
Kubernetes cluster with 35 Graylog pods and 1 master
ES 6.8.22 on Elastic Cloud.
Service logs, configurations, and environment variables:
3. What steps have you already taken to try and solve the problem?
Tried to find why these specific calls are made and if there’s a way to disable them
4. How can the community help?
I’m looking to understand why these specific /_all/_alias calls are made. I also want to understand what would cause an instance to make hundreds of these GETs per minute. I understand Graylog making a GET call to /graylog*/_alias or other specific indexes, but not to /_all.
I must say that’s a lot of Graylog pods. I assume this is a very large environment?
What kind of security configuration was made on Graylog? Is it only one Graylog pod, and is it the same one every time? Since there are 35 pods, did you try shutting down the pod that is making all these calls? If so, did you notice any other pods executing the same calls?
I’m curious whether this is an isolated issue or a configuration issue.
It’s processing a few TB a day so it’s not small. During my investigation I found that a major increase in calls to /_all/_alias occurred when I had a tab open on the detailed view for an index set.
You can see a major increase from about 7 rps to just under 40 rps while this tab is open.
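For anyone wanting to put numbers on a jump like this, here is a minimal sketch for computing the average request rate for one path. It assumes you can export request records as (ISO timestamp, path) pairs from a proxy or access log; that record format and the function names are hypothetical, not something Graylog or Elasticsearch provides.

```python
from collections import Counter
from datetime import datetime


def requests_per_second(records, path):
    """Count requests to `path`, bucketed by whole second.

    `records` is an iterable of (iso_timestamp, request_path) pairs.
    This input shape is an assumption made for the example.
    """
    per_second = Counter()
    for timestamp, request_path in records:
        if request_path == path:
            # Truncate to the second so requests in the same second share a bucket.
            second = datetime.fromisoformat(timestamp).replace(microsecond=0)
            per_second[second] += 1
    return per_second


def average_rps(records, path):
    """Average requests per second between the first and last matching request."""
    per_second = requests_per_second(records, path)
    if not per_second:
        return 0.0
    span_seconds = (max(per_second) - min(per_second)).total_seconds() + 1
    return sum(per_second.values()) / span_seconds
```

Running `average_rps` once over a window with the index set tab closed and once with it open gives the two rates to compare for /_all/_alias.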
If you’re using Chrome, what do the developer tools show?
EDIT: I’m just guessing here, but a while back I saw a case where Graylog’s deflector aliases were being created as indexes.
I think the resolution was this configuration:
action.auto_create_index: false in elasticsearch config file.
I found it here…
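The setting mentioned above can also be changed through the cluster settings API rather than the config file. The following is a rough sketch only: it assumes the cluster accepts `action.auto_create_index` as a persistent cluster setting, it omits authentication, and the helper names and base URL are made up for illustration. Whether a hosted deployment permits this change is something to verify first.

```python
import json
import urllib.request


def build_auto_create_request(base_url, enabled=False):
    """Build (but do not send) a PUT /_cluster/settings request.

    `base_url` is a placeholder such as "http://localhost:9200".
    Authentication headers are intentionally left out of this sketch.
    """
    body = {
        "persistent": {
            "action.auto_create_index": "true" if enabled else "false"
        }
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def apply_auto_create_setting(base_url, enabled=False):
    """Send the request and return the decoded JSON response."""
    request = build_auto_create_request(base_url, enabled)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Building the request separately from sending it makes it easy to inspect the URL and body before touching a production cluster.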
@gsmith I’m seeing these calls repeated in developer tools when on the index page.
Regarding your edit: do you believe that would be happening even if it’s not creating those indexes? We haven’t had an issue with an alias being created as an index. I’ve also confirmed that this only happens when the UI is open and not during regular operations.
We’re not seeing that, actually. Writes continue to work and the deflector is rotated according to our policy. The only oddity is this high rate of GETs to that alias endpoint. I also see GETs to /graylog_*/_alias at a reasonable rate, without a corresponding spike.
At this point you’re going outside my range of knowledge. Perhaps @patrickmann could help here.
Sorry @davidwin93, unfortunately I haven’t had this issue, and we don’t have a big setup like this with pods.
Thanks for your time @gsmith. I appreciate that you dug into this a bit.