Graylog Keeps Losing Leader

1. Describe your incident:

Every day in the early morning hours, Graylog loses its leader node.

2. Describe your environment:

  • OS Information:
    All nodes run Debian 12 inside LXD containers hosted on PVE (Proxmox VE)
  • Package Version:
    Graylog 5.1.5
    MongoDB 6.0.10
    OpenSearch 2.9.0
  • Service logs, configurations, and environment variables:

All server.conf files are identical except for the following:
is_leader = true applies to grayserver-1 /
is_leader = false applies to grayserver-{2,3} / 192.168.1.{2,3}
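In a static setup like this one, exactly one node is expected to carry `is_leader = true`. As a quick sanity check, a minimal Python sketch like the following could parse each node's server.conf and confirm that only one of them is marked as leader. It assumes simple `key = value` lines with `#` comments (a simplification of the Java properties format Graylog reads); the `leader_nodes` function and the sample config strings are hypothetical.

```python
def leader_nodes(configs: dict[str, str]) -> list[str]:
    """Return the hostnames whose server.conf sets is_leader = true.

    `configs` maps a hostname to the raw text of that node's server.conf.
    Parsing is simplified: only `key = value` lines are considered and
    lines starting with '#' are treated as comments.
    """
    leaders = []
    for host, text in configs.items():
        for raw in text.splitlines():
            line = raw.strip()
            # Skip blank lines, comments, and lines without a key = value pair.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            if key.strip() == "is_leader" and value.strip().lower() == "true":
                leaders.append(host)
                break  # count each host at most once
    return sorted(leaders)


# Hypothetical sample configs mirroring the layout described above.
configs = {
    "grayserver-1": "is_leader = true\n",
    "grayserver-2": "is_leader = false\n",
    "grayserver-3": "# is_leader = true\nis_leader = false\n",
}

leaders = leader_nodes(configs)
if len(leaders) != 1:
    print(f"Expected exactly one leader, found: {leaders}")
```

In practice you would read each file's contents into `configs` (for example over SSH or from a config-management repo); the check itself only cares about the text.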

My setup:

3. What steps have you already taken to try and solve the problem?

Restarting graylog-server.service resolves the issue.

4. How can the community help?

I suspect this happens because, as mentioned above, the nodes are hosted in a Proxmox environment, and I have Proxmox take a snapshot every night at or around midnight. However, before I remove that layer of protection, I wanted to ask the experts to make sure I am barking up the right tree.

Hey @accidentaladmin

I do have Proxmox, but I have not used snapshots; I started using FOG.

To be honest, I also think it has something to do with Proxmox. Personally, I would monitor the metrics on all the nodes in the cluster during that time window. I use Zabbix, but whatever you have will do.

That way you would be able to see what's going on more clearly; just an idea. A while back, someone on the OpenSearch forum had a similar issue with their OpenSearch cluster, something to do with their manager nodes / data nodes.
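To test the snapshot theory, one option is to record which node reports itself as leader at each polling interval (from Zabbix or whatever monitoring you already have) and check whether the leaderless intervals cluster around the nightly snapshot. The sketch below assumes those samples have already been collected; the `NodeSample` record, the midnight snapshot time, the 30-minute window, and the sample values are all hypothetical placeholders to adjust.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta


@dataclass
class NodeSample:
    # One poll of one node. Hypothetical shape: fill it from your own
    # monitoring data (Zabbix, scripted checks, etc.).
    taken_at: datetime
    hostname: str
    is_leader: bool


def leaderless_times(samples: list[NodeSample]) -> list[datetime]:
    """Return the poll timestamps at which no node reported itself as leader."""
    flags_by_time: dict[datetime, list[bool]] = {}
    for s in samples:
        flags_by_time.setdefault(s.taken_at, []).append(s.is_leader)
    return sorted(t for t, flags in flags_by_time.items() if not any(flags))


def near_snapshot(ts: datetime,
                  snapshot_at: time = time(0, 0),
                  window: timedelta = timedelta(minutes=30)) -> bool:
    """True if ts falls within `window` of a daily snapshot at `snapshot_at`."""
    same_day = datetime.combine(ts.date(), snapshot_at)
    # Also check the previous and next day's snapshot so that, e.g.,
    # 23:50 counts as close to a midnight snapshot.
    return any(abs(ts - (same_day + timedelta(days=d))) <= window
               for d in (-1, 0, 1))


# Made-up sample values for illustration only.
samples = [
    NodeSample(datetime(2023, 10, 2, 0, 5), "grayserver-1", False),
    NodeSample(datetime(2023, 10, 2, 0, 5), "grayserver-2", False),
    NodeSample(datetime(2023, 10, 2, 0, 5), "grayserver-3", False),
    NodeSample(datetime(2023, 10, 2, 12, 0), "grayserver-1", True),
    NodeSample(datetime(2023, 10, 2, 12, 0), "grayserver-2", False),
]
for ts in leaderless_times(samples):
    print(ts, "near snapshot" if near_snapshot(ts) else "not near snapshot")
```

If most leaderless timestamps land inside the window, that supports the snapshot theory; if they are spread across the night, something else (load, network, MongoDB) is more likely.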

Since you have Cloudflare, there should be something under Analytics & Logs. Also check out caching there too, if you haven't done that already.

Thank you for the breadcrumbs!

FOG looks interesting; I’ll check it out!
