Graylog cluster: indexer failures after rebooting the Graylog master


(Kornflex) #1

Hello,

I have HAProxy in front of 3 servers, each running Graylog, MongoDB and ES.
srv1 is the Graylog master.

Srv1 was busy, with a lot of unprocessed messages.
So I rebooted srv3 and then srv2 to add RAM and CPU, because I’m in dev mode.
All was OK, the nodes came back.

But when I upgraded srv1, Graylog said that there was no master, which is normal.
After it came back, I had some indexer failures with this error: primary shard is not active

What’s wrong?

Is it because Graylog had a lot of unprocessed messages, because the ES config is not good, or something else?

Thank you


#2

After a restart, Elasticsearch needs to restore the replicas (the data you wrote to ES while the node was down). But Elasticsearch starts a new shard copy and copies the full shard instead of only the changes. Depending on the shard size, this can take time.
Maybe you restarted an ES node while it wasn’t yet synced with the other servers.

Check your Elasticsearch cluster status: in Graylog under System > Indices > your index set, or via the ES API.
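
A minimal sketch of the API check, assuming Elasticsearch answers on http://localhost:9200 without authentication or TLS (both are assumptions, adjust to your setup). `GET /_cluster/health` is the standard health endpoint; the helper names `fetch_cluster_health` and `describe_health` are made up for illustration.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node, no auth/TLS


def fetch_cluster_health(base_url=ES_URL):
    """Call GET /_cluster/health and return the parsed JSON body."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=10) as resp:
        return json.load(resp)


def describe_health(health):
    """Turn a _cluster/health response into a one-line summary."""
    status = health["status"]
    # Shards that are unassigned, still recovering, or moving between nodes.
    pending = (
        health["unassigned_shards"]
        + health["initializing_shards"]
        + health["relocating_shards"]
    )
    if status == "green":
        verdict = "all primaries and replicas are allocated"
    elif status == "yellow":
        verdict = "primaries are active, some replicas are missing or recovering"
    else:
        verdict = "at least one primary shard is not active"
    return f"{status}: {verdict} ({pending} shard(s) not yet settled)"
```

Calling `print(describe_health(fetch_cluster_health()))` on one of the nodes should tell you whether the cluster is still recovering. A red status would match the "primary shard is not active" error.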


(Kornflex) #3

Thank you, I will check that


(Kornflex) #4

How can I delete indexer failures?

I tried this in MongoDB, for example: db.index_failures.remove({ index: "graylog2" }). I also tried with the index I created in Graylog, but it is still the same :confused:

Thank you


(Jan Doberstein) #5

Do you want to delete the information about index failures from MongoDB, or what is the goal?


(Kornflex) #6

I’m a beginner with MongoDB, ES and Graylog, so I don’t really know what to do.

I’m in dev with my cluster, so right now it doesn’t matter: I can delete them from MongoDB and lose data.
Later, I can’t afford to lose anything.

I’m trying to create a cluster but I don’t really know how it works: 1 or 2 master nodes for ES, only 1 Graylog master, how to manage MongoDB replication, etc. So I’m playing with it right now to understand these things.

If you have good articles, I’m interested.


(Tess) #7

For that one I can heartily recommend M001 and M103 at Mongo University. I started taking their classes because I’d started using Graylog :slight_smile:


(Jan Doberstein) #8

Why do you want to create a cluster, which is complex, when you do not know the three components?

IMHO it would be better for you to run a single machine with everything on it first to learn Graylog. Then, when you are a little more into it and understand which component does what, you can create a production-ready cluster.

Needed skills are:

  • MongoDB Administration
  • Elasticsearch Administration
  • Linux & Networking knowledge to connect and debug the environment

(Kornflex) #9

I have had a single Graylog server for 2 years, but we need to use this to keep logs and to be GDPR-proof :slight_smile:


(Tess) #10

My ${DEITY} @jan! Why did you never tell me that Graylog is listed in GDPR documents as a requirement? If only I had known before! :slight_smile:

Anywho @kornflex: you will need to learn very quickly about your infrastructure. Very similar to this other thread:

These guys have also run into issues with their production environment without really knowing how to troubleshoot the situation. I will repeat what I said to OP in the other thread: learn how your environment is built, know what each part does and how it’s supposed to function.

In your case we can say that Elasticsearch is broken in some way, because one of the shard storage hosts is apparently not accessible.
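
To see which shards are affected, something like the following could help. It is only a sketch: it assumes Elasticsearch is reachable at http://localhost:9200 without authentication, and the function names `fetch_shards` and `inactive_primaries` are invented for this example. `GET /_cat/shards?format=json` is a standard endpoint that returns one row per shard copy, with `prirep` set to `p` for primaries and `r` for replicas.

```python
import json
import urllib.request


def fetch_shards(base_url="http://localhost:9200"):  # assumption: local node, no auth
    """Call GET /_cat/shards?format=json and return the list of shard rows."""
    url = f"{base_url}/_cat/shards?format=json"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def inactive_primaries(shards):
    """Return (index, shard, state) for primaries that cannot accept writes.

    STARTED and RELOCATING primaries are active; UNASSIGNED and
    INITIALIZING ones are not.
    """
    return [
        (row["index"], row["shard"], row["state"])
        for row in shards
        if row["prirep"] == "p" and row["state"] in ("UNASSIGNED", "INITIALIZING")
    ]
```

An empty list from `inactive_primaries(fetch_shards())` would suggest the primaries have recovered and any remaining failures are old entries.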


(Tess) #11

By the way, I would like to applaud you @kornflex, for building a DEV environment to mess around with instead of tinkering with the production environment. Not many people actually make that effort, so :+1:


(Ben van Staveren) #12

Indeed, +1 from me as well. The best way of doing it is to set up a dev environment, then break it, then rebuild it differently, break that, rebuild it again, then have that “a-ha” moment, destroy it, build a new dev environment, throw a load of traffic at it. Then do it again, in production, with your previous experience (and documentation… even if they’re just scribbles on a napkin) to back you up.

That’s how we went from Graylog-in-dev to Graylog-in-mission-critical-production in a week. Minus the notes on a napkin because I’m one of those types who doesn’t take his own advice >.>