Graylog inputs stopped when k8s pod re-spawns after failure

I have a rare but annoying problem with GL running in k8s and am wondering if anyone else has solved it.

My specific problem is this…

I have 2 GL server pods (GL v2.5): one running as master/web UI, the other dedicated to data ingestion. On rare occasions the data ingestion pod has crashed and restarted, but the inputs associated with that pod (all my inputs go only to that one pod) end up in a stopped state after the pod has recovered, because k8s changes the pod name dynamically when the new pod is spawned.

One possible solution I have considered is enabling both pods to ingest data; that way, if only one pod crashes, I can still ingest data on the other pod/server. (But I’d rather not change that piece of my design if I can avoid it.)

Or would it be sufficient to enable the “Global” option on the input? Would that mean that any new server that appears in the cluster would automatically be assigned to that input? (essentially replacing the old node name, which will have disappeared)

Alternatively maybe I can finagle k8s into presenting a static name to the cluster. I’d rather not mess with my k8s config if at all possible.

Thoughts?

Or would it be sufficient to enable the “Global” option on the input? Would that mean that any new server that appears in the cluster would automatically be assigned to that input? (essentially replacing the old node name, which will have disappeared)

Yes, selecting “Global” means exactly that.
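
For anyone who prefers to script this rather than tick the checkbox in the web UI, here is a minimal sketch of creating an input with the Global option set through the REST API. It assumes the input-creation endpoint is `POST /system/inputs` under the API base URL and that the request body takes a boolean `global` field; check the API browser on your own server before relying on it. The helper name `build_global_input_request`, the host `graylog.example`, the credentials, and the Syslog UDP configuration values are all placeholders for illustration, not something taken from this thread. Note that a global input is started on every node, so it would also run on the master/web UI node.

```python
import json
import urllib.request


def build_global_input_request(base_url, title, input_type, configuration, auth_header):
    """Build (but do not send) a request that creates a global input.

    Hypothetical helper: the endpoint path and body fields are assumptions
    to verify against the API browser of the Graylog version in use.
    """
    payload = {
        "title": title,
        "type": input_type,
        # Global: start the input on every node instead of pinning it to
        # one node, so a replacement pod picks it up as well.
        "global": True,
        "configuration": configuration,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/system/inputs",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Graylog rejects state-changing API calls without this header.
            "X-Requested-By": "cli",
            "Authorization": auth_header,
        },
    )


if __name__ == "__main__":
    request = build_global_input_request(
        base_url="http://graylog.example:9000/api",  # placeholder URL
        title="syslog-udp-global",
        input_type="org.graylog2.inputs.syslog.udp.SyslogUDPInput",
        configuration={"bind_address": "0.0.0.0", "port": 1514},  # example values
        auth_header="Basic <base64 of user:password>",  # placeholder credentials
    )
    # Print the body for review; urllib.request.urlopen(request) would send it.
    print(request.data.decode("utf-8"))
```

The function only builds the request, so the JSON body can be reviewed before anything is sent to the cluster.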
