We did a PoC on a shared Graylog frontend and Elasticsearch cluster. Alerts worked fine.
We then stood up a dedicated Graylog frontend against the same Elasticsearch cluster. I'm not sure exactly what steps were taken, but the person doing the install/config migrated the data over from our PoC somehow; users, streams, etc. all carried over.
One of the old alerts fired earlier today, then at some point it simply stopped, even though its conditions continued to be met repeatedly.
No new alerts work either. Even a condition as basic as “more than 0 messages in the last 1 minute” fails to trigger when the stream has dozens of matching messages and we are outside the grace period.
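For reference, this is a minimal sketch of how the condition and its evaluation can be double-checked through the REST API (assuming the standard 2.x stream alert endpoints; the base URL, credentials, and stream ID below are placeholders):

```python
# Sketch only: inspect a stream's alert conditions and ask Graylog to
# evaluate them immediately. Base URL, credentials, and stream ID are
# placeholders; endpoints assumed from the Graylog 2.x REST API.
import requests

API = "http://graylog.example.com:12900"    # assumed API base URL
AUTH = ("admin", "password")                # assumed credentials
STREAM_ID = "000000000000000000000000"      # placeholder stream ID

# List the alert conditions configured on the stream.
conds = requests.get(f"{API}/streams/{STREAM_ID}/alerts/conditions",
                     auth=AUTH).json()
for cond in conds.get("conditions", []):
    print(cond.get("type"), cond.get("parameters"))

# Ask the server to evaluate the stream's conditions right now and
# report whether any of them would currently trigger.
check = requests.get(f"{API}/streams/{STREAM_ID}/alerts/check",
                     auth=AUTH).json()
print(check)
```

Both calls return data that looks sane to me; the conditions are present and the check reports messages in the window.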
No obvious errors in the logs, and nothing useful gained from turning logging up to trace. Restarting Graylog had no impact.
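For completeness, one way to raise the alerting code path to TRACE at runtime is through the loggers endpoint of the REST API; the base URL, credentials, and exact logger name below are assumptions, not confirmed values:

```python
# Sketch only: bump a single logger to TRACE at runtime via the REST API.
# Base URL, credentials, and the logger name are assumptions for illustration.
import requests

API = "http://graylog.example.com:12900"    # assumed API base URL
AUTH = ("admin", "password")                # assumed credentials

# PUT /system/loggers/{loggerName}/level/{level} sets the level of one logger.
resp = requests.put(f"{API}/system/loggers/org.graylog2.alerts/level/trace",
                    auth=AUTH)
resp.raise_for_status()
print("Logger level updated, HTTP", resp.status_code)
```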
System
Node ID: 95344e6f-ac75-4ce9-bb90-34cb054f2e87
Version: 2.1.3+040d371, codename Smuttynose
JVM: PID 3593, Oracle Corporation 1.8.0_60 on Linux 3.0.101-91-default