We have a dashboard set up in Graylog called Mishawum.
We created a couple of events and set up notifications for both of our servers (Mishawum and Revere) to send alerts when the error count reaches a particular threshold, but we are not receiving alerts as expected. It seems like there is some issue with the Graylog application. This is what we set up:
- Test Alert - Mishawum Error count hits threshold
- Test Alert - Revere Error count hits threshold
- Mishawum/ Revere Error Alert
We have also observed that alerts for Mishawum arrive only sometimes, and not at the time the event rule is matched but long afterwards (a day later, two days later, etc.). As far as I understand, we may or may not receive an alert even though our messages matched the event rule(s).
For example, we defined an event to send an alert when the Mishawum error count reaches 30. The alert arrives a day later, or whenever Graylog gets around to sending it, and reports a count in the range of 30-38, while the dashboard shows an actual count somewhere between 400 and 450. After that we receive no further alerts at all, even allowing for the delay.
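To make the expected behavior concrete, here is a minimal Python sketch of a count-per-window threshold check. It is not Graylog's implementation: the function name, the 5-minute window, and the synthetic timestamps are all assumptions made up for illustration, and only the threshold of 30 comes from our event definition.

```python
from datetime import datetime, timedelta


def windows_over_threshold(timestamps, window, threshold):
    """Return (window_start, count) for each fixed window whose count >= threshold."""
    if not timestamps:
        return []
    ordered = sorted(timestamps)
    start = ordered[0]
    counts = {}
    for ts in ordered:
        # timedelta // timedelta gives the integer index of the window
        index = (ts - start) // window
        counts[index] = counts.get(index, 0) + 1
    return [
        (start + index * window, count)
        for index, count in sorted(counts.items())
        if count >= threshold
    ]


# Hypothetical data: 420 error messages, one every 10 seconds
base = datetime(2024, 1, 1, 0, 0, 0)
errors = [base + timedelta(seconds=10 * i) for i in range(420)]

# Assumed 5-minute window; threshold of 30 as in our event definition
alerts = windows_over_threshold(errors, timedelta(minutes=5), 30)

print("total errors:", len(errors))
print("windows that should alert:", len(alerts))
for window_start, count in alerts:
    print(window_start.isoformat(), count)
```

With this made-up data, every 5-minute window holds exactly 30 errors, so each of the 14 windows should raise its own alert close to the time it fills up, while the total over the whole period is 420. A per-window count in an alert and a dashboard total over a longer time range therefore need not match, but that alone would not explain alerts arriving a day or more late, or not arriving at all.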
Because of this, we cannot tell when our servers are down or having issues.
Could you please let me know what is causing this? I am also seeing a lot of open incidents about the Graylog alert system.