Alerts - Notifications

Hello everyone,

Our alerts are never up to date. We are running Graylog 3.3.8, CentOS 8, Elasticsearch 6.8.13, and MongoDB 4.4.1.
In /var/lib/graylog-server/journal/graylog2-committed-read-offset I see:
36
In /var/lib/graylog-server/journal/recovery-point-offset-checkpoint I see:
0
1
messagejournal 0 37
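
For anyone comparing these two files, here is a minimal Python sketch of how the numbers could be read. It assumes the checkpoint file uses the Kafka-style layout (a version line, an entry count, then "topic partition offset" lines) and that the committed read offset is the last offset already processed while the checkpoint value is the next offset to be written. Both are assumptions on my part, and the function names are made up for illustration.

```python
def parse_checkpoint(text):
    """Parse a Kafka-style offset checkpoint file.

    Assumed layout: line 1 = format version, line 2 = number of entries,
    then one 'topic partition offset' line per entry.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    version = int(lines[0])
    count = int(lines[1])
    entries = {}
    for ln in lines[2:2 + count]:
        topic, partition, offset = ln.split()
        entries[(topic, int(partition))] = int(offset)
    return version, entries


def unread_messages(committed_read_offset, checkpoint_offset):
    # Assumption: committed_read_offset is the last offset already read,
    # checkpoint_offset is the next offset that would be written.
    return max(0, checkpoint_offset - committed_read_offset - 1)


version, entries = parse_checkpoint("0\n1\nmessagejournal 0 37\n")
print(version, entries, unread_messages(36, entries[("messagejournal", 0)]))
```

Under those assumptions, 36 and 37 would mean nothing is waiting in the journal, so the journal itself would not be the source of the lag.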

We are monitoring some logs that contain very few errors. The alerts always lag one message behind, with the search covering errors from the past 10 seconds: when we write ERROR 3 to the log, we get a mail about the previous ERROR 2, and so on.



There are no errors at all in the logs. Any thoughts, guys? Where are we getting it wrong? Please help! :)

It is always one step behind: an incoming message from 2020-10-29 12:49:49 +02:00 triggers an event from 2020-10-29 12:11:50.497, and so on.


Anyone with the same problem?

Your time window is too small. You are searching every ten seconds over the previous ten seconds. It takes time for an event to move from collection, through processing in Graylog, and into Elasticsearch, where the alert query can then find it. It is not unusual for that process to take 30 seconds or longer, depending on the Elasticsearch refresh interval setting.

The timestamp, however, is applied while the message is passing through Graylog. If the timestamp is added to the message at second 1, and the message then takes 15 seconds to get from the Graylog input into Elasticsearch and another two seconds before Elasticsearch indexes and stores it, any search over the past ten seconds will miss it.

Try expanding it to a search over the past one or two minutes. I think you’ll find all the alerts start to show up as expected.
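
To illustrate the timing argument above, here is a toy model in Python. It is only a sketch: it assumes a search that runs every `interval` seconds over the previous `window` seconds, and it ignores how Graylog actually schedules event processing. The function name and the 600-second horizon are made up for the example; the numbers (timestamp at second 1, searchable at second 18) come from the scenario described above.

```python
def found_by_scheduled_search(stamped_at, searchable_at, window, interval, horizon=600):
    """Return the first run time (in seconds) at which the message is found,
    or None if no run within `horizon` seconds ever sees it.

    stamped_at    -- second at which the message timestamp was applied
    searchable_at -- second at which the message became searchable
    window        -- each run searches the previous `window` seconds
    interval      -- a run happens every `interval` seconds
    """
    run = interval
    while run <= horizon:
        indexed = searchable_at <= run                     # already in the index?
        in_window = run - window <= stamped_at <= run      # timestamp inside the searched range?
        if indexed and in_window:
            return run
        run += interval
    return None


# Timestamp at second 1, searchable 15 + 2 seconds later (second 18).
print(found_by_scheduled_search(1, 18, window=10, interval=10))   # None
print(found_by_scheduled_search(1, 18, window=60, interval=10))   # 20
```

With a ten-second window, the message is not yet searchable at the run that covers its timestamp, and every later run looks at a range that no longer includes second 1. With a one-minute window, the run at second 20 picks it up.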

Hey Chris! Thank you very much for your answer, but it didn’t help. I changed the window to 1 minute, then 2 minutes, and the behavior didn’t change. Restarting graylog-server and the whole machine didn’t change anything either.
I have no errors in the logs (Graylog, Sidecar, Elasticsearch, mongod), and it used to work in Graylog 2.4 even with a 10-second search/execute interval.
Any other suggestions are highly appreciated. Thanks again!

Looking at the time ranges involved, I notice that you are running the query at 17:46, but the next time range begins at 16:40:19. When a notification is sent, how far in the past is it reporting on?

I wonder if your processing is backed up? When you look at System/Nodes > Details, do you see either the processor buffer or journal filling up, or consistently above a single digit percentage?

When a notification is sent, how far in the past is it reporting on?
It reports just the message I wrote previously to the Oracle database alert log with: sql> exec dbms_system.ksdwrt(3,'ORA-36'); For example, when I then run exec dbms_system.ksdwrt(3,'ORA-37'); it sends me a mail about the previous ORA-36 that I wrote earlier.

I wonder if your processing is backed up? When you look at System/Nodes > Details, do you see either the processor buffer or journal filling up, or consistently above a single digit percentage?
It is always at 0.00%. I installed it 3-4 weeks ago and have been testing with just one Oracle database alert log, where I, or occasionally the database, write very few errors.


Actually, Graylog works very fast even with a 10-second search/execute interval (with or without backlog messages), but somehow the events/alerts always lag one message behind.

CONTENT OF /var/lib/graylog-server/journal/messagejournal-0/00000000000000000119.log 1572/1572 100%
…w…t®.
…=6 …ë.Áªär.Ú…è…ë… 6=…ڞräªÁ.!.Âsxu…*-
.beats.${“source”:{“no_beats_prefix”:false}}2B
$e88d3431-4d88-4708-a01c-2c033908c523…5f92be1b03217d779a1cbeb6:

.À¨.X…´.BÉ.{"@timestamp":“2020-10-30T07:40:52.756Z”,"@metadata":{“beat”:“filebeat”,“type”:"_doc",“version”:“7.8.1”},“log”:{“offset”:539662,“file”:{“path”:"/u01/
app/oracle/diag/rdbms/dev/DEV/trace/alert_DEV.log"}},“message”:“ORA-38”,“input”:{“type”:“log”},“gl2_source_collector”:“223aa2xxxxxxxxxxxx258ad718”,"
collector_node_id":“oratest”,“ecs”:{“version”:“1.5.0”},“host”:{“name”:“oratest”},“agent”:{“version”:“7.8.1”,“hostname”:“oratest”,“ephemeral_id”:“da1ec79
b-e529-4a6c-882a-fb922d81b4a2”,“id”:“3adc7c78-b0fa-4f06-a8e8-128ad009bd89”,“name”:“oratest”,“type”:“filebeat”}}…x…æñ.>…[0…ë.Áªär.Ú…è…ë.
…0[…ڞräªÁ.!é.txu…*-
.beats.${“source”:{“no_beats_prefix”:false}}2B
$e88d3431-4d88-4708-a01c-2c033908c523…5f92be1b03217d779a1cbeb6:

.À¨.X…´.BÉ.{"@timestamp":“2020-10-30T07:41:42.778Z”,"@metadata":{“beat”:“filebeat”,“type”:"_doc",“version”:“7.8.1”},“message”:“ORA-39”,“input”:{“type”:“log”},“g
l2_source_collector”:“223aa277xxxxxxxxx258ad718”,“collector_node_id”:“oratest”,“ecs”:{“version”:“1.5.0”},“host”:{“name”:“oratest”},“agent”:{“ver
sion”:“7.8.1”,“hostname”:“oratest”,“ephemeral_id”:“da1ec79b-e529-4a6c-882a-fb922d81b4a2”,“id”:“3adc7c78-b0fa-4f06-a8e8-128ad009bd89”,“name”:“oratest”,“type
“:“filebeat”},“log”:{“offset”:540224,“file”:{“path”:”/u01/app/oracle/diag/rdbms/dev/DEV/trace/alert_DEV.log”}}}

Even though I can see the message ORA-39 in http://192.168.x.xx:9000/search, no event/alert is triggered for it.

Thank you, Chris!
