Alerts/Events do not trigger

It looks like you’re triggering an edge case where the processing status is ignored because the ingestion rate is too low.

Can you try starting a Random HTTP message generator input and see what happens then?

I started the Random HTTP message generator with the following settings:

sleep: 25
sleep_deviation: 30
throttling_allowed: false
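For reference, those settings should produce a fairly high rate (a rough back-of-the-envelope figure, assuming sleep is the inter-message delay in milliseconds and sleep_deviation only jitters that interval):

```shell
# Approximate generator throughput: one message per "sleep" milliseconds.
# 1000 ms / 25 ms per message gives about 40 messages per second.
echo "$((1000 / 25)) messages/second"
```

That should be more than enough traffic to move the ingestion-rate counters.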

The log still says the same thing even while the Random HTTP message generator is running :frowning:
2019-09-16T13:28:13.015+02:00 DEBUG [EventProcessorExecutionJob] Event processor <Errors found/5d5a43d2eb7809e1f9a969e3> couldn’t be executed because of a failed precondition (retry in 5000 ms)

What does echo 'db.processing_status.find()' | mongo graylog look like now?

This is how it looks while the Random HTTP message generator is active:

MongoDB shell version v3.6.3
connecting to: mongodb://127.0.0.1:27017/graylog
MongoDB server version: 3.6.3
{ "_id" : ObjectId("5d5a43d2e5b53d0b531ae32a"), "node_id" : "be631694-8032-4855-b255-c9183c38115c", "node_lifecycle_status" : "RUNNING", "updated_at" : ISODate("2019-09-16T11:35:07.190Z"), "receive_times" : { "ingest" : ISODate("2019-09-16T11:35:07.148Z"), "post_processing" : ISODate("2019-09-16T11:35:07.148Z"), "post_indexing" : ISODate("2019-09-16T11:35:06.411Z") }, "input_journal" : { "uncommitted_entries" : NumberLong(0), "read_messages_1m_rate" : 0, "written_messages_1m_rate" : 0 } }
bye
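A convenience sketch for watching just the journal rates in that document (illustrative: fed here from a sample line rather than a live mongo shell, which you would normally pipe through it):

```shell
# Extract the 1-minute journal rates from a processing_status dump.
# Sample document fragment as the mongo shell prints it:
doc='{ "input_journal" : { "uncommitted_entries" : NumberLong(0), "read_messages_1m_rate" : 0, "written_messages_1m_rate" : 0 } }'
# grep out only the rate fields; both staying at 0 means the node sees no traffic.
printf '%s\n' "$doc" | grep -o '"[a-z_]*_messages_1m_rate" : [0-9.]*'
```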

Ok, so this didn’t change. Maybe something is up with the metrics. Investigating…

It seems like it stopped completely on Friday. Before that we were getting some alarms, but not always; actually, most of the time they were not coming through.

You are not running with a disabled journal, are you?

What do your node metrics look like for org.graylog2.shared.journal.KafkaJournal.readMessages ?

message_journal_enabled = false

Seems like I am. I was doing some tests on Friday, switching it on and off, as some people on the GitHub issue said this should fix it.
Is the events system not working without the message journal?

It is not, which is a bug.
The workaround I proposed helps users who were running nodes in their cluster that don’t do any input processing at all. (Only there did I mean to disable the journal.)
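A sketch of what that split looks like in server.conf (values illustrative):

```
# Nodes that run inputs and process messages: keep the journal enabled.
message_journal_enabled = true

# Only on dedicated nodes that run no inputs at all was disabling it meant:
# message_journal_enabled = false
```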

Can you turn it back on again and report back?

I reactivated the message journal. Right after I restarted graylog-server, it started processing lots of events (for the messages in the past), so we got a bit spammed by it.

After it finished triggering all the alarms for the previous messages, it once again started running into the exception: couldn’t be executed because of a failed precondition (retry in 5000 ms). That happened while we didn’t have new messages (so I guess that is normal behaviour).

We got 5 events over 10 minutes, whenever a message came into Graylog (so it worked just fine). Then it suddenly stopped again: no more events, even if I manually trigger some incoming messages.
The moment I start the Random HTTP message generator, events start working again, at least for a while; then it dies again, I start the Random HTTP message generator again, and it works again, and so on.

So is it by design that events will only work on a busy enough server? Our throughput is variable but more often than not it is quite low.

It is. That’s why we log these only at DEBUG level.

Kind of, the current default expects at least 1 message per minute.
I’m thinking about a fix though.
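That check can be sketched like this (an illustration, not the actual implementation: the assumption is that the precondition compares the newest post-indexing timestamp from processing_status against the current time, with a threshold of about one minute):

```shell
# Illustrative idle check with hypothetical sample timestamps:
last_indexed=$(date -u -d '2019-09-16T20:14:19Z' +%s)  # newest "post_indexing" time
now=$(date -u -d '2019-09-16T20:43:40Z' +%s)           # current time
age=$(( now - last_indexed ))
if [ "$age" -gt 60 ]; then
  echo "precondition failed: last indexed message is ${age}s old"
fi
```

On an idle node the age only grows, so every scheduler run logs the failed-precondition retry message.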

Hmm… There are quite a few intervals longer than a minute when we don’t get any messages. But when we get one, it is important that we find out about it (via the notifications). It would be very nice if a fix could be found; otherwise the entire events system would be unusable for us :frowning:
Fingers crossed for a fix.

Hey

I installed Graylog v3.1.1+b39ee32 and no events are created.

Using

MongoDB v4.0.12
elasticsearch v6.8.3
OS: debian92

I enabled journaling with message_journal_enabled = true
and restarted the server, but it did not help.
I have one node and am sending the following GELF:
curl -XPOST http://myip:12201/gelf -p0 -d '{"short_message":"Hello there","_Response_code":400, "host":"example.org", "facility":"test", "_foo":"bar"}'
I just ran:
echo 'db.processing_status.find()' | mongo
and the output is:
root@logsrv01:/home/hh# echo 'db.processing_status.find()' | mongo
MongoDB shell version v4.0.12
connecting to: mongodb://127.0.0.1:27017/?gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("02bac4da-16eb-48fc-bd0b-85391a0530d4") }
MongoDB server version: 4.0.12
bye

I cannot see written_messages_1m_rate. Is this the problem?

Output from:

root@logsrv01:/home/hh# echo 'db.event_notifications.find()' | mongo graylog
MongoDB shell version v4.0.12
connecting to: mongodb://127.0.0.1:27017/graylog?gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("447baf4c-8009-4ea7-88d7-164cb89c3f86") }
MongoDB server version: 4.0.12
{ “_id” : ObjectId(“5d78a317d2ab0e0e0c6b96c7”), “title” : “Response 200 Notification”, “description” : “Response 200 Notification”, “config” : { “type” : “email-notification-v1”, “sender” : “teknik@ankiro.dk”, “subject” : “Graylog event notification: {event_definition_title}", "body_template" : "--- [Event Definition] ---------------------------\nTitle: {event_definition_title}\nDescription: {event_definition_description}\nType: {event_definition_type}\n— [Event] --------------------------------------\nTimestamp: {event.timestamp}\nMessage: {event.message}\nSource: {event.source}\nKey: {event.key}\nPriority: {event.priority}\nAlert: {event.alert}\nTimestamp Processing: {event.timestamp}\nTimerange Start: {event.timerange_start}\nTimerange End: {event.timerange_end}\nFields:\n{foreach event.fields field} {field.key}: {field.value}\n${end}\n${if backlog}\n— [Backlog] ------------------------------------\nLast messages accounting for this alert:\n${foreach backlog message}\n${message}\n${end}\n${end}\n”, “email_recipients” : [ “hh@ankiro.dk” ], “user_recipients” : } }
{ “_id” : ObjectId(“5d78fda6d2ab0e1e1e95f5ae”), “title” : “Notification 400”, “description” : “Notification 400”, “config” : { “type” : “email-notification-v1”, “sender” : “teknik@ankiro.dk”, “subject” : “Graylog event notification: {event_definition_title}", "body_template" : "--- [Event Definition] ---------------------------\nTitle: {event_definition_title}\nDescription: {event_definition_description}\nType: {event_definition_type}\n— [Event] --------------------------------------\nTimestamp: {event.timestamp}\nMessage: {event.message}\nSource: {event.source}\nKey: {event.key}\nPriority: {event.priority}\nAlert: {event.alert}\nTimestamp Processing: {event.timestamp}\nTimerange Start: {event.timerange_start}\nTimerange End: {event.timerange_end}\nFields:\n{foreach event.fields field} {field.key}: {field.value}\n${end}\n${if backlog}\n— [Backlog] ------------------------------------\nLast messages accounting for this alert:\n${foreach backlog message}\n${message}\n${end}\n${end}\n”, “email_recipients” : [ “hh@ankiro.dk” ], “user_recipients” : } }
{ “_id” : ObjectId(“5d7901c4d2ab0e1e1e95fa2b”), “title” : “Event 600 notification”, “description” : “event &00 notification”, “config” : { “type” : “email-notification-v1”, “sender” : “teknik@ankiro.dk”, “subject” : “Event 600 Subject”, “body_template” : “— [Event Definition] ---------------------------\nTitle: {event_definition_title}\nDescription: {event_definition_description}\nType: {event_definition_type}\n--- [Event] --------------------------------------\nTimestamp: {event.timestamp}\nMessage: {event.message}\nSource: {event.source}\nKey: {event.key}\nPriority: {event.priority}\nAlert: {event.alert}\nTimestamp Processing: {event.timestamp}\nTimerange Start: {event.timerange_start}\nTimerange End: {event.timerange_end}\nFields:\n${foreach event.fields field} {field.key}: {field.value}\n${end}\n${if backlog}\n— [Backlog] ------------------------------------\nLast messages accounting for this alert:\n${foreach backlog message}\n${message}\n${end}\n${end}\n”, “email_recipients” : [ “hh@ankiro.dk” ], “user_recipients” : } }
bye

root@logsrv01:/home/hh# echo 'db.scheduler_triggers.find()' | mongo graylog
MongoDB shell version v4.0.12
connecting to: mongodb://127.0.0.1:27017/graylog?gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("70e1b068-6834-4c35-ac62-4c9b7e0d1ae5") }
MongoDB server version: 4.0.12
{ “_id” : ObjectId(“5d777d4bd2ab0e05cdd0d8de”), “job_definition_id” : “5d777d4bd2ab0e05cdd0d8dd”, “start_time” : ISODate(“2019-09-10T10:39:07.114Z”), “end_time” : null, “next_time” : ISODate(“2019-09-16T20:43:10.490Z”), “created_at” : ISODate(“2019-09-10T10:39:07.114Z”), “updated_at” : ISODate(“2019-09-10T12:59:28.342Z”), “triggered_at” : ISODate(“2019-09-16T20:43:05.478Z”), “status” : “runnable”, “lock” : { “clock” : NumberLong(0), “progress” : 0, “last_lock_time” : ISODate(“2019-09-16T20:43:05.478Z”), “owner” : null }, “schedule” : { “type” : “interval”, “interval” : NumberLong(60000), “unit” : “MILLISECONDS” }, “data” : { “type” : “event-processor-execution-v1”, “timerange_from” : ISODate(“2019-09-11T20:02:42.665Z”), “timerange_to” : ISODate(“2019-09-11T20:37:42.664Z”) } }
{ “_id” : ObjectId(“5d77a7a7d2ab0e0e0c6a8612”), “job_definition_id” : “5d77a7a7d2ab0e0e0c6a8611”, “start_time” : ISODate(“2019-09-10T13:39:51.152Z”), “end_time” : null, “next_time” : ISODate(“2019-09-16T20:43:08.476Z”), “created_at” : ISODate(“2019-09-10T13:39:51.152Z”), “updated_at” : ISODate(“2019-09-16T16:13:22.944Z”), “triggered_at” : ISODate(“2019-09-16T20:43:03.471Z”), “status” : “runnable”, “lock” : { “clock” : NumberLong(0), “progress” : 0, “last_lock_time” : ISODate(“2019-09-16T20:43:03.471Z”), “owner” : null }, “schedule” : { “type” : “interval”, “interval” : NumberLong(60000), “unit” : “MILLISECONDS” }, “data” : { “type” : “event-processor-execution-v1”, “timerange_from” : ISODate(“2019-09-12T08:54:08.755Z”), “timerange_to” : ISODate(“2019-09-12T09:54:08.755Z”) } }
{ “_id” : ObjectId(“5d78a36cd2ab0e0e0c6b9727”), “job_definition_id” : “5d78a36cd2ab0e0e0c6b9726”, “start_time” : ISODate(“2019-09-11T07:34:04.722Z”), “end_time” : null, “next_time” : ISODate(“2019-09-16T20:43:10.496Z”), “created_at” : ISODate(“2019-09-11T07:34:04.722Z”), “updated_at” : ISODate(“2019-09-12T09:10:04.918Z”), “triggered_at” : ISODate(“2019-09-16T20:43:05.480Z”), “status” : “runnable”, “lock” : { “clock” : NumberLong(0), “progress” : 0, “last_lock_time” : ISODate(“2019-09-16T20:43:05.480Z”), “owner” : null }, “schedule” : { “type” : “interval”, “interval” : NumberLong(1000), “unit” : “MILLISECONDS” }, “data” : { “type” : “event-processor-execution-v1”, “timerange_from” : ISODate(“2019-09-11T08:04:40.726Z”), “timerange_to” : ISODate(“2019-09-11T08:09:40.725Z”) } }
{ “_id” : ObjectId(“5d7901d5d2ab0e1e1e95fa41”), “job_definition_id” : “5d7901d5d2ab0e1e1e95fa40”, “start_time” : ISODate(“2019-09-11T14:16:53.082Z”), “end_time” : null, “next_time” : ISODate(“2019-09-16T20:43:10.490Z”), “created_at” : ISODate(“2019-09-11T14:16:53.082Z”), “updated_at” : ISODate(“2019-09-11T15:33:46.095Z”), “triggered_at” : ISODate(“2019-09-16T20:43:05.479Z”), “status” : “runnable”, “lock” : { “clock” : NumberLong(0), “progress” : 0, “last_lock_time” : ISODate(“2019-09-16T20:43:05.479Z”), “owner” : null }, “schedule” : { “type” : “interval”, “interval” : NumberLong(1000), “unit” : “MILLISECONDS” }, “data” : { “type” : “event-processor-execution-v1”, “timerange_from” : ISODate(“2019-09-11T16:08:34.095Z”), “timerange_to” : ISODate(“2019-09-11T16:09:34.094Z”) } }

root@logsrv01:/home/hh# echo 'db.processing_status.find()' | mongo graylog
MongoDB shell version v4.0.12
connecting to: mongodb://127.0.0.1:27017/graylog?gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("a6b5fe48-ccf4-435f-bc22-19adc43cbd07") }
MongoDB server version: 4.0.12
{ "_id" : ObjectId("5d72786f125fefc15ea1f45e"), "node_id" : "ef886ea0-70e4-4419-ab19-85ca6fd5457c", "node_lifecycle_status" : "RUNNING", "updated_at" : ISODate("2019-09-16T20:43:40.824Z"), "receive_times" : { "ingest" : ISODate("2019-09-16T20:14:19.734Z"), "post_processing" : ISODate("2019-09-16T20:14:19.734Z"), "post_indexing" : ISODate("2019-09-16T20:14:19.734Z") }, "input_journal" : { "uncommitted_entries" : NumberLong(0), "read_messages_1m_rate" : 0, "written_messages_1m_rate" : 0 } }
bye
root@logsrv01:/home/hh#

root@logsrv01:/home/hh# echo 'db.event_processor_state.find()' | mongo graylog
MongoDB shell version v4.0.12
connecting to: mongodb://127.0.0.1:27017/graylog?gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("dfea987e-9aea-44db-b9d9-0d45cff92570") }
MongoDB server version: 4.0.12
{ "_id" : ObjectId("5d778ab25bac10a9f8139dbe"), "event_definition_id" : "5d777d4bd2ab0e05cdd0d8dc", "max_processed_timestamp" : ISODate("2019-09-11T20:36:42.664Z"), "min_processed_timestamp" : ISODate("2019-09-10T10:40:42.663Z") }
{ "_id" : ObjectId("5d78a4d95bac10a9f82babe4"), "event_definition_id" : "5d77a7a7d2ab0e0e0c6a8610", "max_processed_timestamp" : ISODate("2019-09-12T09:53:08.754Z"), "min_processed_timestamp" : ISODate("2019-09-10T19:49:08.749Z") }
{ "_id" : ObjectId("5d78a6a25bac10a9f82bdee9"), "event_definition_id" : "5d78a36cd2ab0e0e0c6b9725", "max_processed_timestamp" : ISODate("2019-09-11T08:29:33.724Z"), "min_processed_timestamp" : ISODate("2019-09-11T07:19:04.722Z") }
{ "_id" : ObjectId("5d7957015bac10a9f8425063"), "event_definition_id" : "5d7901d5d2ab0e1e1e95fa3f", "max_processed_timestamp" : ISODate("2019-09-11T16:09:33.094Z"), "min_processed_timestamp" : ISODate("2019-09-11T15:32:46.094Z") }
bye
root@logsrv01:/home/hh#

root@logsrv01:/home/hh# echo 'db.scheduler_job_definitions.find()' | mongo graylog
MongoDB shell version v4.0.12
connecting to: mongodb://127.0.0.1:27017/graylog?gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("8b86a0cf-4ca9-4066-9078-d08c61581b75") }
MongoDB server version: 4.0.12
{ “_id” : ObjectId(“5d777d4bd2ab0e05cdd0d8dd”), “title” : “First event request 500”, “description” : “First event request 500”, “config” : { “type” : “event-processor-execution-v1”, “event_definition_id” : “5d777d4bd2ab0e05cdd0d8dc”, “parameters” : { “type” : “aggregation-v1”, “timerange” : { “type” : “absolute”, “from” : ISODate(“2019-09-10T12:24:28.339Z”), “to” : ISODate(“2019-09-10T12:59:28.339Z”) }, “streams” : , “batch_size” : 500 }, “processing_window_size” : NumberLong(2100000), “processing_hop_size” : NumberLong(60000) } }
{ “_id” : ObjectId(“5d77a7a7d2ab0e0e0c6a8611”), “title” : “Event 400 11-09”, “description” : “event 400”, “config” : { “type” : “event-processor-execution-v1”, “event_definition_id” : “5d77a7a7d2ab0e0e0c6a8610”, “parameters” : { “type” : “aggregation-v1”, “timerange” : { “type” : “absolute”, “from” : ISODate(“2019-09-16T15:13:22.942Z”), “to” : ISODate(“2019-09-16T16:13:22.942Z”) }, “streams” : , “batch_size” : 500 }, “processing_window_size” : NumberLong(3600000), “processing_hop_size” : NumberLong(60000) } }
{ “_id” : ObjectId(“5d78a317d2ab0e0e0c6b96c8”), “title” : “Response 200 Notification”, “description” : “Response 200 Notification”, “config” : { “type” : “notification-execution-v1”, “notification_id” : “5d78a317d2ab0e0e0c6b96c7” } }
{ “_id” : ObjectId(“5d78a36cd2ab0e0e0c6b9726”), “title” : “response 200 event”, “description” : “response 200 event”, “config” : { “type” : “event-processor-execution-v1”, “event_definition_id” : “5d78a36cd2ab0e0e0c6b9725”, “parameters” : { “type” : “aggregation-v1”, “timerange” : { “type” : “absolute”, “from” : ISODate(“2019-09-12T09:05:04.854Z”), “to” : ISODate(“2019-09-12T09:10:04.854Z”) }, “streams” : , “batch_size” : 500 }, “processing_window_size” : NumberLong(300000), “processing_hop_size” : NumberLong(1000) } }
{ “_id” : ObjectId(“5d78fda6d2ab0e1e1e95f5af”), “title” : “Notification 400”, “description” : “Notification 400”, “config” : { “type” : “notification-execution-v1”, “notification_id” : “5d78fda6d2ab0e1e1e95f5ae” } }
{ “_id” : ObjectId(“5d7901c4d2ab0e1e1e95fa2c”), “title” : “Event 600 notification”, “description” : “event &00 notification”, “config” : { “type” : “notification-execution-v1”, “notification_id” : “5d7901c4d2ab0e1e1e95fa2b” } }
{ “_id” : ObjectId(“5d7901d5d2ab0e1e1e95fa40”), “title” : “Event 600”, “description” : “event 600”, “config” : { “type” : “event-processor-execution-v1”, “event_definition_id” : “5d7901d5d2ab0e1e1e95fa3f”, “parameters” : { “type” : “aggregation-v1”, “timerange” : { “type” : “absolute”, “from” : ISODate(“2019-09-11T15:32:46.094Z”), “to” : ISODate(“2019-09-11T15:33:46.094Z”) }, “streams” : , “batch_size” : 500 }, “processing_window_size” : NumberLong(60000), “processing_hop_size” : NumberLong(1000) } }
bye
root@logsrv01:/home/hh#
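One way to read the scheduler_triggers dump above: the triggers do fire now (triggered_at is on the 16th), but their data.timerange_to is still stuck days in the past, so the event processors are far behind. A quick sketch of that lag for the first trigger:

```shell
# Lag between the last processed window and the last firing, using the
# values from the first scheduler_triggers document above:
processed_to=$(date -u -d '2019-09-11T20:37:42Z' +%s)   # data.timerange_to
triggered_at=$(date -u -d '2019-09-16T20:43:05Z' +%s)   # triggered_at
echo "processor is $(( (triggered_at - processed_to) / 86400 )) days behind"
```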

Hey

Sorry that it took so long to answer.
Something happened and the notifications now get triggered … like a lot. But I do not see an event created in the UI.

Here is the log with debug enabled:

Thanks

Thanks. This is the EventProcessors catching up on past events.
This might take a while, because right now it’s done in 1-minute chunks at a time.
You should see the events if you search over a greater time range.
Your logs show events created for the 12th of September.

I’ve created two more issues regarding event processing.
Let’s assume you were experiencing one of them :slight_smile:


Are you running with journaling disabled?
Also, please update to 3.1.2 :slight_smile:

I updated to 3.1.2 and have the same problem.
In my server.conf I have the following:
message_journal_enabled = true
message_journal_dir = /var/lib/graylog-server/journal

# Journal hold messages before they could be written to Elasticsearch.
#
# During normal operation the journal will be smaller.

#message_journal_max_age = 12h
#message_journal_max_size = 5gb
#message_journal_flush_age = 1m
#message_journal_flush_interval = 1000000
#message_journal_segment_age = 1h
#message_journal_segment_size = 100mb
Thanks

How many messages do you ingest on this node on average?
It seems it’s mostly idle.
Could you try running this with a Random HTTP message generator input enabled?