This setup is slowly maturing, but with every change I notice similar issues.
Since Beats is part of the road ahead, I created an input for it and configured Beats to send data.
In addition, Winlogbeat was installed on a laptop which had cached weeks of data due to changes in the network.
I noticed Graylog indicating bursts of up to 3000+ messages and expected these would eventually show up in a search over 'all messages', but they did not.
Are there typical reasons for an input's messages not being indexed?
Hello,
First of all, an input is for ingesting. Graylog provides outputs, which default to Elasticsearch; those write the messages into the Elasticsearch cluster.
So if a message cannot be written into the index, it shows up in the section "System Overview -> Indexer Failures last 24 hours"; under "Show Errors" you can see why a message was not written into the index behind a stream.
You should look there first. The reason is stated clearly as well. Most of the time the reason is a wrong data type for a document field. Those messages are lost.
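If you prefer checking this outside the UI, the same information is exposed over the REST API. A minimal sketch, assuming a reachable Graylog API at a placeholder address with placeholder credentials and the /system/indexer/failures endpoint that backs that page; the response field names are from memory, so verify them in your node's API browser:

import requests

GRAYLOG = "http://graylog.example.org:9000/api"   # assumed base URL
AUTH = ("admin", "password")                      # or an access token with "token" as the password

resp = requests.get(
    GRAYLOG + "/system/indexer/failures",
    params={"limit": 50, "offset": 0},
    headers={"Accept": "application/json"},
    auth=AUTH,
)
resp.raise_for_status()

for failure in resp.json().get("failures", []):
    # each entry should say which index/document failed and why,
    # e.g. a mapper_parsing_exception caused by a wrong field data type
    print(failure.get("timestamp"), failure.get("index"), failure.get("message"))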
In your case I'll fish out my magic glasses. I guess that at first the section mentioned above had no counters? If that is the case, you should just adjust the time range of your search, because the Beats messages may have been ingested and processed with a timestamp correction. That means they are written into the index with the log timestamp.
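To verify that theory you can also ask Elasticsearch directly how many documents carry a log timestamp in the old window, instead of scrolling through the UI. A rough sketch, assuming Elasticsearch on localhost:9200, the default graylog_* index prefix, and Graylog's usual UTC timestamp format:

import requests

ES = "http://localhost:9200"   # assumed Elasticsearch address

query = {
    "query": {
        "range": {
            "timestamp": {      # Graylog stores this field in UTC
                "gte": "2020-09-01 00:00:00.000",
                "lte": "2020-09-15 23:59:59.999",
            }
        }
    }
}

resp = requests.post(ES + "/graylog_*/_count", json=query)
resp.raise_for_status()
print("documents with a log timestamp in that window:", resp.json()["count"])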
Just checked: the indexing errors are from 14 days ago and are indeed typically due to mismatches etc. It is not like I'm completely new to Elastic and Graylog; GL just keeps beating me with the same problems, for which I see no real cause.
I remember this 'timestamp thingy' has surfaced before. It is truly cumbersome. My entire setup is set to a single timezone, from laptop to server. I don't see how I could fail to find events if I select 'search in all messages'.
If you go to your inputs and click “show received messages” on the input in question, is that blank also?
If you click System | Nodes | Details, do you see anything being processed through the system? Journal filling up, input buffer, process buffer, output buffer… are any of those doing anything?
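The same numbers are available per node over the REST API if the UI is acting up. A sketch with placeholder address and credentials; the exact response fields can be double-checked in the node's API browser:

import requests

GRAYLOG = "http://graylog.example.org:9000/api"   # assumed base URL
AUTH = ("admin", "password")

resp = requests.get(GRAYLOG + "/system/journal",
                    headers={"Accept": "application/json"},
                    auth=AUTH)
resp.raise_for_status()
journal = resp.json()

# if uncommitted entries keep growing while the read rate stays near zero,
# messages are arriving but not getting through processing/output
print("uncommitted journal entries:", journal.get("uncommitted_journal_entries"))
print("append events/s:", journal.get("append_events_per_second"))
print("read events/s:", journal.get("read_events_per_second"))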
AH! The output buffer appears to stall at 173 messages and does not move anymore.
There was nothing out of the ordinary. I saw messages arriving on the input, messages being processed, and so on. The messages stop with a timestamp of September 15th; since then I see no new messages, while the input counters keep increasing.
Lately I noticed that when I arrive on the inputs page, the inputs are briefly shown as 'not running' and then as 'running'. I mention this because it may be an indicator of something I'm not aware of; I don't remember seeing this before. In the past I submitted issue 9054 on the Graylog GitHub for a really annoying bug that surfaces when visiting pages under System.
Looking at the Graylog server, the network traffic is as expected, with activity on the configured ports (input for Beats, input for GELF, input for syslog).
Yeah, I checked a lot of logs and saw a lot of errors, which I mostly resolved by reconfiguring inputs etc.
The problem has not changed though. A considerable number of log entries are missing because of some mismatch with the timestamp. I cannot retrieve the message-id extracted from the logs either.
Somehow the new logs arrive just fine, while I have not reconfigured anything related to timestamps.
This timestamp issue should really be better documented. It is deeply annoying how ignorant I feel.
I learned Elasticsearch keeps a separate graylog.log (it names its main log file after the cluster name), which is flooded with DEBUG messages.
Graylog also has errors which eventually cause the SSH session to show garbled text.
Most messages here are from a GELF/TCP input with “Null frame delimiter” enabled. Disabling “Null frame delimiter” appears to make the garbled terminal situation go away.
Returning to assessing the situation.
1 - Elastic/graylog.log (in chronological order)
[DEBUG][o.e.a.a.i.m.p.TransportPutMappingAction] [elk] failed to put mappings on indices [[[graylog_2/rR5AtZe6RA2S74QEFNxdQQ]]], type [message]
[DEBUG][o.e.a.b.TransportShardBulkAction] [elk] [graylog_2][0] failed to execute bulk item (index) index {[graylog_deflector][message][8c6066c0-079d-…], source[n/a, actual length: [2kb], max length: 2kb]}
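To see which field the put-mappings failure is actually about, you can compare the index's current mapping with the documents Graylog tries to write. A rough sketch against Elasticsearch directly, assuming it listens on localhost:9200 (the index name is taken from the log line above):

import requests

ES = "http://localhost:9200"   # assumed Elasticsearch address

resp = requests.get(ES + "/graylog_2/_mapping")
resp.raise_for_status()
mappings = resp.json()["graylog_2"]["mappings"]

# depending on the Elasticsearch version the fields sit directly under
# "properties" or under the "message" type
properties = mappings.get("properties") or mappings.get("message", {}).get("properties", {})

for field, definition in sorted(properties.items()):
    print(field, "->", definition.get("type", definition))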
2 - Graylog/server.log (a two-hour offset due to an unreachable time server on the client, fixed now)
2020-10-06T08:42:11.025+02:00 ERROR [DecodingProcessor] Unable to decode raw message RawMessage{id=0fb5a892-079f-11eb-8eba-1264c5baeb62, journalOffset=62143755, codec=gelf, payloadSize=368, timestamp=2020-10-06T06:42:10.969Z, remoteAddress=/10.9.1.6:5441} on input <5ef105d184645059c4026896>.
2020-10-06T08:42:11.027+02:00 ERROR [DecodingProcessor] Error processing message RawMessage{id=0fb5a892-079f-11eb-8eba-1264c5baeb62, journalOffset=62143755, codec=gelf, payloadSize=368, timestamp=2020-10-06T06:42:10.969Z, remoteAddress=/10.9.1.6:5441}
com.fasterxml.jackson.core.JsonParseException: Unexpected character (';' (code 59)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
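That JsonParseException means whatever is talking to the GELF input is not sending JSON at all; a ';' looks like a plain syslog or CSV-style line hitting the GELF port. For comparison, a GELF/TCP message is one JSON object terminated by a null byte when "Null frame delimiter" is enabled. A minimal sketch of a well-formed message, with host and port as placeholders (12201 being the usual GELF default):

import json
import socket
import time

GELF_HOST, GELF_PORT = "graylog.example.org", 12201   # placeholder input address/port

message = {
    "version": "1.1",
    "host": "laptop01",
    "short_message": "test message from the GELF sketch",
    "timestamp": time.time(),
    "level": 6,                 # informational
    "_note": "additional fields are prefixed with an underscore",
}

# one JSON object per message, terminated by a null byte ("Null frame delimiter")
payload = json.dumps(message).encode("utf-8") + b"\x00"

with socket.create_connection((GELF_HOST, GELF_PORT), timeout=5) as sock:
    sock.sendall(payload)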
Well, I did not entirely understand how this came to be, but I did it and am now waiting. The error happened a lot before, but not so frequently anymore. This ambiguity when working with ES/GL is quite frustrating. Things don't seem to behave consistently, but probably do.
Question: is it feasible to build in a trigger so this value can be adjusted on an event?