Inputs are booming, no indexed messages

Dear all,

This setup is slowly maturing, yet with every change I notice similar issues.

Since Beats is part of the road ahead, I created an input for it and configured Beats to send data. In addition, Winlogbeat was installed on a laptop which had cached weeks of data due to changes in the network.

I noticed Graylog indicating bursts of up to +3000 messages and expected these would eventually show up in a search with the ‘all messages’ filter, but they did not.

Are there typical reasons for an input not indexing messages?

The most obvious concern here is: where did the one million messages go, and are they lost?

Hello,
First of all, an input only ingests messages. Graylog writes them out through its outputs, which default to Elasticsearch, so the messages end up in the Elasticsearch cluster.

If a message cannot be written into the index, it appears under “System / Overview -> Indexer failures (last 24 hours)”. Under “Show errors” you can see why a message was not written into the index behind a stream.

That is the first place to look; the reason is clearly stated there. Most of the time the cause is a wrong data type for a document field. Those messages are lost.
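If you prefer the API over the UI, the same information can be pulled from the REST interface. A minimal sketch, assuming the server is reachable on localhost:9000 and that your version exposes the /system/indexer/failures endpoint that backs that page (URL, credentials and response field names are assumptions here):

```python
import requests

GRAYLOG = "http://localhost:9000/api"   # adjust to your server
AUTH = ("admin", "password")            # or an access token

# Fetch the most recent indexer failures (assumed endpoint behind the
# "Indexer failures" page of the web interface).
resp = requests.get(
    f"{GRAYLOG}/system/indexer/failures",
    params={"limit": 50, "offset": 0},
    auth=AUTH,
    headers={"Accept": "application/json"},
)
resp.raise_for_status()

for failure in resp.json().get("failures", []):
    # Each entry should name the index, the letter id and the reason the
    # document was rejected (e.g. a data type / mapping conflict).
    print(failure.get("index"), failure.get("letter_id"), failure.get("message"))
```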

In your case I will fish out my magic glasses. I guess that at first the section mentioned above showed no counters? If that is the case, just widen the time range of your search, because the Beat messages could have been ingested and processed with a timestamp correction. That means they are written into the index under the log’s own timestamp rather than the time of ingestion.

But this is just an assumption.
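To test that assumption you can also ask Elasticsearch directly whether the Winlogbeat backlog was indexed under its original log timestamps. A minimal sketch, assuming the default graylog_* index prefix, an Elasticsearch node on localhost:9200, and an illustrative date range:

```python
import requests

ES = "http://localhost:9200"

# Count documents whose timestamp lies weeks in the past - if the cached
# backlog was indexed with a timestamp correction, it shows up here even
# though a "last 5 minutes" search in the UI stays empty.
query = {
    "query": {
        "range": {
            "timestamp": {
                "gte": "2020-08-15 00:00:00.000",
                "lte": "2020-09-15 23:59:59.999",
                "format": "yyyy-MM-dd HH:mm:ss.SSS",
            }
        }
    }
}

resp = requests.get(f"{ES}/graylog_*/_count", json=query)
resp.raise_for_status()
print("documents in that range:", resp.json()["count"])
```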

Hey,

Just checked: the indexing errors are from 14 days ago and are indeed mostly due to data type mismatches etc. It is not as if I’m completely new to Elasticsearch and Graylog, yet Graylog keeps beating me with the same problems for which I see no real cause.

I remember this ‘timestamp thingy’ has surfaced before. It is truly cumbersome. My entire setup is set to a single timezone, from laptop to server. I don’t see how I could fail to find events if I select ‘search in all messages’.

If you go to your inputs and click “Show received messages” on the input in question, is that blank as well?

If you click System | Nodes | Details, do you see anything being processed through the system? Journal filling up, input buffer, process buffer, output buffer… are any of those doing anything?

Hey, thanks for asking.

Ah! The output buffer appears to be stalled at 173 messages and does not move any more.

Otherwise there was nothing out of the ordinary. I saw messages arriving on the input, messages being processed, and so on. The messages stop with a timestamp of September 15th; since then I see no new messages while the input counters keep increasing.

Lately I have noticed that when I arrive on the inputs page, the inputs are briefly shown as ‘not running’ and then as ‘running’. I mention this because it may be an indicator of something I am not aware of; I don’t remember seeing this before. In the past I submitted issue 9054 on the Graylog GitHub for a really annoying bug that surfaced when visiting pages under System.

Looking at the Graylog server, the network traffic is as expected, with activity on the configured ports (input for Beats, input for GELF, input for syslog).

Have you checked the Elasticsearch logs?

Thanks for asking.

Yes, I checked a lot of logs and saw a lot of errors, most of which I resolved by reconfiguring inputs etc.

The problem has not changed, though. A considerable number of log entries are missing because of some mismatch with the timestamp. I cannot retrieve the message IDs extracted from the logs either.

Somehow the new logs arrive just fine, even though I have not reconfigured anything related to timestamps.

This timestamp issue should really be better documented. It is deeply annoying how ignorant I feel :smiley:

Can you share the logs?

To be honest… I’m not sure what we’re troubleshooting here any more… timestamps? The input? The output?

It seems like you’re having multiple issues, which may be related, but let’s work through them one at a time.

Thanks. Well, I am perplexed. Looking at the logs:

  1. I learned that Elasticsearch keeps a separate graylog.log, which is flooded with DEBUG messages.
  2. Graylog also has errors which eventually cause the SSH session to show garbled text.

Most messages here are from a GELF/TCP input with “Null frame delimiter” enabled. Disabling “Null frame delimiter” appears to make the garbled terminal situation go away.
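For reference, with “Null frame delimiter” enabled the GELF/TCP input expects every JSON message to be terminated by a single null byte, so anything else arriving on that port gets misread. A minimal sketch of a well-framed sender, with host, port and message content as illustrative assumptions:

```python
import json
import socket

GELF_HOST, GELF_PORT = "graylog.example.org", 12201   # illustrative

message = {
    "version": "1.1",                     # required by the GELF spec
    "host": "laptop-01",                  # required: name of the sender
    "short_message": "test over GELF/TCP",
    "_source_component": "framing-demo",  # additional fields start with "_"
}

with socket.create_connection((GELF_HOST, GELF_PORT)) as sock:
    # One JSON document per message, terminated by a null byte - this is
    # exactly what the "Null frame delimiter" option tells the input to expect.
    sock.sendall(json.dumps(message).encode("utf-8") + b"\x00")
```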

Returning to assessing the situation.

1 - Elasticsearch graylog.log (in chronological order)

[DEBUG][o.e.a.a.i.m.p.TransportPutMappingAction] [elk] failed to put mappings on indices [[[graylog_2/rR5AtZe6RA2S74QEFNxdQQ]]], type [message]
[DEBUG][o.e.a.b.TransportShardBulkAction] [elk] [graylog_2][0] failed to execute bulk item (index) index {[graylog_deflector][message][8c6066c0-079d-…], source[n/a, actual length: [2kb], max length: 2kb]}

2 - Graylog server.log (a two-hour offset due to an unreachable time server on the client, fixed now)
2020-10-06T08:42:11.025+02:00 ERROR [DecodingProcessor] Unable to decode raw message RawMessage{id=0fb5a892-079f-11eb-8eba-1264c5baeb62, journalOffset=62143755, codec=gelf, payloadSize=368, timestamp=2020-10-06T06:42:10.969Z, remoteAddress=/10.9.1.6:5441} on input <5ef105d184645059c4026896>.
2020-10-06T08:42:11.027+02:00 ERROR [DecodingProcessor] Error processing message RawMessage{id=0fb5a892-079f-11eb-8eba-1264c5baeb62, journalOffset=62143755, codec=gelf, payloadSize=368, timestamp=2020-10-06T06:42:10.969Z, remoteAddress=/10.9.1.6:5441}

com.fasterxml.jackson.core.JsonParseException: Unexpected character (';' (code 59)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')

I hope this helps to further the discussion.

To date I have not been able to resolve the actual issue(s).

You are probably sending wrongly formatted messages to the GELF input. The GELF format is very strict:
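Roughly, the spec requires valid JSON with version, host and short_message, and every additional field must be prefixed with an underscore (the name "_id" is reserved). A minimal sketch of an accepted payload next to the kind of malformed one that triggers the JsonParseException above (field names and values are illustrative):

```python
import json

# Accepted: valid JSON with the required fields; extra fields prefixed with "_".
valid = {
    "version": "1.1",                      # required
    "host": "server-01",                   # required: name of the sender
    "short_message": "disk almost full",   # required
    "timestamp": 1602002531.0,             # optional, seconds since epoch
    "level": 4,                            # optional, syslog severity
    "_disk_free_mb": 512,                  # additional field, "_" prefix per spec
}
print(json.dumps(valid))

# Rejected: not valid JSON at all - a ';' where a ',' belongs produces exactly
# the "Unexpected character (';' (code 59))" error quoted above.
broken = '{"version":"1.1";"host":"server-01"}'
```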

AFAIK, I only point NXLog at it, and NXLog sends GELF.
I’ll check.

Is this an indicator of that?

Timestamp: a day ago
Index: graylog_3
Letter ID: fbc05463-12ad-11eb-9175-7e053dcd3d04
Error message: {"type":"illegal_argument_exception","reason":"Limit of total fields [1000] in index [graylog_3] has been exceeded"}

Check this:
https://www.graylog.org/post/what-to-do-when-you-have-1000-fields
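That error comes from the Elasticsearch index setting index.mapping.total_fields.limit (1000 by default). Reducing the number of distinct fields is the real fix, but you can inspect how big the mapping has grown and, as a stopgap, raise the limit on the affected index. A minimal sketch, assuming direct access to Elasticsearch on localhost:9200 (the index name is taken from the error above, the new limit is illustrative):

```python
import requests

ES = "http://localhost:9200"
INDEX = "graylog_3"

def count_fields(properties):
    """Roughly count mapped fields by walking the mapping recursively."""
    total = 0
    for field in properties.values():
        total += 1
        if "properties" in field:          # object field with sub-fields
            total += count_fields(field["properties"])
    return total

mappings = list(requests.get(f"{ES}/{INDEX}/_mapping").json().values())[0]["mappings"]
# ES 6.x nests properties under the "message" type, ES 7.x does not.
props = mappings.get("properties") or mappings.get("message", {}).get("properties", {})
print(INDEX, "has roughly", count_fields(props), "mapped fields")

# Stopgap only: raise the per-index limit so new messages stop failing
# while the field explosion is cleaned up.
requests.put(
    f"{ES}/{INDEX}/_settings",
    json={"index.mapping.total_fields.limit": 2000},
)
```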

Well, I did not entirely understand how this came to be, but I followed the advice and am now waiting. The error happened a lot before, but not so frequently anymore. This ambiguity when working with ES/GL is quite frustrating: things don’t seem to behave consistently, but probably do.

Question: is it feasible to build in a trigger so this value can be adjusted automatically when the event occurs?

Best practice to avoid this is not to use one big index, but small index sets per type of input from devices (same fields), each with its own retention…
