Debug WARN logs (invalid "timestamp")

Graylog 4.2 logs are full of messages like this:

WARN [GelfCodec] GELF message <787f32c0-2565-11ed-8e61-0ee8466ead25> (received from <host:port>) has invalid "timestamp": 1661535326.0863 (type: STRING)

When I search for “787f32c0-2565-11ed-8e61-0ee8466ead25” I get nothing.

How do I go about finding this message via that “ID”???

How am I supposed to know who is sending a poorly formatted timestamp?

First guess is that it was picked up by Elasticsearch as type keyword (String) when you wanted it to be a long. A simple solution may be to rotate the index so that it will be redefined as numeric on the next ingested message. It seems unlikely to be needed, but you could create a custom mapping to enforce a numeric type (long, for instance).
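If you do go the custom mapping route, here is a minimal sketch (Python, purely illustrative) of what that could look like against a default Elasticsearch/OpenSearch on localhost:9200. The field name "response_time" and the template name are made up, since I don't know your fields; the mapping only affects indices created after the template exists, so you would still rotate the active write index afterwards:

import requests

# Hypothetical custom template forcing a numeric type on a field of your choosing.
# "response_time" is a placeholder; adjust index_patterns to your own index prefix.
template = {
    "index_patterns": ["graylog_*"],
    "order": 10,                      # layered on top of Graylog's own template
    "mappings": {
        "properties": {
            "response_time": {"type": "long"}
        }
    },
}

resp = requests.put(
    "http://localhost:9200/_template/graylog-custom-numeric",
    json=template,
    timeout=10,
)
resp.raise_for_status()
print(resp.json())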

Not sure that has anything to do with my issue. I have a .NET app sending in GELF-formatted messages and I am pretty sure the lib used to do the GELF formatting is sending the timestamp as a string instead of a number. I have updated my appender to use <param name="SendTimeStampAsString" value="false"/> and will see if that prevents the warning.
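For reference, this is roughly the difference I mean; a minimal Python sketch (host, port, and app name are made up) of a GELF payload where the timestamp goes out as a number rather than a string:

import json
import socket
import time

# Minimal GELF 1.1 payload; "timestamp" is a number (seconds since the epoch).
# Sending it as a string, e.g. str(time.time()), is what produces the
# "(type: STRING)" warning from the GelfCodec.
message = {
    "version": "1.1",
    "host": "my-dotnet-app",          # placeholder source name
    "short_message": "timestamp test",
    "timestamp": time.time(),         # e.g. 1661535326.0863
    "level": 6,
}

# Uncompressed GELF JSON over UDP; replace host/port with your GELF input.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(json.dumps(message).encode("utf-8"), ("graylog.example.com", 12201))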

Hello,

The timestamp needs to be ISO8601, not Unix.
Either convert it before the logs are sent OR convert it with a pipeline/extractor.

I’m really starting to lose faith in Graylog.

The GL logs are spewing the following warning:

2022-08-29T18:36:55.900Z WARN [GelfCodec] GELF message <8ef9d1c0-27c9-11ed-8e61-0ee8466ead25> (received from some-ip:some-port) has invalid "timestamp": 1661798215.7943 (type: STRING)

I can only assume that 8ef9d1c0-27c9-11ed-8e61-0ee8466ead25 is some sort of message ID in Graylog???

But that message is NOT appearing in Graylog.

What is going on??? How on earth am I supposed to fix this warning if the message ID isn’t searchable in GL???

I’ve searched dozens of log message IDs and none of them show up in GL. Are these logs getting dropped?

EDIT: Re-reading the first post, the error is happening on the input receiving the message? Then you can ignore all of the below, because that is after the fact (leaving it because I spent some good time writing it). Perhaps it is an extractor on the Input that thinks it is a string? It should come in as a number… here is a post that is semi-relevant and also has a pipeline rule to change the epoch to something more helpful.

I believe what is going on here is that Graylog receives the message and it has a Graylog-defined message ID like 8ef9d1c0-27c9-11ed-8e61-0ee8466ead25, but when Graylog goes to store it in Elasticsearch the type is different. So Graylog sees the GELF timestamp as numeric or date or something like that and tries to send it to Elasticsearch as such. Elasticsearch says the field it has for that is keyword (string) and then likely rejects the whole message. Assuming your GELF-formatted messages are going to the correct Graylog GELF Input?

When you send data from Graylog to Elasticsearch, Elasticsearch makes assumptions about the type of data coming in, and from what I see it defaults to keyword. The thing is, it only makes this assumption once, when the index is first started… so if it guesses keyword at index start, it stays that way and messages will be rejected if they don't match. I think newer versions (that are incompatible with Graylog) care a lot less about this. (You can always go to OpenSearch… which is the future of Graylog.)

So how did this happen? Usually it’s because, as you are starting to pull in data, the fields are off or something wasn’t explicitly defined as numeric initially, so Elasticsearch found/guessed keyword and is sticking with it until the index is rotated. Maybe your input wasn’t GELF initially, or maybe…

What can you do?

  1. Rotate the index by going to System/Indices, clicking on the index set you should be receiving the GELF messages on, clicking on “Maintenance” in the upper right, and choosing “Rotate active write index”. This will make Graylog tell Elasticsearch to close the current index and start a new one… at which point Elasticsearch will evaluate each field that comes in, and “timestamp” should be evaluated correctly. (Be aware of what your rotation/retention strategy is when you do this.)

  2. You could create a custom mapping in Elasticsearch… you shouldn’t need to do this though; by default the timestamp should be stored correctly GELF to GELF, etc. I did a post a while back about custom mapping and correcting historical information… here, though, if you are not getting the message stored, there is no historical information to correct. Still, there are curl commands in there to look at what is being stored in Elasticsearch, so you can modify them for your index names and the timestamp field to get more detail on what is going on there (a sketch of that lookup follows after this list).

  3. You didn’t mention anything about this, but it is possible that an extractor or pipeline rule could be setting up the fields incorrectly.
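For point 2, here is a minimal sketch of that lookup in Python instead of curl, assuming Elasticsearch/OpenSearch on localhost:9200 and the default graylog_* index prefix; it prints how each current index has mapped the timestamp field so you can see whether it ended up as date or keyword:

import requests

# Ask ES/OpenSearch how the "timestamp" field is mapped in every graylog_* index.
resp = requests.get(
    "http://localhost:9200/graylog_*/_mapping/field/timestamp",
    timeout=10,
)
resp.raise_for_status()

for index, data in sorted(resp.json().items()):
    field = data.get("mappings", {}).get("timestamp", {})
    print(index, field.get("mapping", "not mapped"))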

GELF and timestamps are nothing new to Graylog… it’s just a matter of tracking down where the processing isn’t meeting expectations… don’t lose the faith! :smiley:


So what’s the point of that message ID field??? How am I supposed to use it to debug anything???

I am getting GELF messages from my hosts… many messages. So that tells me the input is set up correctly. I sure would hope that if my message is getting dropped I would get more than a WARN-level message about the format of the timestamp.

How can I see the raw message before GL tries to send it to OpenSearch (which, by the way, is what I am using)?

Edited the above message. Graylog usually shows errors like this on the System/Overview page. It would be better if there were alerts… which is likely possible but a different question.

Per the edit: it looks like it’s an issue at the input, with a particular host? Maybe with an extractor… are you using them? Or with the host sending in the wrong type? You could use tcpdump/Wireshark to see the data coming in…
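If tcpdump/Wireshark feels heavy, a rough Python sketch of the same idea: bind a spare UDP port (12202 here is arbitrary), point ONE suspect host's GELF appender at it, and look at the JSON that actually arrives, including how timestamp is encoded. It handles the common uncompressed/zlib/gzip cases; chunked GELF is only flagged, not reassembled:

import gzip
import socket
import zlib

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 12202))
print("listening on udp/12202 ...")

while True:
    data, addr = sock.recvfrom(65535)
    if data[:2] == b"\x1e\x0f":                  # chunked GELF magic bytes
        print(addr, "-> chunked GELF message (not reassembled here)")
        continue
    try:
        if data[:2] == b"\x1f\x8b":              # gzip magic bytes
            payload = gzip.decompress(data)
        elif data[:1] == b"\x78":                # zlib header
            payload = zlib.decompress(data)
        else:                                    # plain, uncompressed JSON
            payload = data
        print(addr, "->", payload.decode("utf-8", errors="replace"))
    except Exception as exc:
        print(addr, "-> could not decode:", exc)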

My cluster has been up for weeks and rotation is daily, so it has rolled indices many times, and this warning accounts for 99% of all Graylog-generated log messages. It makes debugging real issues impossible.

Again I’ll ask: how/where can I use that message ID to see the actual message at the time it hits GL?

Again… please ignore the majority of the post where I talked about how things are stored in Elastic.

You're debugging what is hitting the Input…

I don’t use extractors so I don’t have an example… but you can look here or search for some examples in this forum.

What about this… if Graylog is smart enough to log a WARN message about a specific message ID… HOW CAN I USE THAT MESSAGE ID???

Hello @jasonrberk

I feel your frustration on this issue.

As you can see, there are multiple posts here in the forum on this issue. As @tmacgbay said, troubleshooting this issue using Graylog Open may take a while to find the cause.

Graylog Enterprise/Operations does have a feature for this:

https://docs.graylog.org/docs/indexer-and-processing-failures

So yeah, it can be done, BUT so long as you're under 2 GB a day it's free.
For example: marked in the red box is the node that has failed, so I know what device it is.

I know of a couple of ways to solve this, and how long it will take depends on how big an environment you have.

Can I ask what you have done besides rotating your indices?
Since MongoDB holds all the metadata, have you checked MongoDB and/or set MongoDB up for debugging? You may be able to find what source it is, but I'm not 100% sure. My fix was sending new devices to a Raw/Plaintext input to see if any come through.

@jasonrberk

I was looking into this; not sure if it will help, but I accessed Mongo.

Steps I took for GL Open 4.3:

Log on to the Mongo shell:

root #mongo

> use graylog
> show collections
> db.index_failures.find();

Results:

MongoDB shell version v4.4.16
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("f4fee408-5236-4ece-ab65-353da23cba6b") }
MongoDB server version: 4.4.16
---
The server generated these startup warnings when booting:
        2022-08-19T20:43:44.375-05:00: You are running on a NUMA machine. We suggest launching mongod like this to avoid performance problems: numactl --interleave=all mongod [other options]
---
> use graylog
> show collections
> db.index_failures.find();

{ "_id" : ObjectId("62ad2ae3777f8d5ed9cbfb1f"), "index" : "linuxserver_11", "letter_id" : "d26ee42b-eea5-11ec-8588-00155d601d11", "type" : "indexing", "message" : "ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [graylog] of type [boolean] in document with id 'd26ee42b-eea5-11ec-8588-00155d601d11'. Preview of field's value: '{}']]; nested: ElasticsearchException[Elasticsearch exception [type=json_parse_exception, reason=Current token (START_OBJECT) not of boolean type\n at [Source: (byte)"eption[Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [graylog] of type [boolean] in document with id 'b34daeff-eea5-11ec-8588-00155d601d11'. Preview of field's value: '{}']]; nested: ElasticsearchException[Elasticsearch exception [type=json_parse_exception\"[truncated 523788 bytes]; line: 1, column: 854]]];]","gl2_source_input":"5e265ada83d72ec570ab5fe2","EventReceivedTime":"2022-06-17 20:27:04","SourceModuleName":"graylog","graylog":{},"gl2_source_node":"8"[truncated 523788 bytes]; line: 1, column: 893]]];", "timestamp" : ISODate("2022-06-18T01:27:04Z") }
Type “it” for more

Maybe give that a try.


I’m still super confused. These are WARN messages, not ERROR level. Also, GL hasn’t complained about indexing failures.

This leads me to think the sender is just not sending a well-formatted GELF message. GL is getting the message and “dealing with it”, but dropping a warning in the logs in an attempt at being “nice”. The problem is that the details in that warning don’t help you actually figure out what’s happening.

I have Java apps and .NET apps sending in GELF messages to the same input, and all the Java ones get processed without issue. Every warning generated is coming from the .NET apps. I did notice the versions of log4net and gelf4net are very out of date… so I’m looking into that.

In the meantime, I would still like more clarification on what actually happened. Like, was the message received by the input and actually indexed? In other words, are the warning messages “paired” with each of my indexed messages?

Hello, @jasonrberk

Perhaps I can simplify this for you.

I understand, BUT if a message is not being indexed because of a timestamp issue, then this becomes an “indexing failure”.

Yes, I understand this also, but to find the device sending the messages that are creating those logs, what we're doing is suggesting where to look and offering different ideas you can try to solve this issue.

I believe we answered your question already: you have a timestamp issue. This means that you have a device sending logs to your GELF input that are incompatible.

Good question; chances are it may have been dropped. What needs to happen is to find the device or devices creating this issue. We answered that already in the posts above.

Sorry @jasonrberk, we can only offer suggestions. @tmacgbay and I offered suggestions on what and where to look; there's not much more we can do from here. My apologies. Maybe someone else here has a better idea.

@gsmith, why do you think messages were dropped / index failures happened? If that were the case, why is the Admin Console saying otherwise?

That’s what I’m confused by.

The messages were dropped at the Input because the incoming message had a string where a number was expected. A message has to pass through all of Graylog before it is pushed back out to Elasticsearch and can potentially get an index error.

Your messages are not making it into Graylog, so you can’t get an index error like the whole “ignore that” section was talking about.

Check out Graylog in a Nutshell to understand message flow a little better.

