Graylog metrics plugin feeding data via GELF to Graylog causing parsing errors

Hi,

Some background on the setup:

  • Graylog 3.3.2 with installed metrics-reporter-gelf plugin (version 3.0.0)
  • Plugin output is configured to send data as GELF to localhost on port 10001
  • Graylog is configured with a corresponding GELF TCP input
  • Messages are not modified, apart from being routed to a stream and written to a separate index set (short retention)

Most of it is working fine, and we can query the metrics.
However, with every push of new metrics we see a few “Failed to index message” warnings:

2020-07-14T17:45:13.131+02:00 WARN  [Messages] Failed to index message: index=<graylog_metrics_13> id=<015f5c27-c5e9-11ea-af20-005056904875> error=<{"type":"mapper_parsing_exception","reason":"failed to parse field [value] of type [long] in document with id '015f5c27-c5e9-11ea-af20-005056904875'","caused_by":{"type":"illegal_argument_exception","reason":"For input string: \"Tue Jul 14 17:43:52 CEST 2020\""}}>
2020-07-14T17:45:13.131+02:00 WARN  [Messages] Failed to index message: index=<graylog_metrics_13> id=<015c7606-c5e9-11ea-af20-005056904875> error=<{"type":"mapper_parsing_exception","reason":"failed to parse field [value] of type [long] in document with id '015c7606-c5e9-11ea-af20-005056904875'","caused_by":{"type":"illegal_argument_exception","reason":"For input string: \"[]\""}}>

I am a bit at a loss, as I cannot figure out the cause of those warnings.
The mismatch between type long and the content shown in the warning is to be expected, but I cannot find data of the wrong type anywhere in what is being sent to Graylog.

I sniffed the GELF traffic and could not find a field called value that contains a timestamp.
An excerpt from the traffic I saw:

{
  "version": "1.1",
  "timestamp": 1.59474118289E9,
  "host": "metrics",
  "short_message": "name=org.graylog2.shared.buffers.ProcessBuffer.decodeTime type=TIMER",
  "level": 6,
  "_mean_rate": 493.79032918602144,
  "_m1": 449.8989474964472,
  "_max": 23.452638,
  "_count": 1105029,
  "_m5": 467.7861161082778,
  "_rate_unit": "second",
  "_type": "TIMER",
  "_p95": 0.543602,
  "_duration_unit": "milliseconds",
  "_p98": 0.650645,
  "_p75": 0.426937,
  "_m15": 441.9039458482535,
  "_p99": 0.795899,
  "_min": 0.017183,
  "_median": 0.333787,
  "_mean": 0.3052500589561977,
  "_name": "org.graylog2.shared.buffers.ProcessBuffer.decodeTime",
  "_p999": 4.51121,
  "_stddev": 0.7182941066091177
}

The data contains a timestamp field, though.
Any idea what is causing this and how I might fix this?

Thanks!

Feeding Graylog's own data back into the same Graylog instance can kill that environment quickly.

As soon as you have problems or high load, you amplify the problems you might see …

But for the error: you have a field (value) that was created as type long (because a long was the first data ingested for this field after index creation), and you are now trying to ingest other data into it.

You can create a custom mapping to FORCE a specific datatype on a field, or write a processing pipeline to rename the field based on its content.
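
For illustration, a minimal sketch of what such a custom mapping might look like, written as a small Python script that only builds and prints the index template body. It assumes Elasticsearch 6.x (where Graylog's mapping type is named "message" and templates use "index_patterns") and an index prefix of graylog_metrics, as seen in the index name in the warnings. The template name is hypothetical.

```python
import json

# Assumption: the metrics index set uses the prefix "graylog_metrics",
# taken from the index name graylog_metrics_13 in the warnings.
INDEX_PATTERN = "graylog_metrics_*"
# Hypothetical template name; any unused name would do.
TEMPLATE_NAME = "graylog-metrics-custom-mapping"


def build_template(field: str = "value", es_type: str = "keyword") -> dict:
    """Return an index template body that pins one field to a fixed type."""
    return {
        "index_patterns": [INDEX_PATTERN],
        "mappings": {
            # Assumption: Elasticsearch 6.x, where the mapping type is "message".
            "message": {
                "properties": {
                    field: {"type": es_type},
                }
            }
        },
    }


if __name__ == "__main__":
    # The printed JSON would be sent as the body of a PUT request to the
    # Elasticsearch endpoint _template/<TEMPLATE_NAME>.
    print(json.dumps(build_template(), indent=2))
```

A template like this only affects newly created indices, so the active write index would have to be rotated before the mapping takes effect. Note that pinning value to a string type means numeric aggregations on that field no longer work.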

Thanks!

Feeding Graylog's own data back into the same Graylog instance can kill that environment quickly.
As soon as you have problems or high load, you amplify the problems you might see …

I know. We are already working on a proper solution, but we needed to look into some issues right now :wink:

But for the error: you have a field (value) that was created as type long (because a long was the first data ingested for this field after index creation), and you are now trying to ingest other data into it.

That is what I figured but I was unable to find the cause.
Seems like I somehow sampled every single message except the two types that cause issues :see_no_evil: :man_shrugging:
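
To avoid relying on manual sampling, something along these lines could scan a raw capture of the GELF TCP stream for messages whose value is not a number. This is only a sketch: it assumes the capture is a byte string of null-delimited GELF frames, that the metric value is sent as the additional field _value (Graylog strips the leading underscore on ingest), and the function name and sample data are made up for illustration.

```python
import json
from typing import Iterator, Tuple


def find_non_numeric_values(
    capture: bytes, field: str = "_value"
) -> Iterator[Tuple[str, object]]:
    """Yield (metric name, offending value) for frames whose field is not a number."""
    # GELF over TCP separates messages with a null byte.
    for frame in capture.split(b"\0"):
        if not frame.strip():
            continue  # skip the empty chunk after the final delimiter
        try:
            message = json.loads(frame)
        except ValueError:
            continue  # ignore partial or corrupt frames
        if not isinstance(message, dict) or field not in message:
            continue
        value = message[field]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            yield message.get("_name", "<unknown>"), value


# Illustrative sample data only; the second metric name is hypothetical.
SAMPLE = b"\0".join(
    json.dumps(m).encode()
    for m in [
        {"version": "1.1", "host": "metrics", "short_message": "a",
         "_name": "org.graylog2.journal.oldest-segment",
         "_value": "Tue Jul 14 17:43:52 CEST 2020"},
        {"version": "1.1", "host": "metrics", "short_message": "b",
         "_name": "example.numeric.gauge", "_value": 42},
    ]
) + b"\0"

if __name__ == "__main__":
    for name, value in find_non_numeric_values(SAMPLE):
        print(f"{name}: {value!r}")
```

Run against a full capture, this lists every metric name whose value would be rejected by a long mapping, which is the set of names a rename rule needs to cover.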

I looked into forcing the type to text, but that is troublesome because it prevents me from running proper statistics on the values.
In the end I went with a pipeline that just renames the field for the offending metrics, which is good enough until we switch to feeding the metrics into another system.

For completeness' sake:

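rule "Cleanup: Rename oldest-segment metrics value field"
when
    // Only match the metric whose value is a date string rather than a number
    has_field("name") AND
    has_field("value") AND
    to_string($message.name) == "org.graylog2.journal.oldest-segment"
then
    // Move the non-numeric value out of the field that is mapped as long
    rename_field(
        old_field: "value",
        new_field: "value_string"
    );
end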
