How to identify the source of data causing mapper_parsing_exception


(123dev) #1

Hi,

Our applications send out data with an ErrorCode field.
We have a custom mapping that defines the ErrorCode field as type long (the correct type for our application).

Every now and then we see errors in Graylog like the following:

an hour ago	graylog_640	76f3a310-22df-11e8-92d7-0666f8e8d5df	{"type":"mapper_parsing_exception","reason":"failed to parse [ErrorCode]","caused_by":{"type":"number_format_exception","reason":"For input string: \"fullyCutByExtractor\""}}

We strongly suspect that one of the other sources (probably nxlog, which gathers Windows event logs, IIS logs, and SQL logs) is sending data that includes an ErrorCode field with string content, hence causing the issue.

  1. How can we easily identify the source? Because of this error, the data is not in ES.
  2. Once identified, we plan on renaming the field to avoid the conflict. Can this be done in Graylog (in Drools or a pipeline), or is it already too late at that point, so that we need to fix it at the source?

Thanks


(Jochen) #2

This hints at some extractor in your Graylog cluster which “cuts” the contents of the “ErrorCode” field and replaces it with this string.

If you want to avoid this, you can either replace the complete extractor with a pipeline rule or use a pipeline rule to remove that particular value from the “ErrorCode” field.


(123dev) #3

Thanks Jochen for a quick response.

Now that you mention that, I remembered that a while back I added an extractor, hoping to move / rename the ErrorCode field coming in from the NxLog input.

In an attempt to quickly find all such cases among the many extractors that we have on multiple inputs, I created a content pack and selected all inputs.
I then searched the exported JSON for the word "cut" and found only one occurrence (in the NxLog input):

{
	"title": "Move ErrorCode to NxLog_ErrorCode",
	"type": "COPY_INPUT",
	"cursor_strategy": "CUT",
	"target_field": "NxLog_ErrorCode",
	"source_field": "ErrorCode",
	"configuration": {},
	"converters": [],
	"condition_type": "NONE",
	"condition_value": "",
	"order": 2
}
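
The manual search described above can also be scripted. Below is a minimal Python sketch that walks an exported content pack and lists every object whose `cursor_strategy` is `CUT`. The outer layout of the sample (`inputs`, `extractors`, the input title) is an assumption made for illustration, and the helper name `find_cut_extractors` is hypothetical; the walker itself does not depend on the layout, it simply visits every nested dict and list.

```python
import json


def find_cut_extractors(node, path="$"):
    """Recursively collect every dict whose cursor_strategy is "CUT"."""
    hits = []
    if isinstance(node, dict):
        if str(node.get("cursor_strategy", "")).upper() == "CUT":
            hits.append({
                "path": path,
                "title": node.get("title"),
                "source_field": node.get("source_field"),
                "target_field": node.get("target_field"),
            })
        for key, value in node.items():
            hits.extend(find_cut_extractors(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for position, item in enumerate(node):
            hits.extend(find_cut_extractors(item, f"{path}[{position}]"))
    return hits


# Hypothetical content pack fragment; only the extractor object is from the thread.
sample_pack = json.loads("""
{
  "inputs": [
    {
      "title": "NxLog Input",
      "extractors": [
        {
          "title": "Move ErrorCode to NxLog_ErrorCode",
          "type": "COPY_INPUT",
          "cursor_strategy": "CUT",
          "target_field": "NxLog_ErrorCode",
          "source_field": "ErrorCode",
          "order": 2
        },
        {
          "title": "Some other extractor",
          "type": "REGEX",
          "cursor_strategy": "COPY",
          "target_field": "other",
          "source_field": "message",
          "order": 3
        }
      ]
    }
  ]
}
""")

cut_hits = find_cut_extractors(sample_pack)
for hit in cut_hits:
    print(hit["path"], hit["title"], hit["source_field"], "->", hit["target_field"])
```

For a real export, the sample could be replaced by `json.load()` on the exported file.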

I don’t see how this extractor could cause that issue.
Even if it were being run, it’s a simple copy / cut rule.
Could there be another extractor that somehow did not make it into the content pack export?
Besides, isn’t this extractor meaningless? The error should be reported before the data is written to ES because the type is mismatched, and hence the extractor will never have the chance to run (copy the field).

Thanks


(Jochen) #4

It copies the contents of the “ErrorCode” field to “NxLog_ErrorCode” and replaces the original content with “fullyCutByExtractor”.

Whether the extractor is meaningless is up to you.

Graylog doesn’t validate the type of each field in each message it sends to Elasticsearch.
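
To make the sequence of events concrete, here is a small Python model of the behaviour described in this reply. It is only an illustration of the described effect, not Graylog's actual code: the function names and the sample message are hypothetical, and `fits_long_mapping` is a deliberately simplified stand-in for Elasticsearch's check on a `long` field.

```python
CUT_PLACEHOLDER = "fullyCutByExtractor"  # the string seen in the indexing error


def copy_input_cut(message, source_field, target_field):
    """Model of a copy-input extractor using the "cut" cursor strategy."""
    result = dict(message)
    if source_field in result:
        result[target_field] = result[source_field]
        # The original field is not removed; it is overwritten with a placeholder.
        result[source_field] = CUT_PLACEHOLDER
    return result


def fits_long_mapping(value):
    """Simplified check: can this value be stored in a field mapped as long?"""
    try:
        int(str(value))
        return True
    except ValueError:
        return False


# Hypothetical incoming message from the NxLog input.
incoming = {"source": "winhost01", "ErrorCode": "5"}
processed = copy_input_cut(incoming, "ErrorCode", "NxLog_ErrorCode")

print(processed)
print("ErrorCode fits long mapping:", fits_long_mapping(processed["ErrorCode"]))
```

In this model the value "5" would have been fine for a long field, but after the extractor runs the field holds the placeholder string, which is what Elasticsearch then rejects.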


(123dev) #5

Thanks Jochen,
So cut is not a true cut; it replaces the value with the string "fullyCutByExtractor".
Curious as to why it isn’t a true cut, meaning one that gets rid of the original field?
Otherwise, the way it is now, it will only work for fields that are text.

I’ll get rid of the extractor, which will solve my immediate issue.
Though I’m curious to hear about:

  1. The cut question
  2. Had this not been caused by this extractor, but truly by some data source sending a conflicting type, how could I easily find that source?

Thanks


(Jochen) #6

This is as much legacy as it can get. I think the intention was to show something in the “cut” field so that users don’t wonder why their field is missing. :man_shrugging:

Please create a bug report for it at https://github.com/Graylog2/graylog2-server/issues.

I’m pretty sure it is caused by that extractor. Other than that, you’d have to check every input and every pipeline rule in your Graylog cluster.


(123dev) #7

Thanks Jochen,
Yeah, I get it; in this case it is definitely caused by the extractor.
But here is a scenario that would cause similar issues.

With a custom mapping set to type long, any data source that sends the same field name (ErrorCode in this case) with a value that is not a long will cause ES to reject the record and complain about a data type mismatch.
In such a scenario, no amount of pipeline / extractor investigation would reveal the source.
Considering that we have many inputs and lots of sources, is there anything (debug or otherwise) that can be done to isolate the culprit, short of turning off each source to see if the errors stop (which is not practical if the issue is sporadic)?
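
One possible approach, which nobody in this thread has confirmed, would be to check the type before the message reaches ES and move a non-conforming value into a differently named field, so the message still gets indexed and its `source` can be searched for. The sketch below shows only the logic in plain Python; it is not Graylog pipeline syntax, and the names `EXPECTED_LONG_FIELDS`, `quarantine_bad_fields`, the `_invalid` suffix and the `mapping_conflict` field are all hypothetical.

```python
from collections import Counter

EXPECTED_LONG_FIELDS = {"ErrorCode"}  # assumption: fields with a custom long mapping


def is_long(value):
    """True if the value is an integer or a string that parses as one."""
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return True
    if isinstance(value, str):
        try:
            int(value)
            return True
        except ValueError:
            return False
    return False


def quarantine_bad_fields(message):
    """Move values that would break the long mapping into '<field>_invalid'."""
    fixed = dict(message)
    offenders = []
    for field in sorted(EXPECTED_LONG_FIELDS):
        if field in fixed and not is_long(fixed[field]):
            fixed[f"{field}_invalid"] = str(fixed.pop(field))
            offenders.append(field)
    if offenders:
        # Marker field so the affected messages can be found by a search.
        fixed["mapping_conflict"] = ",".join(offenders)
    return fixed


# Hypothetical messages from three sources.
messages = [
    {"source": "app01", "ErrorCode": 404},
    {"source": "iis02", "ErrorCode": "0x80070005"},
    {"source": "iis02", "ErrorCode": "access denied"},
    {"source": "sql03", "ErrorCode": "17"},
]

checked = [quarantine_bad_fields(m) for m in messages]
culprits = Counter(m["source"] for m in checked if "mapping_conflict" in m)
print(culprits.most_common())
```

Because the offending value is renamed rather than dropped, the message is no longer rejected, and counting the marker field per source points at the sender even when the problem is sporadic.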

Thanks


(Timothy Wall) #8

It’d probably be worth noting somewhere prominent in the docs (or in a FAQ related to this question) which are the “magic” fields which will cause indexing failures if used with an improper type.

I added a %{LOGLEVEL:level} grok pattern, which caused mysterious (to me) index errors until I saw this thread. “level” is expected to be a numeric value by Elasticsearch (at least in my setup). Maybe I inadvertently “initialized” that field to be numeric somewhere else? I’m not aware of doing so.


(Jan Doberstein) #9

There are no magic fields. The decision about which field gets which type is made when the field is first created in that index.

It could happen that you have a field that is a string, and after index rotation this field is a number, just because the first value that was ingested was a number. That is the reason for having a fixed Elasticsearch mapping.
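
The "first value wins" effect can be illustrated with a toy model. This is a deliberately simplified Python sketch, not how Elasticsearch is implemented: `ToyIndex`, `guess_type` and `parses_as_long` are hypothetical names, and only the `long` type rejects values here.

```python
def guess_type(value):
    """Toy stand-in for dynamic mapping: infer a type from the first value seen."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    return "text"


def parses_as_long(value):
    if isinstance(value, bool):
        return False
    try:
        int(str(value))
        return True
    except ValueError:
        return False


class ToyIndex:
    """One index; a field's type is fixed by the mapping or by its first value."""

    def __init__(self, fixed_mapping=None):
        self.mapping = dict(fixed_mapping or {})

    def index(self, doc):
        for field, value in doc.items():
            field_type = self.mapping.setdefault(field, guess_type(value))
            if field_type == "long" and not parses_as_long(value):
                raise ValueError(
                    f'failed to parse [{field}]: For input string: "{value}"'
                )


# Index where a number happened to arrive first: "level" becomes long.
number_first = ToyIndex()
number_first.index({"level": 3})
try:
    number_first.index({"level": "error"})
    rejected = None
except ValueError as exc:
    rejected = str(exc)

# Index after rotation where a string happened to arrive first: "level" becomes text.
string_first = ToyIndex()
string_first.index({"level": "error"})
string_first.index({"level": 3})

# A fixed mapping removes the dependence on arrival order.
fixed = ToyIndex({"level": "text"})
fixed.index({"level": 3})
fixed.index({"level": "error"})

print(number_first.mapping, string_first.mapping, fixed.mapping)
print(rejected)
```

The same two documents are accepted or rejected depending only on which one arrived first, unless the mapping is fixed up front.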


(Timothy Wall) #10

I only call them “magic” because it was not something I intentionally set anywhere. It’s most likely that the default syslog processing includes that field (that was the first input I hooked up), and that’s what established the type as a number.

Now that I’m a bit more familiar with the system, I was able to resolve the indexing error, but for several days it was just mysterious. I had thousands of indexing errors like this:

a day ago	graylog_0	bb2eb240-2dcf-11e8-9f92-0e6f1cd17d6c	{"type":"mapper_parsing_exception","reason":"failed to parse [level]","caused_by":{"type":"number_format_exception","reason":"For input string: \"error\""}}

It turns out that I had added a “grok” rule which parsed the level out as a string named “level”, rather than following the existing Apache convention of calling it “loglevel”. The error would have been immediately obvious if I had had even a fragment of the dead message (or its source).
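
With thousands of such failures, a summary by field and offending value can at least narrow things down. Here is a rough Python sketch that assumes the failure lines were copied out as tab-separated columns (timestamp, index, message ID, error JSON), as in the two examples in this thread; `parse_failure_line` is a hypothetical helper, and the parsing of the `reason` strings relies only on the format visible in those examples.

```python
import json
from collections import Counter


def parse_failure_line(line):
    """Split one indexing-failure line and pull out the field and bad value."""
    timestamp, index, message_id, error_json = line.split("\t", 3)
    error = json.loads(error_json)
    reason = error.get("reason", "")
    field = None
    if "[" in reason and "]" in reason:
        field = reason[reason.index("[") + 1:reason.index("]")]
    cause = error.get("caused_by", {}).get("reason", "")
    parts = cause.split('"')
    value = parts[1] if len(parts) >= 3 else None
    return {
        "timestamp": timestamp,
        "index": index,
        "message_id": message_id,
        "type": error.get("type"),
        "field": field,
        "value": value,
    }


# The two failure lines quoted in this thread, rebuilt with tab separators.
lines = [
    "\t".join([
        "an hour ago", "graylog_640", "76f3a310-22df-11e8-92d7-0666f8e8d5df",
        r'{"type":"mapper_parsing_exception","reason":"failed to parse [ErrorCode]","caused_by":{"type":"number_format_exception","reason":"For input string: \"fullyCutByExtractor\""}}',
    ]),
    "\t".join([
        "a day ago", "graylog_0", "bb2eb240-2dcf-11e8-9f92-0e6f1cd17d6c",
        r'{"type":"mapper_parsing_exception","reason":"failed to parse [level]","caused_by":{"type":"number_format_exception","reason":"For input string: \"error\""}}',
    ]),
]

failures = [parse_failure_line(line) for line in lines]
summary = Counter((f["field"], f["value"]) for f in failures)
for (field, value), count in summary.most_common():
    print(f"{count:5d}  {field}  {value!r}")
```

The offending value is often the best clue available: a placeholder such as "fullyCutByExtractor" points at an extractor, while a word such as "error" points at a parsing rule that writes a log level into a numeric field.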

