JSON Extractor stops messages from showing up in input

Hello,
I have an Input that collects nginx access logs that are sent in the JSON format. I’ve been following the official Graylog guide on how to set up the extractor: How to use a JSON Extractor | Graylog

As suggested in the guide, I have two extractors: one to parse the message into a json field and one that extracts the fields from it.

Here’s an example message before parsing into a proper json field (data changed for privacy):

MyHost nginx: { “timestamp”: “1658474614.043”, “remote_addr”: “x.x.x.x.x”, “body_bytes_sent”: 229221, “request_time”: 0.005, “response_status”: 200, “request”: “GET /foo/bar/1999/09/sth.jpeg HTTP/2.0”, “request_method”: “GET”, “host”: “www…somesite.com”,“upstream_cache_status”: “”,“upstream_addr”: “x.x.x.x.x:xxx”,“http_x_forwarded_for”: “”,“http_referrer”: “https:////www.somesite.com/foo/bar/woo/boo/moo”, “http_user_agent”: “Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36”, “http_version”: “HTTP/2.0”, “nginx_access”: true }

And it is successfully extracted into a json field by a regex extractor: nginx:\s+(.*)

{ “timestamp”: “1658474614.043”, “remote_addr”: “x.x.x.x.x”, “body_bytes_sent”: 229221, “request_time”: 0.005, “response_status”: 200, “request”: “GET /foo/bar/1999/09/sth.jpeg HTTP/2.0”, “request_method”: “GET”, “host”: “www…somesite.com”,“upstream_cache_status”: “”,“upstream_addr”: “x.x.x.x.x:xxx”,“http_x_forwarded_for”: “”,“http_referrer”: “https://www.somesite.com/foo/bar/woo/boo/moo”, “http_user_agent”: “Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36”, “http_version”: “HTTP/2.0”, “nginx_access”: true }
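For anyone reproducing this outside Graylog, here is a minimal Python sketch of what the two extractors do in sequence: capture everything after `nginx:` with the same regex, then parse the captured text as JSON. The sample line is a shortened, made-up version of the message above, and it assumes the real log lines use straight ASCII quotes (the typographic quotes in this post are presumably from the forum rendering and would not parse as JSON). The function name `extract_json_fields` is hypothetical.

```python
import json
import re

# Hypothetical, shortened sample line with straight ASCII quotes.
RAW = (
    'MyHost nginx: { "timestamp": "1658474614.043", "response_status": 200, '
    '"request_method": "GET", "nginx_access": true }'
)

# Same pattern as the regex extractor: capture everything after "nginx:".
PATTERN = re.compile(r"nginx:\s+(.*)")


def extract_json_fields(line):
    """Return the parsed JSON object from the line, or None on a miss."""
    match = PATTERN.search(line)
    if match is None:
        return None  # regex did not match: nothing to parse
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None  # captured text is not valid JSON


fields = extract_json_fields(RAW)
print(fields)
```

Note that the `timestamp` value comes out as a string of epoch seconds, not as a date, which may matter once the fields are indexed.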

After that, it goes to the second extractor, which fails completely. Not only is the preview incorrect (it omits some values entirely):
remote_addr: x.x.x.x
request: GET /sth.dat HTTP/1.1
response_status: 301
upstream_addr:
body_bytes_sent: 162
http_version: HTTP/1.1
request_method: GET
nginx_access:
http_user_agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36
request_time: 0
upstream_cache_status:
host: sth
http_x_forwarded_for:
http_referrer:
timestamp: 1658475023.035

It also keeps missing; it doesn’t extract anything at all:
[image]

The second extractor’s configuration is left at the defaults, except that the “flatten structure” option is turned on.

I kindly request your help and wish you a good day!

I have come to the conclusion that the extractor works properly; however, it still keeps missing. I do not know why this happens.

Update: As soon as I apply the second extractor, the messages stop coming in.

Hello && Welcome @cesq

We would need to see the configuration you made, how you’re ingesting the messages, the logs (e.g., Graylog, Elasticsearch), etc.
It’s hard to tell what the issue is from the information given.
When you do post any configuration, please use markdown; if you’re unsure, take a look here

Hi, sadly I don’t understand what you mean by that. Nor do I understand these terms. Configuration of what exactly would you like to see?

Graylog version is 4.2.10+37fbc90 and it’s running on Red Hat - kernel 4.18

@gsmith I found an error log


It has something to do with the DateTime format

Hello,

The error shown above means that Elasticsearch failed to parse the date in the “DateTime” field. You need to convert it, and the best option would probably be a pipeline.
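To illustrate what "convert it" could mean, here is a rough Python sketch. The thread does not show the actual value of the “DateTime” field, so this assumes it is a string of epoch seconds with a fractional part, like the `timestamp` value in the sample message. In Graylog itself the conversion would live in a pipeline rule; the helper name `epoch_to_iso8601` is hypothetical and only demonstrates the transformation.

```python
from datetime import datetime, timezone


def epoch_to_iso8601(value):
    """Convert an epoch-seconds string (assumed format) to an ISO 8601 UTC string."""
    seconds = float(value)  # raises ValueError if the value is not numeric
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.isoformat(timespec="milliseconds")


# Value taken from the sample message earlier in the thread.
converted = epoch_to_iso8601("1658474614.043")
print(converted)
```

A date-typed field in Elasticsearch only accepts values that match the formats in its mapping, so a raw string in some other format is rejected at index time, which would explain why the messages stopped showing up.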

Solution on this post: Failed to index [1] messages. failed to parse field [DateTime] of type [date] in document - #6 by cesq
