Hi!
I have Graylog 4.2.1 and nginx 1.21.4, with Filebeat 7.15.2 shipping to a Beats input.
nginx logs information about each request to a logfile in JSON format like this:
{ "time_iso8601": "2021-11-20T09:58:22+00:00", "msec": "1637402302.360", "connection": "106855", "connection_requests": "1", "pid": "18354", "request_id": "6f930f70f80c9cf8d0a6015bd42f6930", "request_length": "525", "remote_addr": "35.235.111.111", "remote_user": "", "remote_port": "43414", "time_local": "20/Nov/2021:09:58:22 +0000", "request": "GET /2013/08/06/?lang=en HTTP/1.1", "request_uri": "/2013/08/06/?lang=en", "args": "lang=en", "status": "301", "body_bytes_sent": "5", "bytes_sent": "399", "http_referer": "", "http_user_agent": "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)", "http_x_forwarded_for": "51.222.253.8", "http_host": "example.com", "server_name": "example.com", "request_time": "0.313", "upstream": "127.0.0.1:9001", "upstream_connect_time": "0.001", "upstream_header_time": "0.314", "upstream_response_time": "0.314", "upstream_response_length": "22", "upstream_cache_status": "", "ssl_protocol": "TLSv1.2", "ssl_cipher": "ECDHE-RSA-AES128-GCM-SHA256", "scheme": "https", "request_method": "GET", "server_protocol": "HTTP/1.1", "pipe": ".", "gzip_ratio": "", "http_cf_ray": "", "request_completion": "OK"}
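For context, the log_format that produces this JSON looks roughly like the sketch below (abridged; the real format lists every variable shown above, and I'm assuming escape=json is used so field values are safely quoted):

```nginx
http {
    # Abridged sketch - only a few of the variables from the sample line.
    log_format json_combined escape=json
      '{ "time_iso8601": "$time_iso8601", '
      '"time_local": "$time_local", '
      '"remote_addr": "$remote_addr", '
      '"request": "$request", '
      '"status": "$status" }';

    access_log /var/log/nginx/json.log json_combined;
}
```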
Filebeat is configured in this way:
filebeat.inputs:
- input_type: log
  paths:
    - /var/log/nginx/json.log
  fields:
    logtype: nginx-access-json
  fields_under_root: true
I can receive the logs and parse them into fields using a JSON extractor:
{
  "title": "Extract JSON fields",
  "extractor_type": "json",
  "converters": [],
  "order": 1,
  "cursor_strategy": "copy",
  "source_field": "message",
  "target_field": "",
  "extractor_config": {
    "flatten": true,
    "list_separator": ", ",
    "kv_separator": "=",
    "key_prefix": "",
    "key_separator": "_",
    "replace_key_whitespace": false,
    "key_whitespace_replacement": "_"
  },
  "condition_type": "none",
  "condition_value": ""
}
My problem is that Graylog uses the time from “filebeat_@timestamp” as “timestamp”. That tells me when the logs were received by Graylog, but I really want to analyze the situation on the origin server, i.e. when each request was actually executed (logs may arrive in batches with some delay).
In an Elasticsearch + Logstash + Kibana setup I used a Logstash filter like this to update the timestamp:
filter {
  if [logtype] == "nginx-access-custom" {
    [..]
    date {
      match => [ "time_local", "dd/MMM/YYYY:HH:mm:ss Z" ]
      target => "@timestamp"
    }
    [..]
  }
}
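For reference, my understanding from the docs is that the pipeline-rule equivalent of the Logstash date filter above would look roughly like this (untested on my side, so treat it as a sketch):

```
rule "nginx access: set timestamp from time_local"
when
  has_field("time_local")
then
  // parse_date takes a Joda-style pattern and returns a DateTime
  let ts = parse_date(value: to_string($message.time_local),
                      pattern: "dd/MMM/yyyy:HH:mm:ss Z");
  set_field("timestamp", ts);
end
```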
I read many different topics here and on Stack Overflow about the proper way to implement something like this in Graylog via extractors or pipelines, but all of them failed either silently or with a “gl2_processing_error”.
The cleanest way I can see is to add one more extractor that copies the “time_local” field, with a proper date format, into “timestamp”. First I tested this approach by copying it into a “timestamp2” field instead:
{
  "title": "timestamp from JSON time_local",
  "extractor_type": "regex",
  "converters": [
    {
      "type": "date",
      "config": {
        "date_format": "dd/MMM/yyyy:HH:mm:ss Z"
      }
    }
  ],
  "order": 2,
  "cursor_strategy": "copy",
  "source_field": "time_local",
  "target_field": "timestamp2",
  "extractor_config": {
    "regex_value": "^(.*)$"
  },
  "condition_type": "none",
  "condition_value": ""
}
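As a sanity check that the converter's Joda-style date_format really matches time_local, here is the equivalent parse in Python (note that strptime uses different tokens than Joda; the sample string is taken from the log line above):

```python
from datetime import datetime

# Sample taken from the nginx log line above.
sample = "20/Nov/2021:09:58:22 +0000"

# Joda pattern "dd/MMM/yyyy:HH:mm:ss Z" maps to these strptime tokens.
parsed = datetime.strptime(sample, "%d/%b/%Y:%H:%M:%S %z")
print(parsed.isoformat())  # 2021-11-20T09:58:22+00:00
```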
And it works perfectly - I can see a new “timestamp2” field with the correct date from the logs.
But when I update this extractor to use the “timestamp” field as the target, I get a “gl2_processing_error”:
P.S. I’m not yet familiar with pipelines, so I’d like to achieve this with extractors if possible.