Parsing nested json message in field with parent object in pipeline


1. Describe your incident:
I have set up a pipeline to extract nested JSON and parse the message into fields… parsing works fine, but we want to prefix each field name with the name of its parent object…
For example, I have the JSON below:
{"@timestamp": "2022-10-25T17:55:29+00:00", "source": "mr-hub-nginx", "nginx": {"remote_addr": "xx.xx.101.216, xx.xx.146.165", "remote_user": "39942", "body_bytes_sent": 0, "request_length": 786, "request_time": 0.514, "status": 202, "request": "PATCH /test/v1/enablement/19299 HTTP/1.0", "request_method": "PATCH", "http_origin": "-", "http_referrer": "-", "site": "mr-hub-kube.test.com", "port": 443, "http_user_agent": "python-requests/2.28.1" }}

We get logs with field names "nginx", "port", "request", …; however, we want the field names prefixed by their parent object "nginx", like "nginx_port", "nginx_request".

In short, whatever parent object we have should be picked up dynamically and prefixed to the field… is that possible?

2. Describe your environment:

  • OS Information: Ubuntu 20

  • Package Version: 4.3.3+86369d3, codename Noir

  • Service logs, configurations, and environment variables:
    I have pipeline set up:
    Stage0: extract json

rule "extract json"
when 
    regex("(\\{.*\\})", to_string($message.message)).matches == true
then
    let json = regex("(\\{.*\\})", to_string($message.message), ["json"])["json"];
    set_field("json", json);
end

Stage1: parse json

rule "parse json"
when
  has_field("json")
then
  let json_props = parse_json(to_string($message.json));
  set_fields(to_map(json_props));

  let nginx_json = select_jsonpath(json_props, {nginx: "$.nginx"});
  let nginx_props = parse_json(to_string(nginx_json.nginx));
  set_fields(to_map(nginx_props));
end

Hello @brijesh.kalavadia

Correct me if I’m wrong, but you want to rename the fields that have already been created? If so, have you tried creating a new rule in Stage 2 and using set_fields for the naming convention?

The set_fields() function allows you to set a prefix for all the fields it is working on:

https://docs.graylog.org/docs/functions-1#set_fields
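For example, building on the rules posted above (a minimal sketch, not tested against your setup), the optional prefix parameter of set_fields() could be used like this:

```
rule "set nginx fields with prefix"
when
  has_field("json")
then
  let json_props = parse_json(to_string($message.json));
  let nginx_json = select_jsonpath(json_props, {nginx: "$.nginx"});
  let nginx_props = parse_json(to_string(nginx_json.nginx));
  // the prefix parameter is prepended to every field name being set,
  // so "port" becomes "nginx_port", "request" becomes "nginx_request", etc.
  set_fields(fields: to_map(nginx_props), prefix: "nginx_");
end
```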


Hi @gsmith ,
Thanks for the suggestion. I tried set_fields and it worked on the above example… I just added the prefix: set_fields(to_map(nginx_props), "nginx_")

But that is a fixed prefix which I already know… I want to add it dynamically… is there a way I can put in some different logic to pick the object and then set the prefix based on that value?

Sorry, I am asking basic things as I am new to Graylog and still learning… if you can provide an example of that, it would be great.

Hey @brijesh.kalavadia

For "dynamically", I assume you are referring to not having a pipeline and just ingesting the logs and watching the magic work 🙂 If so, perhaps a new index and a new index template for those types of logs.

Using the pipeline only… I want to put some logic in the place where I used the static value "nginx", so that it instead picks the dynamic value from the nested JSON message.

hey,

The only thing I can think of is using regex and/or a lookup table; within the pipeline you can use the lookup table(s).

Have you experimented with flatten_json?
Here’s a source and an example of something I did to bring logs into my test system.

rule "Random User Data Flatten Json Rule"
// From sample data : https://randomuser.me/api/
// Api input path: *
when
    true
then
    let sJson = to_string($message.result);
    let sJson = regex_replace(
        pattern: "^\\[|\\]$",
        value: sJson,
        replacement: ""
        );
    let rsJson = flatten_json(to_string(sJson), "flatten");
    set_fields(to_map(rsJson));
    remove_field("result");
    set_field("message", "parsed user data");
end

Thanks @jivepig … I tried flatten_json and it works…


Awesome! Do you have a sample rule that you used and what the output looked like? I’d love to see it.

A single pipeline rule was good enough for us to get the nested JSON fields.

rule "extract json"
when 
    regex("(\\{.*\\})", to_string($message.message)).matches == true
then
    let json = regex("(\\{.*\\})", to_string($message.message), ["json"])["json"];
    // set_field("json", json);

    set_fields(to_map(flatten_json(value: to_string(json), array_handler: "json")));
end


It seems every field I am getting is of type string now… not sure why… each field should keep its own type… any suggestion?

There are some fields we have which are not strings, and now they have been converted to strings.

I think you will then have to set the fields with their types afterwards, with something similar. When using flatten_json, it will not set the field types at this time. You will need to:
set_field("fieldname", to_type($message.fieldname));

I am a bit confused here… would you be able to help correct my rule below with what you are suggesting?

rule "extract json"
when 
    regex("(\\{.*\\})", to_string($message.message)).matches == true
then
    let json = regex("(\\{.*\\})", to_string($message.message), ["json"])["json"];
    // set_field("json", json);

    set_fields(to_map(flatten_json(value: to_string(json), array_handler: "json")));
end

There is no to_type function.

Can you paste in your raw log results? Replace names, IPs, or content with whatever you want, but let me see them for setting the fields.

Below is raw log:

router-84d84bccc-rl8gk nginx: {"@timestamp": "2022-11-03T20:39:07+00:00", "source": "router", "nginx": {"remote_addr": "xx.xx.12.123", "remote_user": "39942", "body_bytes_sent": 0, "request_length": 656, "request_time": 0.464, "status": 202, "request": "PATCH /xxxxxx/emapi/v1/enablement/53815 HTTP/1.1", "request_method": "PATCH", "http_origin": "-", "http_referrer": "-", "site": "xxxxx.com", "port": 443, "http_user_agent": "python-requests/2.28.1" }}

And as a result after the pipeline runs: we can see parsing is fine and it parses the nested JSON; however, every field got converted to type string. Well, they shouldn't all be strings… for example, "nginx_port" and "nginx_request_length" are not string fields…
Also, it looks like flatten_json only changes the data type of the nested JSON fields.

I am opening an issue to investigate this a little further. The way nested fields get parsed with flatten_json, there could be many different fields and types under that nested blob, and flatten_json will parse all of them as strings. Performing set_fields would then need to be done on the fields requiring a change to a non-string type. I'll open the issue just to make sure this is the case.
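In the meantime, a follow-up rule casting the known numeric fields back with the built-in converters could work as a stopgap. A sketch, assuming the field names produced from your log sample above (adjust to whichever fields matter to you):

```
rule "cast nginx numeric fields"
when
  has_field("nginx_port")
then
  // flatten_json left everything as strings, so cast the numerics back
  set_field("nginx_port", to_long($message.nginx_port));
  set_field("nginx_status", to_long($message.nginx_status));
  set_field("nginx_request_length", to_long($message.nginx_request_length));
  set_field("nginx_body_bytes_sent", to_long($message.nginx_body_bytes_sent));
  // request_time is fractional seconds, so use to_double
  set_field("nginx_request_time", to_double($message.nginx_request_time));
end
```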

Thanks @jivepig for taking this further… just wondering, is there a way I can track that issue, and will there be a timeline I can convey to our internal team?
Also, is there any other workaround I can apply to have nested JSON work through the pipeline?
