Pipeline processing json logs

1. Describe your incident:

I have pre-normalized json-formatted logs that I’m trying to ingest into Graylog. I have to use the pipeline to process the logs because that’s the only way I can do the right thing based on the pod/namespace/image that generated the log.

The problem is that when I use the standard parse_json(), to_map(), set_fields() pipeline pattern on this message, Graylog logs exceptions (shown below). Further investigation suggests that the reason for those exceptions may be that to_map() is generating incorrect structured data.

More generally though, I’d also like any best practices for how to take any pre-normalized json log message and extract its fields into the Graylog message, but only when conditions on other fields are met.

2. Describe your environment:

  • OS Information: Kubernetes

  • Package Version: graylog/graylog:4.2.3-1

  • Service logs, configurations, and environment variables:

An overview of the logs generated by graylog when set_fields() is called:

$ kubectl logs -l helm.sh/chart=graylog-2.1.2  | grep '^{.*}' | jq -rs '.[] | select(.level == "ERROR") | { message, thrownName: .thrown.name, thrownMessage: .thrown.message  }'
{
  "message": "Caught exception during bulk indexing: java.lang.ClassCastException: Cannot cast java.util.LinkedHashMap to java.lang.String, retrying (attempt #1).",
  "thrownName": null,
  "thrownMessage": null
}
{
  "message": "Couldn't bulk index 65 messages.",
  "thrownName": "java.util.concurrent.ExecutionException",
  "thrownMessage": "java.lang.ClassCastException: Cannot cast java.util.LinkedHashMap to java.lang.String"
}
{
  "message": "Unable to flush message buffer",
  "thrownName": "java.lang.RuntimeException",
  "thrownMessage": "java.util.concurrent.ExecutionException: java.lang.ClassCastException: Cannot cast java.util.LinkedHashMap to java.lang.String"
}
{
  "message": "Caught exception during bulk indexing: java.lang.ClassCastException: Cannot cast java.util.LinkedHashMap to java.lang.String, retrying (attempt #1).",
  "thrownName": null,
  "thrownMessage": null
}
{
  "message": "Couldn't bulk index 15 messages.",
  "thrownName": "java.util.concurrent.ExecutionException",
  "thrownMessage": "java.lang.ClassCastException: Cannot cast java.util.LinkedHashMap to java.lang.String"
}
{
  "message": "Unable to flush message buffer",
  "thrownName": "java.lang.RuntimeException",
  "thrownMessage": "java.util.concurrent.ExecutionException: java.lang.ClassCastException: Cannot cast java.util.LinkedHashMap to java.lang.String"
}

The rule:

rule "extract pre-normalized ECS"
when
    has_field("message")
    AND has_field("container_image_name")
    AND contains(to_string($message.container_image_name), "nginx")
    //AND regex("\"ecs\": \\{ \"version\":", to_string($message.message)).matches == true
    AND contains(to_string($message.message), "\"ecs\":")
then
    set_field("prenormalized", "true");
    let json = parse_json(to_string($message.message));
    let map = to_map(json);
    set_field("prenormalized_debug_json", to_string(json));
    set_field("prenormalized_debug_map", to_string(map));
    // set_fields(map); // uncommenting this line produces the errors above
end

example of prenormalized_debug_json:

{"timestamp":"2022-05-12T14:35:03+00:00","ecs":{"version":"8.1.0"},"http":{"request":{"bytes":2172,"line":"GET /graylog/api/ HTTP/1.1","method":"GET","referrer":"https://bdb-dev-sci-proxy-1.cisco.com/graylog/system/index_sets/622a335d133685485b0202ea","remote_user":"","time":"0.007"},"response":{"bytes":520,"status_code":200}},"source":{"ip":"10.244.177.0"},"user_agent":{"original":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36"}}

example of prenormalized_debug_map:

{timestamp=2022-05-12T14:35:03+00:00, ecs={version=8.1.0}, http={request={bytes=2172, line=GET /graylog/api/ HTTP/1.1, method=GET, referrer=https://bdb-dev-sci-proxy-1.cisco.com/graylog/system/index_sets/622a335d133685485b0202ea, remote_user=, time=0.007}, response={bytes=520, status_code=200}}, source={ip=10.244.177.0}, user_agent={original=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36}}

3. What steps have you already taken to try and solve the problem?

I’ve narrowed it down to the act of setting the fields, and shown that the map being used to set the fields does not look right. I suspect the java.util.concurrent.ExecutionException: java.lang.ClassCastException: Cannot cast java.util.LinkedHashMap to java.lang.String is related to the unexpected content seen in prenormalized_debug_map.

4. How can the community help?

Please let me know if I’m doing this correctly or if this is a bug or unimplemented feature in the underlying system. I’d also appreciate any workarounds or other suggestions.

Thank you!

I am not good at tracking down json formatting issues but when I did have to play around in there I found this site very helpful in validating json info.

Also, with a little hunting around, I found this community post that goes into more detail about json. In particular, in your code above, I don’t think you need to create a map; based on what they are saying, you can call set_fields(json) directly. Alternatively, you can use select_jsonpath() to pull out specific values…
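For the select_jsonpath() route, a minimal sketch might look like the rule below. It reuses the conditions from the rule in the question. The rule name, the output field names (http_request_method and so on) and the choice of paths are illustrative assumptions, not something taken from the linked post. Each JSONPath expression points at a leaf value rather than at a nested object, so the fields being set should be plain strings and numbers. That may sidestep the LinkedHashMap cast error, though this is untested on 4.2.3.

```
rule "extract selected ECS fields (sketch)"
when
    has_field("message")
    AND has_field("container_image_name")
    AND contains(to_string($message.container_image_name), "nginx")
    AND contains(to_string($message.message), "\"ecs\":")
then
    let json = parse_json(to_string($message.message));
    // Each path targets a leaf value, so no nested objects end up as field values.
    // The field names on the left are hypothetical; pick whatever suits your schema.
    let fields = select_jsonpath(json, {
        http_request_method: "$.http.request.method",
        http_response_status_code: "$.http.response.status_code",
        source_ip: "$.source.ip",
        user_agent_original: "$.user_agent.original"
    });
    set_fields(fields);
end
```

The trade-off is that every field you want has to be listed explicitly, so this suits a known schema better than arbitrary json.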

hope that helps…

Thank you!

The prenormalized_debug_map is not valid json, but I also don’t know if it should be. OTOH, prenormalized_debug_json is valid json, and matches the input message, so I’m confident that it works.

As a point of comparison, the same message works properly when handled by an extractor. The problem with the extractor is that I can’t be selective enough about when it runs, and this leads to extractor failures, which in turn lead to lost messages.

Another bit of poking around: here is a post that talks about using regex to extract just the json from a message. It may clean up messages that have extra content around the json and are causing errors.
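A rough sketch of that regex idea, assuming the json object runs from the first "{" to the last "}" in the message, and assuming the first capturing group is exposed as m["0"] (worth checking against the function reference for your version). The rule name and the extracted_json field are made up for illustration.

```
rule "isolate json from message (sketch)"
when
    has_field("message")
    AND contains(to_string($message.message), "\"ecs\":")
then
    // Capture everything from the first "{" to the last "}".
    let m = regex("(\\{.*\\})", to_string($message.message));
    // Assumption: the first capture group is available as m["0"].
    let json = parse_json(to_string(m["0"]));
    // Stored as a debug field only, so the result can be inspected before setting real fields.
    set_field("extracted_json", to_string(json));
end
```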