Problem understanding the use of is_json function

Hi Guys,

I'm trying to understand whether I can use "is_json" to distinguish between JSON and non-JSON content in the message field in pipeline rules. From my understanding, I could use something like this:

rule "Test for JSON"
when
is_json($message.message) == true
then
set_field("may_contain_json", true);
end

My input source is Raw AMQP, and the content of the message field is:

{"@timestamp":"2023-05-17T12:50:17.457Z","priority":"5","severity":"notice","tags":["logstash","udp","_grokfix_sysloginput"],"message":"May 17 14:50:17 se: lala\u0000","shipper":"def","@version":"1","facility_label":"kernel","host":"1.1.1.1","hostname":"abc"}

But it does not work the way I am expecting.

Hey @hasturo
You wanted something like this?

rule "extract-json"
when
   is_json(parse_json(to_string($message.message))) == true
then
   set_field("original_message", to_string($message.message));
   let json = parse_json(to_string($message.message));
   let map = to_map(json);
   set_fields(map);
   set_field("DEBUG", true);
end

Hi,

this is what I'm looking for. The problem here is that you're expecting JSON data. But what about data that is not JSON? On my installation it throws pipeline errors, because parse_json requires JSON content.

2023-05-30 12:23:42,646 WARN : org.graylog.plugins.pipelineprocessor.functions.json.JsonParse - Unable to parse JSON
com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'connection': was expecting ('true', 'false' or 'null')
at [Source: (String)"connection from abc () at Sun Jul 10 13:17:22 2005"; line: 1, column: 11]

I can disable the warnings, but that does not feel like the right way to me. I am trying to improve and simplify my ruleset in general. I know that I can use a regex to separate my data before parsing, but then I wonder why is_json cannot do this, and what is_json is for when I already have to make sure that I only push JSON data into it.
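For example, I could pre-check the message before parsing, along these lines (just a sketch; starts_with is one option, a regex check would work the same way, and the field names are only examples):

rule "guard json parsing"
when
   // only attempt JSON parsing when the raw string at least looks like a JSON object,
   // so parse_json is not called on plain syslog lines
   starts_with(to_string($message.message), "{")
then
   let json = parse_json(to_string($message.message));
   set_fields(to_map(json));
end

But this still feels like I am re-implementing what I expected is_json to do.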

Best Regards,

Hey

As for your question: I have run into this situation. Our solution was to send each device, or each type of device, to a specific input just for it: switches, firewalls, Apache, etc., you get the hint.
Not only does this resolve your issue, it also resolves another one: "ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Limit of total fields [1000] has been exceeded]]"

The right input for the right type of logs.

Let's say you're shipping all types of logs from one device to one input. Depending on the shipper, you can place tags and sort the messages out into different streams, and then attach your pipeline to the stream that only contains JSON-formatted logs/messages.
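A minimal sketch of such a routing rule, assuming the shipper sets a tags field and that a stream named "json-logs" already exists (both names are just examples):

rule "route json-formatted logs"
when
   // the shipper (e.g. Logstash) tags messages that carry JSON payloads
   contains(to_string($message.tags), "logstash")
then
   // move the message to the stream that the JSON pipeline is attached to
   route_to_stream(name: "json-logs", remove_from_default: true);
end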

As for switches and/or firewalls, they should only have one type of log being shipped.

Personally, I would make sure each device sends the correct type of logs for your pipeline. It takes a little extra work but pays off in the long run.

Hi,

thank you for your expertise! It would be easier for me if I could separate more data per input. I agree with you that this is one of the more important things that makes a Graylog life easier.

In my case we had only syslog for all kinds of logs, "because". :slight_smile: Getting the data into better shape upfront is a constant process here. We also use Logstash to collect all logs and ship them with RabbitMQ to our Graylog, but that's something for another topic.

I also have the problem with the field limit, though not in general, more in one index or another. It would be great if Graylog supported changing those limits.

Back to the topic: can you give me a hint about what is_json is for when you have to parse the JSON first anyway?

/hasturo

Hey @hasturo

The documentation states: "Checks whether the given value is a parsed JSON tree." So is_json is meant to be called on the result of parse_json, not on the raw string.


https://go2docs.graylog.org/5-0/making_sense_of_your_log_data/functions_descriptions.html
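Put differently, the rule from your first post would need to parse before checking, roughly like this (a sketch of the same idea as the rule above; note that parse_json will still log a warning for non-JSON input, as discussed earlier in the thread):

rule "Test for JSON"
when
   // parse_json returns a JSON tree on success; is_json then evaluates to true
   is_json(parse_json(to_string($message.message))) == true
then
   set_field("may_contain_json", true);
end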