Specifics:
- OS = Debian 12
- Graylog = 7.0.3
- Opensearch = 2.19.4
- mongodb = 8.0.18
Flow:
zeek (JSON output) → syslog-ng → input → stream → pipeline → rule stages
Notes:
- There is a single input extractor that uses a regex replace to add a field "is_json" with the value "json", so that the pipeline rule knows to process that particular message.
- It appears that once the JSON conversion is done (tried both an input extractor and a pipeline rule), all of the JSON-converted fields are non-modifiable (e.g. in a pipeline rule, remove_single_field("<field_name>"); does NOT work: the field is not removed, and there is no error in the Graylog log).
- The key requirement is that four values need to be extracted and placed into fields for the bulk of the processing: source IP, source port, destination IP, and destination port (see the sketch after this list).
- Structure in $message.message: " {}". This should be very easy to convert, as it is purely flat key/value pairs.
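For context, the kind of later-stage rule that is supposed to consume those four values looks like the sketch below. The field names are assumptions: Zeek's conn log normally carries id.orig_h, id.orig_p, id.resp_h and id.resp_p, but whether those keys arrive with dots or underscores after conversion is a guess here, and the src_ip/src_port/dst_ip/dst_port targets are placeholders as well.

rule "copy connection tuple"
when
  has_field("id_orig_h")
then
  // copy the four tuple values into dedicated, predictably named fields
  set_field("src_ip", to_string($message.id_orig_h));
  set_field("src_port", to_string($message.id_orig_p));
  set_field("dst_ip", to_string($message.id_resp_h));
  set_field("dst_port", to_string($message.id_resp_p));
end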
Performing the JSON processing (from message to fields) in the extractor phase or as a pipeline rule appears to make no difference with respect to this issue. Combinations tried:
- JSON conv in extractor, add fields in extractor
- JSON conv in extractor, add fields in pipeline
- JSON conv in pipeline, add fields in pipeline
The conversion to JSON itself obviously works: deleting the extractor or removing the pipeline rule leaves only the " " in the message field and no JSON fields, while enabling either the extractor or the pipeline rule makes all the fields appear on every message. But you cannot do ANYTHING with those fields. Have tried running various tests (is_null, is_not_null, is_string, etc.) just to see what the backend "thinks" about them: is_string and is_not_null return true, is_null returns false. The results are what one would expect, but the data itself cannot actually be read. You can see the field and its value when viewing messages in the stream, but when you run debug(concat("<fieldname>: ", to_string($message.<fieldname>))); in a later-stage rule (to ensure the conversion was done in a prior stage), you get NOTHING after the ":" in the server log. As a result, all of the JSON fields are also non-searchable via the search/filter in the stream view. It's as though they are present and nothing more, which means there's very little value in terms of analytics.
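Concretely, the later-stage test rule is along these lines (a sketch only; some_json_field is a placeholder for any field produced by the conversion):

rule "debug converted field"
when
  has_field("some_json_field")
then
  // writes one line to the Graylog server log; for JSON-converted fields
  // nothing appears after the colon, as described above
  debug(concat("some_json_field: ", to_string($message.some_json_field)));
end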
Interestingly, for the "is_json" field mentioned earlier, the value can be read and the field deleted without issue (neither works on any field created through any means of conversion from JSON to fields). "is_json" is a means to identify which messages need to be parsed, and it gets removed (cleanup) in a later-stage rule. The premise was to minimize extraneous data being stored, and it has also become validation that the later rule is firing: the "is_json" field is not present after remove_single_field("is_json") runs in the later-stage rule.
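That cleanup rule in the later stage is simply the following, and as noted this one does work (which also confirms the stage is firing):

rule "cleanup is_json"
when
  has_field("is_json")
then
  remove_single_field("is_json");
end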
At this juncture, after spending a good number of hours trying different things to analyze and get this working, I'm not sure what to think and need some fresh eyes/ideas on this.
JSON-conversion pipeline rule contents (based on a posting from these forums):
rule "JSON Parser"
when
  has_field("is_json") AND
  to_string($message.is_json) == "json"
then
  // strip the leading token (everything up to and including the first space)
  // so only the JSON object remains, then parse it and promote the
  // key/value pairs to message fields
  let prepjson = regex_replace("^\\S* ", to_string($message.message), "");
  let the_json = parse_json(to_string(prepjson));
  let the_map = to_map(the_json);
  set_fields(the_map);
end
Have also tried the above without the second to_string on the "let the_json" line, just in case something was having heartburn with to_string→to_string (it made no difference).
The temporary work-around was to revert everything to be extractor-based and create regex extractors to start working with a small fraction of the data. The problem is the significant variance in the JSON structure, since it is 10-20 different log types from one source. The goal was to be able to analyze certain types of events through "_exists_:<field>" + "<field>:<value>" statements to limit which items are retrieved, knowing which field is present for a given type of element.
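For illustration, the intended style of search was along these lines (the field name and value are hypothetical):

_exists_:dst_port AND dst_port:443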
Any ideas would be greatly appreciated.
Thanks!