Pipelines, JSON, array for tags


I’m trying to replicate the Beats input behavior for tags using Pipelines and “flatten_json”.

In Filebeat I declare a file with tags like this:

- type: log
  paths:
    - "/var/log/haproxy.log"
  tags: ["haproxy", "access"]

In Graylog with a Beat input,

Then when I search a message, I have a field like this:


When I click on the value and I choose “Add to query”, this is what I see in the search field :

Which is great, because I can search for one of the values, or for several by adding more AND filebeat_tag:<tag> clauses.

In Graylog with a Kafka RAW input,

This is the JSON ingested by Graylog:
{"@timestamp":"2023-02-06T15:15:00.410Z","@metadata":{"beat":"filebeat","type":"_doc","version":"8.6.1"},"log":{"offset":2965630,"file":{"path":"/var/log/haproxy.log"}},"message":"Feb 6 16:14:59 localhost haproxy[81865]: [06/Feb/2023:16:14:59.941] www~ graylogs/pirv-siem-es-graylog-03 0/0/1/9/11 200 620 - - ---- 4/4/1/0/0 0/0 \"GET /api/system/cluster/nodes HTTP/1.1\"","tags":["haproxy","access"],"input":{"type":"log"},"host":{"name":"pirv-siem-lb.services.com"},"agent":{"ephemeral_id":"34e1af49-e08c-4c64-a6aa-a25db055d715","id":"06fba243-d033-450e-8c43-e9e55cb43669","name":"pirv-siem-lb.services.com","type":"filebeat","version":"8.6.1"},"ecs":{"version":"8.0.0"}}

My pipeline rule to convert it:

rule "Simple Flatten Json Rule"
when true
then
    let sJson = to_string($message.message);
    let fJson = flatten_json(value: sJson, array_handler: "json", stringify: false);
    set_fields(to_map(fJson));
end

Then when I search a message, I see a field for my tags, which looks the same as the others:


BUT when I click on the value and choose “Add to query”, this is what I see in the search field:

The behavior is not the same as with the Beats input: now I have a lot of "" and it’s not really usable.

Any idea why this is happening?
Or how can I get an array of values from my JSON, to end up with a usable tag list?


Hey @benoitp

Have you tried a different input than Beats? It seems like you’re sending JSON logs and using a pipeline to flatten them; if so, perhaps try “Raw/Plaintext”. Just an idea.

When using a Pipeline, customize the Message Processors Configuration section under “System/Configurations” so that the Pipeline Processor runs after the Message Filter Chain.

Hello @gsmith

Yes, I tried the “Raw/Plaintext Kafka” input; that’s the input I would like to use.

My configuration for Message Processors is as follows:

# Processor Status
1 AWS Instance Name Lookup active
2 GeoIP Resolver active
3 Message Filter Chain active
4 Pipeline Processor active
5 Illuminate Processor active

Do you know the right way to build a tag field?

Hello @benoitp

You’re using Filebeat. Correct me if I’m wrong, but Filebeat currently only sends JSON documents.

But the pipeline shows (and again, correct me if I’m wrong) that you’re trying to take the message field and flatten it out, when I believe you’re already sending it in JSON format. In that case, when using Filebeat, I would suggest using the Beats input.

Here is a good Example:


I cannot use a Beats input because I’m using Kafka in front of my Graylog servers.

That’s why I need to work with a Kafka RAW input.
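For context, the shipping path here is Filebeat → Kafka → Graylog “Raw/Plaintext Kafka” input. A minimal sketch of the Filebeat side, where the broker address and topic name are hypothetical placeholders (adjust to your environment):

```yaml
# filebeat.yml (excerpt) — broker address and topic name are hypothetical
output.kafka:
  hosts: ["kafka-01.example.com:9092"]
  topic: "filebeat-logs"
```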

I’m glad to announce that I succeeded.

After flattening the JSON, the key is to split the “tags” field and write it back without any other change, like this:

rule "System - format tags"
when true
then
  let fields = split(", ", to_string($message.tags));
  set_field("new_tags", fields);
end

After that you will have a field containing an array, and you will be able to query it like this:
new_tags=my_first_tag AND new_tags=my_2nd_tag AND new_tags=... AND message:...
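For reference, the two steps can also live in a single rule. This is a minimal sketch, assuming the raw Filebeat JSON sits in the message field and keeping new_tags as the target field name from the posts above; it also assumes fields written by set_fields are visible later in the same rule — if not, keep the two rules in separate pipeline stages as described above:

```
rule "Flatten Filebeat JSON and format tags"
when true
then
  // flatten the raw Filebeat JSON into individual message fields
  let fJson = flatten_json(value: to_string($message.message), array_handler: "json", stringify: false);
  set_fields(to_map(fJson));
  // split the flattened tags value back into an array so each tag is individually queryable
  let fields = split(", ", to_string($message.tags));
  set_field("new_tags", fields);
end
```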


Understood, Glad you resolved it :+1:

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.