Pipeline rule to extract key-value pairs not working

Hello, Graylog community,

After a lot of research and testing, I decided to come here and ask whether anyone could help shed some light on a particular situation.

I’m trying to configure a pipeline rule to extract some fields from an original field called “message”, but for some unknown reason the rule is not able to perform this simple task.

To explain further: I’m extracting information from a database and using Graylog Sidecar/Filebeat to collect these events. The message field contains some key-value pairs (as shown in the image below) with information I want to be able to search on later.

I’m using the following pipeline rule, which I found in many posts here and on other blogs. The strangest part for me is that the simulation works, as you can see in the following picture, but when I apply the rule and generate some logs/events, the fields are not extracted.

rule "Safetica_Message_Fields"
when
    has_field("message")
then
    set_fields(
        fields:
            key_value(
                value: to_string($message.message),
                trim_value_chars: "",
                trim_key_chars: "",
                delimiters: "|",
                kv_delimiters: "="
            )
    );
end

I feel that I’m doing something wrong. I revisited all the configuration steps described in the documentation and in other blogs and tutorials, but I really cannot see what it is. If possible, I would like your opinion.

One more observation: I changed the order of the Message Processors Configuration, but it seems to have no effect. The next image shows the last order I applied.

I appreciate in advance any comments.

PS: Before trying pipelines, I tried extractors with regexes, key=value extractors, and Grok pattern extractors, and none worked. The pipeline rule was the closest I came to the correct field extraction, but if you have a different idea, please share it; I will be more than glad to test.

Looking quickly at your rule, it looks like there are two types of quotes in there… only one of them works, the other screws things up. You can use the forum tool </> to post your code so that it comes across in a more readable manner. For example, your rule without </>:

rule “Safetica_Message_Fields”

when
has_field(“message”)
then
set_fields(
fields:
key_value(
value: to_string($message.message),
trim_value_chars: “”,
trim_key_chars:"",
delimiters:"|",
kv_delimiters:"="
));
end

and now with the </> applied (And some indentation)

rule “Safetica_Message_Fields”

when
    has_field(“message”)
then
    set_fields(
           fields:
                 key_value(
                       value:                 to_string($message.message),
                       trim_value_chars:      “”,
                       trim_key_chars:        "",
                       delimiters:            "|",
                       kv_delimiters:         "="
                )
     );
end

It becomes a little more obvious that you have a couple of “ and ” in there when you only want " to be in there. Of course, this may be an artifact of the direct paste or somesuch, but that is the first thing I spotted. Also of note, the Graylog editor wouldn’t allow it.

For a little more clarity, you could break it out into separate commands and use the debug() function to find out what is going on in your rule. So you could do something like this (with corrected quotes):

rule "Safetica_Message_Fields"

when
    has_field("message")
then

    let THOR_GOT =  key_value(
                       value:                 to_string($message.message),
                       trim_value_chars:      "",
                       trim_key_chars:        "",
                       delimiters:            "|",
                       kv_delimiters:         "="
                );
    set_fields( fields: THOR_GOT);
    //
    // Find debug output with:   tail -f /var/log/graylog-server/server.log 
    debug(concat("+++++++ KV_Fields are:", to_string(THOR_GOT)));
end
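As a rough illustration of what the key_value() call in this rule is expected to produce, here is a minimal Python sketch. It is not Graylog's implementation: it assumes a single-character pair delimiter and key-value delimiter, it skips pairs that have no "=", and the sample message is made up (only the field names pc_name, user_name, file_name and operation come from this thread).

```python
def key_value(value, delimiters="|", kv_delimiters="="):
    """Simplified stand-in for key_value(): split into pairs, then into key and value."""
    fields = {}
    for pair in value.split(delimiters):
        # partition() splits on the first "=" only, so values may contain "=".
        key, sep, val = pair.partition(kv_delimiters)
        if sep and key:  # ignore fragments without a key or without "="
            fields[key] = val
    return fields


# Hypothetical message in the key=value|key=value shape discussed here.
sample = "pc_name=PC01|user_name=alice|file_name=report.docx|operation=copy"
extracted = key_value(sample)

for name, val in extracted.items():
    print(name, "->", val)
```

If the debug() output of the real rule shows a map shaped like `extracted`, the parsing step is fine and the problem lies after it.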

@lmattos90 Can you share the full set of options you’re using in the Pipeline Simulator please? As well as any Stream Rules or Pipeline Rules configured to operate on the Streams this message is supposed to pass through?

Also, can you post the full raw log message? Exactly as it would appear coming from the log source. Best way to do this is to run a tcpdump with the -A flag on the interface and port Graylog is receiving log messages on.


I apologize for posting it the wrong way. I have corrected it as you explained; thank you very much for being so patient and kind.

I had tried debug() before, but from your explanation it seems I put the instruction in the wrong section of the code. I will try it right now.

Thank you very much.

Sure @william, and thank you for responding so quickly.

Can you share the full set of options you’re using in the Pipeline Simulator please? As well as any Stream Rules or Pipeline Rules configured to operate on the Streams this message is supposed to pass through?

Simulator



Stream rule



Pipeline Rule


Also, can you post the full raw log message? Exactly as it would appear coming from the log source. Best way to do this is to run a tcpdump with the -A flag on the interface and port Graylog is receiving log messages on.

Here’s a link to download the pcap file; I hope I did it right.
pcap file

If you need any other information, just ask.

Thanks in advance for your help!

Hi @tmacgbay,

So, I tried to debug using your code (I hope I did it right), and these were the events.

These log lines confirm that the code is right, no? I’m a Graylog newbie, so forgive me if I didn’t completely understand this step.

Thanks in advance.

Looks to me like the data is coming in fine. The key_value() function is working properly, and there is not too much that can go wrong with set_fields() from there. Are you seeing the new fields with data showing up?

Yep, I followed all the articles that I could find. It’s very strange to me because Graylog usually works so well; between the docs and your help, the configurations have always worked for me.
Here is an example of an event from my instance.


As you can see, the fields are not created.

That view is also showing logs from “all_messages”, which holds the messages as they were before the pipeline ran, unless you have specifically set the stream to remove its messages from “all_messages”.

Restrict your view to only the stream that has the pipeline running.

My bad, I apologize. Here is the correct stream.


@lmattos90

The simulator does indeed work, but it is not accurately representing the live data coming in. You’re basically testing a small raw substring of the data instead of testing the data as it actually comes in, which is in JSON format. You’re proving the lights work, but the rest of the car does not.

I was unable to open your pcap file because it is stored on a 3rd party service, but grab just 1 single log message (try running tcpdump -i [your-interface] -nnA dst port [beats-input-port], or just paste a single log from your pcap in this thread) and put it in the “Raw message” field of the simulator to have the most accurate testing.

Also, remove the Source IP from the simulator and set the message codec to Beats to match the codec of the Input this message is coming in through.

Let’s see how the simulator behaves after these changes and go from there.
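To make the difference between a raw substring and the full event concrete, here is a small Python sketch. Everything in it is hypothetical: the event is a simplified JSON stand-in for what a Beats input receives (the real wire format is framed differently), and the field values are invented. It only shows that the same "|" and "=" splitting gives clean keys when applied to the message field, and polluted keys when applied to the whole JSON text.

```python
import json

# Hypothetical, simplified event: the log line is only one field of a larger document.
raw_event = json.dumps({
    "source": "example-host",
    "message": "pc_name=PC01|user_name=alice|operation=copy",
})


def parse_pairs(text):
    """Split on "|" and then on the first "=" of each pair."""
    fields = {}
    for pair in text.split("|"):
        key, sep, val = pair.partition("=")
        if sep and key:
            fields[key] = val
    return fields


# Parsing only the message field, which is what the rule does with $message.message.
from_message_field = parse_pairs(json.loads(raw_event)["message"])

# Parsing the whole JSON text as if it were the message.
from_whole_event = parse_pairs(raw_event)

print(sorted(from_message_field))  # clean key names
print(sorted(from_whole_event))    # first key carries JSON syntax in front of it
```

This is why the codec selected in the simulator should match the codec of the input: it decides what ends up in the message field before the rule runs.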

Hi @william,

Thank you very much for the explanations. I suspected that using the raw codec was wrong, but I couldn’t find an explanation or example anywhere.

I was unable to open your pcap file because it is stored on a 3rd party service, but grab just 1 single log message (try running tcpdump -i [your-interface] -nnA dst port [beats-input-port] , or just paste a single log from your pcap in this thread) and put it in the “Raw message” field of the simulator to have the most accurate testing.

This is the command output. It seems the data came in messy, or did I do something wrong with the command?

Also, remove the Source IP from the simulator and set the message codec to Beats to match the codec of the Input this message is coming in through.

If I do that, the simulator doesn’t work.

Just to give you more context about the way I’m getting these events:

I’m using a PowerShell script to dump data from an MSSQL Server database and save it to a text file, one event per line.

From there, I collect the events through a Sidecar collector, more specifically Filebeat, and in the Graylog console I configured the Sidecar as the documentation explains.

I mention this to make sure I’m not leaving you without important information.

You can add more debug() calls to your rule to look at the fields that set_fields() should have created:

debug(concat("+++++++ pc_name is :", to_string($message.pc_name)));
debug(concat("+++++++ user_name is :", to_string($message.user_name)));
debug(concat("+++++++ file_name is :", to_string($message.file_name)));
debug(concat("+++++++ operation is :", to_string($message.operation)));

I would be surprised if they are not created. If they are not, either there is some unnoticed typo in your set_fields() call or something is interfering further down the line before the message is stored in Elastic.

I don’t think it’s an issue with your Input. We have already proven that the message arrives and gets into the stream, pipeline, and rule, since the right data shows up in the debug() output of the key_value() results.

Is there anything else you are doing to the message? Any other rules or pipelines that are working on it before or after?


Hi @tmacgbay,

Is there anything else you are doing to the message? Any other rules or pipelines that are working on it before or after?

Nope, just this pipeline itself; no extractors of any type or anything similar.
I’m learning bit by bit, so I want to get just this pipeline rule right for now.

Hello @tmacgbay,

Something interesting occurred during the tests. I modified the pipeline a little to use the debug() function per line, and I performed two tests. Let’s go.

1 - I executed the pipeline as a test. It’s important to notice that the only way this simulation works is when I use the raw string codec, but as you can see, the pipeline code is working.


2 - But if I run it the “real way”, the logs show that the fields are not extracted, as you can see in this picture.

Previously you were doing the real way and the debug string for THOR_GOT had data in it, correct?

That was this line:

debug(concat("+++++++ KV_Fields are:", to_string(THOR_GOT)));

Hopefully yes.

Can you post your entire rule code again with the debug statements in it (and using the </> tool so it looks nice)? Direct copy and paste, not a screenshot…

Previously you were doing the real way and the debug string for THOR_GOT had data in it, correct?

That’s correct, as we can see in an earlier post in this thread.

Can you post your entire rule code again with the debug statements in it (and using the </> tool so it looks nice)? Direct copy and paste, not a screenshot…

Sure!

rule "Safetica Database Logs KV"
when
    has_field("message")
then
    let extract = key_value(
                   value:                 to_string($message.message),
                   trim_value_chars:      "",
                   trim_key_chars:        "",
                   delimiters:            "|",
                   kv_delimiters:         "="
                );
    set_fields ( fields: extract);
        //
        //
    debug(concat("+++++++ pc_name is :", to_string(extract.pc_name)));
    debug(concat("+++++++ user_name is :", to_string(extract.user_name)));
    debug(concat("+++++++ file_name is :", to_string(extract.file_name)));
    debug(concat("+++++++ operation is :", to_string(extract.operation)));
 
end

You have changed the debug() calls.

We want to look at $message.<field>, not extract.<field>, so we can see whether set_fields() did its job at that point.

Let’s also take out the “fields:” part of set_fields() so it’s set_fields(extract);. Let’s have it look like this and see what happens in the log files…

rule "Safetica_Message_Fields"

when
    has_field("message")
then

    let THOR_GOT =  key_value(
                       value:                 to_string($message.message),
                       trim_value_chars:      "",
                       trim_key_chars:        "",
                       delimiters:            "|",
                       kv_delimiters:         "="
                );
    set_fields(THOR_GOT);
    //
    // Find debug output with:   tail -f /var/log/graylog-server/server.log 
    debug(concat("+++++++ KV_Fields are:", to_string(THOR_GOT)));
    debug("=============================================================");
    // The debugs below find out if the set_fields() above did its job
    debug(concat("+++++++ pc_name is :",   to_string($message.pc_name)));
    debug(concat("+++++++ user_name is :", to_string($message.user_name)));
    debug(concat("+++++++ file_name is :", to_string($message.file_name)));
    debug(concat("+++++++ operation is :", to_string($message.operation)));
end

I’m sorry, I misunderstood. I’m still learning how pipeline rule code is structured.

As you requested.

I imagine once you moved to UTF-8 that this worked?
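One possible reading of the UTF-8 remark, offered as an assumption because the thread does not confirm the root cause: Windows PowerShell 5.1 writes UTF-16LE by default with Out-File and > redirection, so a file dumped that way carries a NUL byte after every ASCII character. The Python sketch below uses a made-up sample line and decodes the bytes as Latin-1 purely to make each byte visible; how Filebeat and Graylog actually handle such a file depends on their encoding settings.

```python
text = "pc_name=PC01|operation=copy"  # hypothetical sample line

# The same line as a UTF-16LE file would store it.
utf16_bytes = text.encode("utf-16-le")

# Reading those bytes one byte per character exposes the interleaved NULs.
misread = utf16_bytes.decode("latin-1")

# Splitting still "works", but the key names are no longer the expected ones.
keys = [pair.partition("=")[0] for pair in misread.split("|")]
print([repr(k) for k in keys])

# The same line stored as UTF-8 round-trips unchanged.
utf8_roundtrip = text.encode("utf-8").decode("utf-8")
print(utf8_roundtrip == text)
```

With keys like these, a search for pc_name would find nothing even though the rule ran, which would match the symptoms described above.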