Pipelines on same stages working on same messages?


(Steffen Griebel) #1

We have defined two pipelines, both connected to the stream ‘All Messages’, with different stages (0 and 1).

Original message:
facility: user-level
level: 6
timestamp: 2018-03-20T12:53:38.000Z
source: MyDummyHostName

Picked up by Pipeline (stage 0)
rule "security"
when
  has_field("source")
  && contains(to_string($message.source), "MyDummyHostName") == true
then
  set_field("pipeline_rule", to_string("security"));
  set_field("hostname", to_string($message.source));
  set_field("source", to_string("syslog"));
  route_to_stream(id: "5aa0ec7f3893e1097cc20f65", remove_from_default: true); // security
end

results in Message:
facility: user-level
level: 6
hostname: MyDummyHostName
source: syslog
timestamp: 2018-03-20T12:56:51.000Z

And the above message is picked up by a different pipeline (stage 1) as well:
rule "Syslog"
when
  has_field("facility") == true
  && has_field("level") == true
  && contains(to_string($message.source), "MyDummyHostName") == false
then
  set_field("pipeline_rule", to_string("Syslog"));
  set_field("hostname", to_string($message.source));
  set_field("source", "syslog");
  route_to_stream(id: "5aa0e05f3893e1097cc1a988", remove_from_default: true); // syslog_default_new
end

resulting in this message:
facility: user-level
hostname: syslog
level: 6
message:
pipeline_rule: Syslog-Default-new
source: syslog
timestamp: 2018-03-20T12:59:40.000Z

My understanding is that the pipeline ‘security’, running as stage 0 and connected to the ‘All Messages’ stream, picks up the message first and does everything defined within the rule. Once that pipeline has finished, any other pipelines connected to the same stream and running on the same stage are run.
In my case, there is no further pipeline running on stage 0.
Now the second pipeline, ‘syslog’, running on stage 1, runs. Somehow that pipeline works on the same message already handled by pipeline ‘security’, which had also moved that message to a different stream (and removed it from the original stream!).
IMHO that’s a fault: this approach results in a message with wrongly defined fields, as ‘hostname’ is now set to ‘syslog’.

So, did I find a bug, or is it ‘works as designed’?
Am I doing something wrong?
Sure, it’s possible to extend the ‘when’ definitions to avoid this issue, but that would increase overhead and maintenance work as those statements become more complex.


(Jochen) #2

When using route_to_stream(), messages are assigned to the new streams after the pipeline has been completed and will then run through the pipelines associated with these new streams.
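In other words, the `route_to_stream()` call only marks the message; the actual routing takes effect once the current run is done. A sketch (using the rule and stream ID from your example):

```
rule "route to security"
when
  has_field("source") && contains(to_string($message.source), "MyDummyHostName")
then
  // The message is only *marked* for the "security" stream here.
  // It still runs through the remaining stages of all pipelines
  // connected to its current stream ("All Messages") before the
  // routing is actually applied.
  route_to_stream(id: "5aa0ec7f3893e1097cc20f65", remove_from_default: true);
end
```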

In your case, it would make sense to split your processing logic into multiple pipelines.


(Steffen Griebel) #3

Jochen, thanks for your reply.
But I already have multiple pipelines. In my example, two pipelines are configured.

Pipeline Security (stage 0), attached to stream ‘All Messages’, looking for messages where source = MyDummyHostName, defining some default fields and moving the messages to the new ‘security’ stream.

Pipeline Syslog (stage 1), attached to stream ‘All Messages’, looking for messages that have the fields facility and level defined. It also sets some default values and moves these messages to the new ‘syslog’ stream.

So: two pipelines, working on the same stream.
But the issue is that both pipelines work on the same messages.

The first pipeline reads a message looking like this (note that the field hostname is missing!):
facility: user-level
level: 6
timestamp: 2018-03-20T12:53:38.000Z
source: MyDummyHostName

That pipeline updates the message to:
facility: user-level
level: 6
hostname: MyDummyHostName
source: syslog
timestamp: 2018-03-20T12:56:51.000Z
And moved that message to new stream ‘security’ - so far so good.

But now the second pipeline ‘syslog’, running on stage 1 and reading messages from ‘All Messages’, picks up the same message.
The result is this message:
facility: user-level
hostname: syslog
level: 6
message:
pipeline_rule: Syslog-Default-new
source: syslog
timestamp: 2018-03-20T12:59:40.000Z

So, I have different pipelines which are moving messages into different streams.
I would have expected that only the one pipeline whose match criteria apply would process the messages in ‘All Messages’, since it has already moved these messages.
Unfortunately, this is not the case.
Therefore the pipeline ‘Syslog’ additionally contains the criterion `&& contains(to_string($message.source), "MyDummyHostName") == false`.
And from my understanding that shouldn’t be necessary because the message should already be moved to a new stream.


(Jochen) #4

Maybe we’re mixing up terminology here. A pipeline has one or more stages and can be assigned to a stream.

Processing in a single pipeline starts in the first stage and continues through the subsequent stages if all, or at least one, of the rules in the previous stage completed successfully (i.e. the condition in the when clause was true).

Think of stages as groups of conditions and actions which need to run in order. All stages with the same priority run at the same time across all connected pipelines.

Quote from http://docs.graylog.org/en/2.4/pages/pipelines/pipelines.html

Since the pipelines run in parallel, the mutations of a message in one pipeline assigned to a specific stream aren’t necessarily available in another pipeline assigned to the same stream in the same stage.
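Expressed in the pipeline source syntax, the setup from your posts looks roughly like this (a sketch; pipeline and rule names taken from your example). The key point is that stage numbers are compared across all connected pipelines:

```
pipeline "Security"
stage 0 match all
  rule "security";
end

pipeline "Syslog"
stage 1 match all
  rule "Syslog";
end

// For a message arriving on "All Messages":
//   1. every stage 0 across connected pipelines runs (here: "security")
//   2. every stage 1 across connected pipelines runs (here: "Syslog"),
//      still on the original stream, because the re-routing from
//      stage 0 has not been applied yet
```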

So in your case, I’d create a pipeline attached to “All messages” which only sorts the messages into the correct streams and then process the messages (add/remove message fields) in pipelines attached to the respective streams.
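As a sketch (stream IDs and field values taken from your example; the rule names are made up), the pipeline connected to "All messages" would only route:

```
// Pipeline "Router", connected to "All messages" -- routing only
rule "route security hosts"
when
  has_field("source") && contains(to_string($message.source), "MyDummyHostName")
then
  route_to_stream(id: "5aa0ec7f3893e1097cc20f65", remove_from_default: true); // security
end
```

and the field handling would move into a pipeline connected to the target stream, where it only runs after the message has actually been moved:

```
// Pipeline "Security enrichment", connected to the "security" stream
rule "security defaults"
when
  true
then
  set_field("pipeline_rule", "security");
  set_field("hostname", to_string($message.source));
  set_field("source", "syslog");
end
```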


(Steffen Griebel) #5

You may be right, but I still have doubts.
The pipelines I’m talking about each have only a single stage.

There is also only one pipeline with stage 0.
Therefore, that pipeline has completely finished when the rule of stage 0 has been processed.
And yet another pipeline (Syslog, stage 1) processes the same message…


(Jochen) #6

Yes, because the updated stream information is applied after all pipelines assigned to the current stream have been completed.

If you’re interested in the internals, you can take a look at the code of the processing pipeline:


(Steffen Griebel) #7

And that fact was the missing item :slight_smile:

Thanks for clarification.


(system) #8

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.