Rename index set backing "All messages" stream?

Hello graylog community,
I have one more question this month:

Per Create/Cycle index set to specific ID?, I am naming our index sets “logs_r01m”, “logs_r06m”, “logs_r12m”, and I am using stream filters to pick out the logs for each retention period / index set. It works well, but lately I had to create a couple of special retention periods and realized that managing the stream rules that route messages into the individual index sets is getting harder and harder.

If I could use the “All messages” stream as a fallback, I would use it for the default retention period and considerably simplify the stream rules. Unfortunately, I see no way to change where logs from “All messages” go. They are forced into the “Default index set” and from there into the graylog_* indices.

I would very much like you to prove me wrong and to show me how to redirect them.

Thank you!

Oh, right, the answer to everything is “Pipeline processor”.
Obviously I can write a rule that reroutes everything from the default stream into another one and connect it to the “All messages” stream.

route_to_stream(name:"logs_r12m", remove_from_default: true);
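
For completeness, that call would need to live inside a rule in a pipeline connected to “All messages”; a minimal sketch (the rule name is mine, and the always-true condition just makes it fire for every message the pipeline sees):

rule "Fallback: reroute default stream"
when
    true
then
    // move the message out of "All messages" and into the 12-month retention stream
    route_to_stream(name: "logs_r12m", remove_from_default: true);
end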

But that approach raises a few complications. First, it forces me to mix “message processing” and “message routing” in the pipelines, making them even more convoluted and making stream routing even harder to debug. Second, there is a lack of documentation on what happens when pipelines re-route messages into a different stream:
I mean - because messages are put through pipelines based on their stream membership, what happens when a pipeline rule changes that membership?

  • Does pipeline processing continue like nothing happened?
  • Is pipeline processing aborted if the pipeline is no longer matching?
  • If so, does it abort after the step that changed streams, or after the pipeline runs to its end?
  • Is pipeline processing started in a newly matched pipeline?
  • If so, does it start at the “current” pipeline step, or does the new pipeline start from its beginning?

… and all of these options sound bad.

PS: My processing order is set up like gsmith’s:

Hello,

Not sure about the pipeline, but on streams, when you create a new one or edit an existing one, there is a checkbox as shown below.

[screenshot: stream settings showing the “Remove matches from ‘All messages’ stream” checkbox]

Yes, I am using this. But the problem is writing stream rules so that no message slips into graylog_* via ‘All messages’. All streams are set to “Remove matches from ‘All messages’ stream”.

The streams marked in red are there just to pull all remaining messages into the default retention period’s index set. If it were possible to redirect ‘All messages’, all of these streams (50%!) would be unnecessary.

Rather than setting up so many stream rules, set up rules in a pipeline that create flags as fields, and use pipeline stages to decide what to do with the flags they find. In later stages you can route to streams based on the flag fields you find and even delete a flag field once you no longer need it. This might end up as a routing pipeline on the stream all messages come in on, with subsequent pipelines attached to the routed-to streams. You can use rule naming conventions to keep these routing rules alphabetically grouped for easier future edits. A sketch of the idea follows below.
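
For example, a minimal sketch of the two halves (the field name retention_flag, the facility value and the stage split are all made up for illustration):

rule "routing 010: flag nginx logs for 12-month retention"
when
    has_field("facility") && to_string($message.facility) == "nginx"
then
    // earlier stage: only classify, do not route yet
    set_field("retention_flag", "r12m");
end

rule "routing 020: route by retention flag"
when
    has_field("retention_flag") && to_string($message.retention_flag) == "r12m"
then
    // later stage: act on the flag, then clean it up
    route_to_stream(name: "logs_r12m", remove_from_default: true);
    remove_field("retention_flag");
end

The first rule would sit in an earlier stage and the second in a later one, so classification and routing stay separated.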

  • Does pipeline processing continue like nothing happened?
    :point_right: I believe processing runs to the end of the pipeline regardless of when you route it. Easy to confirm in testing if you want to be sure
  • Is pipeline processing aborted if the pipeline is no longer matching?
    :point_right: Nope, once you start a pipeline, it runs to the end
  • If so, does it abort after the step that changed streams, or after the pipeline runs to its end?
    :point_right: see previous answer :stuck_out_tongue:
  • Is pipeline processing started in a newly matched pipeline?
    :point_right: Always. You can have multiple pipelines attached to a stream but I don’t think you can control sequence… yet. I think that’s what you were thinking…
  • If so, does it start at the “current” pipeline step, or the new pipeline start from its beginning?
    :point_right: Pipelines always start from the beginning and run all the way through. There were some questions in the forums recently asking why pipelines continue to run after a drop_message() function… this is what leads me to believe there is currently nothing that aborts a pipeline.

It would be nice to choose pipeline sequence on a stream - or for that matter rule sequence in a pipeline stage… or even to have an abort_stream() function… all those are currently potential feature requests.

EDIT: Read further down for more detail - messages can run through multiple pipelines concurrently, executing the same stage numbers in step… but the current docs are unclear about what the sequence is when route_to_stream() is used :thinking:


Thank you for this long answer.
Yes, I should have said I am asking in case someone already knows, and that I will otherwise fall back to experimenting.

Let’s take a simple scenario:

  • two streams, ‘All messages’ and ‘Alternate stream’
  • no stream rules, so everything goes to ‘All messages’
  • two pipelines: ‘Default’, connected to ‘All messages’, and ‘Alternate’, connected to ‘Alternate stream’
  • both pipelines have stages -2, -1, 0 and 1

Now a new message arrives and is routed to ‘All messages’, so processing in pipeline ‘Default’ starts.
Let’s imagine that in stage 0 there is route_to_stream(name:"Alternate stream", remove_from_default: true);

What do you expect to happen? OK, pipeline ‘Default’ will probably run all the way to its end.
You expect the newly matched pipeline ‘Alternate’ to start at this point. Will it start processing its stage -2 concurrently with the Default pipeline’s stage 1? Or will it wait until pipeline Default finishes and start after that?

Good question - someone inside Graylog, or someone who has tested that scenario, would have to answer… Maybe @aaronsachs can provide some insight? Based on how Graylog handles rules within a stage, I would guess it handles that scenario in an essentially arbitrary order - which is unhelpful. The takeaway is to be mindful and place your routing and dropping at the end of your pipeline stages.

@nisow95612
Hello,

Have you seen the Pipeline Simulator? I personally haven’t used it in a long time, but it did help me out with stages and rules. Maybe it will give you better insight.

@tmacgbay - Yes, keeping my fingers crossed that your call for devs will succeed.
I don’t want my rules to depend on accidental/unstable behavior.

@gsmith - nice idea. Unfortunately, it just gives me errors.

I figured out it dislikes my standard timestamp sanitization rule:

rule "Timestamp sanitizer (not ± 30 minutes)"
when
    to_date($message.timestamp) - minutes(30) > now() || 
    to_date($message.timestamp) + minutes(30) < now()
then
  set_field("timestamp_wrong", to_date($message.timestamp));
  set_field("timestamp", now());
end

PS: The solution to my original question is trivial - assuming that, like me, you otherwise avoid routing in pipelines.
Just create a stream “All messages (Editable)” and attach this one-liner rule to ‘All messages’:

route_to_stream(name: "All messages (Editable)", remove_from_default: true);

The condensed question is:

If a message is shunted mid-pipeline to a new stream that has a different pipeline attached to it, do those pipelines run sequentially or in parallel?

—> hey @aaronsachs!!! :stuck_out_tongue:


Oof. Off the top of my head, I believe pipelines run in parallel. E.g., two pipelines that both have stages 0, 1, and 2 will run those stages in parallel with each other.


Thank you for coming here. Let me try to explain with a picture:
[diagram: Rerouting mid-pipeline]

The main question is: when, if at all, does Stage A1 run?
I am asking how you think it should work, because I want to avoid relying on unintended behavior.

Other questions:
  • Does M3 run? The Main pipeline no longer matches, so it may make sense to stop there.
  • Does the Alternate pipeline run at all? It now matches, so it makes sense to start it.
  • Does the Alternate pipeline start from the beginning or from the current processing step (A3)?
  • If the Alternate pipeline starts from A1, does A1 run during M3, or does it wait until Main completes?

What about messages created by clone_message() / create_message()?

  • How do these enter pipelines? Do they continue at M3 or start from M1?
  • If I execute route_to_stream(name:"Alternate") on them, do they start from A1 or A3?

Best to review the docs on pipeline staging. If you want authoritative answers, you might have to sign up for support. :slight_smile:


EDIT: the answer to this part is given explicitly there:

So I am explicitly guaranteed that a newly matched pipeline will start somehow. Nice find, @tmacgbay!
I think this also means that messages from clone_message() / create_message() will simply start their own pipeline runs.

The only question left is: Will A1 run together with M3 or after M3?

I built a test scenario where a message rides two pipelines, and as expected the message hit all stages in numerical sequence. When the second test pipeline was cleared of its stream connections and I used route_to_stream() to get a message to it, message processing only started after the first pipeline had finished, and it ran all stages regardless of where the initiating route_to_stream() was placed.

Where:

  • Pipeline one has stages MINUS TWO, ZERO and TWO

  • Pipeline two has stages MINUS ONE and ONE

  • A message that starts in more than one pipeline will run them in parallel, staying in step with the same stage numbers all the way through. Example results:
    [screenshot of example results]

  • A message that gets to a pipeline via route_to_stream() will finish its current pipeline and then start the new pipeline from the beginning. Example results:
    [screenshot of example results]

Certainly not comprehensive tests but interesting results!
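
If you want to reproduce a similar test, one low-effort approach is to put a tiny tracing rule into every stage of both pipelines and read the order back afterwards. A minimal sketch of one such rule (the field name stage_trace and the " M0" label are made up; each rule would append its own stage label):

rule "trace: main stage 0"
when
    true
then
    // append this stage's label to a running trace field;
    // to_string() with a default handles the very first append, when the field does not exist yet
    set_field("stage_trace", concat(to_string($message.stage_trace, ""), " M0"));
end

After processing, the stage_trace field on the message shows the order in which the stages actually ran.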

