Hello graylog community,
I have one more question this month:
Per Create/Cycle index set to specific ID? I am naming our index templates like “logs_r01m”, “logs_r06m”, “logs_r12m”. I am using stream filters to pick out logs for each retention period / index set. It works well, but lately I had to create a couple of special retention periods and realized that managing stream rules that route the messages to individual indices is getting harder and harder.
If I could use the “All messages” stream as a fallback, I would use it for the default retention period and considerably simplify the stream rules. Unfortunately, I see no way to change where logs from “All messages” go. They are forced into the “Default index set” and from there into the graylog_* indices.
I would very much like you to prove me wrong and to show me how to redirect them.
Oh, right, the answer to everything is “Pipeline processor”.
Obviously I can write a rule that reroutes everything from a default stream into another one and connect that to “All messages” stream.
But that approach raises a few complications. First, it forces me to mix “message processing” and “message routing” in the pipelines, making them even more convoluted and making stream routing even more non-debuggable. Second, there is a lack of documentation on what happens when pipelines re-route messages into a different stream:
I mean - Because messages are put through pipelines based on their stream membership, what happens when a pipeline rule changes stream membership?
Does pipeline processing continue like nothing happened?
Is pipeline processing aborted if the pipeline is no longer matching?
If so, does it abort after the step that changed streams, or after the pipeline runs to its end?
Is pipeline processing started in a newly matched pipeline?
If so, does it start at the “current” pipeline step, or the new pipeline start from its beginning?
Yes, I am using this. But the problem is crafting stream rules so that no message slips into graylog_* via ‘All messages’. All streams are set to “Remove matches from ‘All messages’ stream”.
Streams marked in red are there just to pull all remaining messages into the default retention period index set. If it was possible to redirect ‘All messages’, all these streams (50%!) would be unnecessary.
Rather than setting up so many stream rules, set up rules in a pipeline that create flags-as-fields and use pipeline staging to decide what to do with the flags it finds. In later stages you can route to streams based on flag fields you find and even delete a flag field if you no longer want it. This might end up as a routing stream for all incoming messages, with subsequent pipelines attached to the routed-to stream. You can use rule naming conventions to keep these routing rules alphabetically grouped for easier future edits.
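A minimal sketch of the flags-as-fields idea, as two rules placed in consecutive stages. The field name `retention_flag`, the source value, and the stream name are made up for illustration:

```
// Stage 0: tag the message with a routing flag (field name is illustrative)
rule "flag long-retention sources"
when
    has_field("source") && to_string($message.source) == "firewall01"
then
    set_field("retention_flag", "r12m");
end

// Stage 1: act on the flag, then clean it up
rule "route r12m flags"
when
    has_field("retention_flag") && to_string($message.retention_flag) == "r12m"
then
    route_to_stream(name: "logs_r12m", remove_from_default: true);
    remove_field("retention_flag");
end
```

Keeping the flag-setting and flag-consuming rules in separate stages is what lets later stages make decisions on everything the earlier stages tagged.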
Does pipeline processing continue like nothing happened?
I believe processing runs to the end of the pipeline regardless of when you route it. Easy to confirm in testing if you want to be sure.
Is pipeline processing aborted if the pipeline is no longer matching?
Nope, once you start a pipeline, it runs to the end
If so, does it abort after the step that changed streams, or after the pipeline runs to its end?
see previous answer
Is pipeline processing started in a newly matched pipeline?
Always. You can have multiple pipelines attached to a stream but I don’t think you can control sequence… yet. I think that’s what you were thinking…
If so, does it start at the “current” pipeline step, or the new pipeline start from its beginning?
Pipelines always start from the beginning and run all the way out. There were some questions in the forums recently asking why pipelines continue to run after a drop_message() function… this is what leads me to believe there is nothing that currently aborts a pipeline.
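The drop_message() observation above can be seen with a rule like this one (the rule name and the level threshold are illustrative; the debug() call just writes to the Graylog log so you can watch what still executes):

```
rule "drop noisy debug messages"
when
    has_field("level") && to_long($message.level) >= 7
then
    drop_message();
    // per the observation above, actions and later stages may still execute
    // before the message is finally discarded
    debug("dropped a debug-level message");
end
```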
It would be nice to choose pipeline sequence on a stream - or for that matter rule sequence in a pipeline stage… or even to have an abort_stream() function… all those are currently potential feature requests.
EDIT: Read further down for more detail - messages can run through multiple streams concurrently but execute in the same stage… but the current docs are unclear about what the sequence is on use of route_to_stream()
Thank you for this long answer.
Yes, I should have said I am asking “In case someone knows” and otherwise fall back to experimenting.
Let’s take a simple scenario:
two streams, ‘All messages’ and ‘Alternate stream’
no stream rules, everything goes to ‘All messages’
two pipelines: ‘Default’, connected to ‘All messages’, and ‘Alternate’, connected to ‘Alternate stream’.
both pipelines have stages -2, -1, 0 and 1
Now, a new message arrives and is routed to ‘All messages’, so processing in pipeline ‘Default’ starts.
Let’s imagine in step 0 there is route_to_stream(name:"Alternate stream", remove_from_default: true);
What do you expect to happen? OK, pipeline ‘Default’ will probably run all the way to its end.
You expect that the newly matched pipeline ‘Alternate’ will start at this point. Will it start processing its stage -2 concurrently with Default’s stage 1? Or will it wait until pipeline Default finishes and start after that?
Good question - someone inside Graylog or who has tested that scenario would have to answer… Maybe @aaronsachs can provide some insight? Based on how Graylog handles rules within each stage, it would suggest that the order in that scenario is not guaranteed - which is unhelpful. The takeaway is to be mindful and place your routing and dropping at the end of your pipeline stages.
Have you seen the Pipeline Simulator? I personally haven’t used it in a long time, but it did help me out with Stages and Rules. Maybe it will give you better insight.
@tmacgbay - Yes, keeping fingers crossed that your call for devs will succeed.
I don’t want my rules to depend on accidental/unstable behavior.
@gsmith - nice idea. Unfortunately, it is just giving me errors.
PS: The solution to my original question is trivial - assuming you avoid routing in pipelines like I do.
Just create a stream “All messages (Editable)” and attach this one-liner rule to All messages:
route_to_stream(name: "All messages (Editable)", remove_from_default: true);
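Wrapped in full rule syntax (the rule title is arbitrary), that one-liner becomes:

```
rule "redirect All messages"
when
    true
then
    route_to_stream(name: "All messages (Editable)", remove_from_default: true);
end
```

Attach this rule to a pipeline connected to ‘All messages’; the “All messages (Editable)” stream can then be pointed at whichever index set holds the default retention period.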
Oof. Off the top of my head, I believe pipelines are run in parallel. E.g., two pipelines with stages 0, 1, and 2 will run matching stages in parallel with each other.
So I am explicitly guaranteed that a newly matched pipeline will somehow start. Nice find @tmacgbay !
I think this also answers that clone_message() / create_message() will just start their own pipelines.
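If that holds, a clone sent to another stream should get its own pipeline run. A hypothetical rule sketch (the field name and stream name are invented; route_to_stream() takes an optional message parameter for routing a specific message such as a clone):

```
rule "clone to audit stream"
when
    has_field("audit")
then
    // the clone starts life in the same streams as the original,
    // so route it explicitly to where it should go
    let cloned = clone_message();
    route_to_stream(message: cloned, name: "Audit stream");
end
```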
The only question left is: Will A1 run together with M3 or after M3?
Built a test scenario where a message rides two pipelines, and as expected the message hit all stages in numerical sequence. When the second test pipeline was disconnected from streams and I used route_to_stream() to get a message to it, processing in the new pipeline only started after the first pipeline had finished, and it ran all stages regardless of where the initiating route_to_stream() was placed.
Where:
Pipeline one has stages MINUS TWO, ZERO and TWO
Pipeline two has stages MINUS ONE, and ONE
A message starting in more than one pipeline will run them in parallel, staying in the same stage numbers all the way through. Example results:
A message that gets to a pipeline via route_to_stream() will finish its current pipeline and then start the new pipeline from its beginning. Example results:
Certainly not comprehensive tests but interesting results!