Routing messages with pipelines, one large pipeline or multiple smaller ones?

I currently have one input stream that will ingest at least hundreds, possibly 1-2 thousand, messages per second. I want to split these messages into multiple smaller streams as early as possible, based on the type of activity they represent.

Sub stream A
Sub stream B
Sub stream C

and so on. Each message will only be relevant to one of those streams. What’s the best way to route them?

  1. One pipeline with multiple stages, each stage routing messages to one stream and, if there is no match, falling through to the next stage?
    OR
  2. Multiple pipelines, each with only one rule that routes to a single output stream.

I suspect 1) would be most efficient, since at each stage you’re running the check on a reduced set of messages instead of on every message multiple times?

You can do one pipeline with one stage, and then put multiple rules in that stage: 3 rules, one for each stream. That works as long as the when clause of each rule only matches what it should.
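
As a rough sketch of that layout: the field name activity_type, its value "a", and the rule and pipeline names are all assumptions made up for illustration, so substitute whatever actually distinguishes your messages. Only the rule for Sub stream A is shown; the rules for B and C would look the same with a different value and stream name.

```
rule "route activity A"
when
  // activity_type and the value "a" are hypothetical placeholders
  has_field("activity_type") && to_string($message.activity_type) == "a"
then
  // remove_from_default takes the message out of the default stream once routed;
  // leave that argument out if the message should stay there as well
  route_to_stream(name: "Sub stream A", remove_from_default: true);
end
```

```
pipeline "Route by activity"
stage 0 match either
  rule "route activity A";
  rule "route activity B";
  rule "route activity C";
end
```

Because each when clause tests for a different value, at most one rule fires per message even though all three conditions are checked.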

I know I could do one stage with multiple rules, but then isn’t every message going to be evaluated against however many rules I’ve got? That feels like wasted processing time, as there will only ever be one match per message.
That was the thinking behind having multiple stages with only one rule per stage, and having the message continue to the next stage only if it doesn’t match the current rule/stage.

Or is the difference in processing so minimal it doesn’t really matter?

Annoyingly, I’ve noticed that the option in the pipelines is actually “none or more rules”, so is there no way to have messages drop down a single pipeline and then stop processing a message when it hits the first matching rule?

So each message is going to have to be evaluated against all the potential destination streams, even if I know it can only match one output stream.
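
For comparison, the staged layout would look roughly like this in pipeline source. This is only a sketch: it assumes a Graylog version whose pipeline syntax accepts "match pass" (the source form of the “none or more rules” option), and the rule names are hypothetical, each standing for a separately defined rule whose when clause tests one activity type.

```
// hypothetical rule names; each rule is defined separately
pipeline "Route by activity (staged)"
stage 0 match pass
  rule "route activity A";
stage 1 match pass
  rule "route activity B";
stage 2 match pass
  rule "route activity C";
end
```

With "match pass" a message moves on to the next stage whether or not the rule matched, so it still gets checked once per rule, the same number of checks as putting all three rules in a single stage.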

You can use drop_message to exclude a message from following stages.
Not sure if this is any faster than just running through all the matching conditions. Probably depends on how complex those conditions are.

I wouldn’t worry about the speed of it, pipelines are REALLY fast. The only thing you need to be careful with (except at a huge scale of TBs a day) is using too much regex. Other than that, just do whatever makes your life easiest.
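
To illustrate the regex point, a condition can often be written with plain string functions instead. The sketch below assumes a hypothetical field activity_type whose values start with "login"; the commented-out line shows what a regex version of the same check might look like.

```
rule "route login activity"
when
  // plain prefix check on a hypothetical field, no regex involved
  starts_with(to_string($message.activity_type), "login")
  // a regex version of the same check might be:
  // regex("^login", to_string($message.activity_type)).matches == true
then
  route_to_stream(name: "Sub stream A");
end
```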

Cool, thanks all!

I thought I’d likely be chasing efficiencies that would barely exist, but it’s good to have the confirmation!
