New to Pipelines: One stage with all rules or stages based on match?

1. Describe your incident:
Not really an incident. I’m trying to understand the trade-off between cramming a bunch of related rules into one stage vs. having a stage per rule.

I’m currently migrating my extraction rules to a pipeline.

Looking at my long list of extraction rules, I identified the rule that is triggered the most, for parsing pfSense filterlog (firewall) logs, and added that to stage zero. I then looked at the next most-triggered rule. I wasn’t sure whether to add it to the same stage, and chose to add a new stage (stage 1). My rationale is that I’ve selected the (somewhat confusingly named) option in stage zero of
“Messages satisfying none or more rules in this stage, will continue to the next stage”

I read this as “if a rule in this stage matches, stop here and don’t execute any more stages”, and it’s quite possible I’m reading that wrong.

The reason this seems logical to me is that once I identify a pfSense filterlog pattern that matches, I’ll capture all the fields I need without further enrichment. This should reduce the load on the pipeline processing so it doesn’t try to match a regex in the next rule that I know won’t match.

So my question is: does it make a difference if I cram all my pfSense filterlog (firewall log) rules into one stage, or does the design pattern I followed, having a different stage for each rule, make more sense?

2. Describe your environment:

  • OS Information:
    Docker single node, personal use.

  • Package Version:
    5.0.12

  • Service logs, configurations, and environment variables:
    None

3. What steps have you already taken to try and solve the problem?
I’ve read through the documentation, but there doesn’t seem to be any guidance (or I didn’t look hard enough) on when you should create another stage for a new rule vs. keep it in the same stage.

4. How can the community help?
Looking for guidance or recommendations on how to think about the trade-offs between pipeline stages and performance.

No, it works in reverse of that: you choose whether messages continue to the next stage if all rules in the stage match, if at least one matches, or even if none match.

It really depends how expensive the matching conditions in the when section of your rules are. For example, you don’t want a when that runs a regex against every message, if you can avoid it.
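As an illustrative sketch: a cheap substring check can gate messages before any regex runs. This assumes the word “filterlog” reliably appears in pfSense firewall messages (the field and rule names here are hypothetical, but `contains`, `to_string`, and `set_field` are standard Graylog pipeline functions):

```
// Hypothetical sketch: a cheap string test instead of a regex in "when".
// Assumption: pfSense firewall messages contain the literal "filterlog".
rule "cheap gate: pfSense filterlog"
when
  contains(to_string($message.message), "filterlog", true)
then
  set_field("is_filterlog", true);
end
```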

How are you identifying the different log formats?

If it’s only a couple of choices, I would put them all in one stage and make sure that the when clauses of the rules don’t overlap. Then, if you can, do all the parsing of that format inside that single rule.

You could also write a rule that identifies the type and sets a field value for the log type; then, in a second stage, your when clause just checks that field value to see if it should run.
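A minimal sketch of that two-stage pattern, assuming the classifier runs in stage 0 and the parser in stage 1 (the field name `log_type` and the “filterlog” substring test are illustrative assumptions, not from the original post):

```
// Stage 0: classify once with a cheap check.
rule "classify: pfSense filterlog"
when
  contains(to_string($message.message), "filterlog", true)
then
  set_field("log_type", "pfsense_filterlog");
end

// Stage 1: the cheap equality test runs before the expensive regex,
// so non-matching messages never pay the regex cost.
rule "parse: pfSense filterlog IPv4 TCP"
when
  to_string($message.log_type) == "pfsense_filterlog" AND
  regex("^.*,(in|out),4,.*,(?i)tcp,.*$", to_string($message.message)).matches == true
then
  // field extraction goes here
end
```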

Joel,
Thanks for the clarification. I think what I read was what I wanted it to say, rather than what it actually says (wishful thinking). Below is the beginning of the 6 rules I have.

rule "pfSense filterlog: IPv4 IGMP messages"
when
  regex("^.*,(in|out),4,.*,(?i)igmp,.*$", to_string($message.message)).matches == true
then

rule "pfSense filterlog: IPv4 TCP"
when
  regex("^.*,(in|out),4,.*,(?i)tcp,.*$", to_string($message.message)).matches == true AND NOT has_field("RuleNumber")
then

rule "pfSense filterlog: IPv4 UDP"
when
  regex("^.*,(in|out),4,.*,(?i)udp,.*$", to_string($message.message)).matches == true
then

rule "pfSense filterlog: IPv6 ICMP"
when
  regex("^.*,(in|out),6,.*,ICMPv6,.*$", to_string($message.message)).matches == true
then

rule "pfSense filterlog: IPv6 UDP"
when
  regex("^.*,(in|out),6,.*,(?i)udp,.*$", to_string($message.message)).matches == true
then

Based on your suggestion, I’ll merge them back into one stage. I’ve borrowed the regexes from several previous pioneers.

I’ve done testing on regex101, but not exhaustive testing. I don’t believe the patterns overlap; there are enough unique values in each pattern to prevent it. What I’d really like is the ability to halt further processing once a pattern has matched, like a first-match firewall rule. That would reduce the load I put on the pipeline. I know that only one rule can match; it would be great to execute just that one.
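A marker-field guard can approximate first-match behaviour, extending the `has_field` trick already used in the IPv4 TCP rule above. One caveat to verify on your version: fields set by a rule are generally only visible to when clauses in *later* stages, since a stage typically evaluates all of its rules’ conditions before running actions, so this guard helps across stages rather than within one. A sketch (the field name `filterlog_parsed` is hypothetical):

```
// Each parsing rule sets a marker field on success; rules in later
// stages bail out on the cheap NOT has_field() check before the regex.
rule "pfSense filterlog: IPv4 UDP"
when
  NOT has_field("filterlog_parsed") AND
  regex("^.*,(in|out),4,.*,(?i)udp,.*$", to_string($message.message)).matches == true
then
  set_field("filterlog_parsed", true);
  // field extraction goes here
end
```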

Any thoughts on reducing the load added by patterns that don’t need to match? Maybe I’m doing this wrong and should consider another design pattern.

I had a chat with one of our guys who writes rules for a living and here was his suggestion.

It depends a bit on the logs, but I would do it multi-staged.
Version 1:
Put the easiest/least expensive rule in stage 1 and set event_source_product (or any field; the point is to set a field once you get a match, and then skip rules later if that field is already present).
In the next stage, run the next more difficult regex, but filter out messages that already have an event_source_product.
And so on…
Version 2:
Depending on the logs, a different way could be more efficient: break the logs into groups by looking for common patterns. Let’s say 4 log types start with “<32> Jan 23 2014: Random data”. Here you could write a rule that parses out the severity and the date, and the part after it goes into a later stage where you only parse out that part with the 4 options.
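A sketch of Version 2, assuming the shared prefix really is a syslog-style `<pri>` header (the grok pattern and field names are illustrative; `grok`, `set_fields`, and `regex` are standard Graylog pipeline functions):

```
// Stage 0: strip the common "<pri>..." header shared by the 4 log types;
// the remainder lands in log_rest for the later, type-specific stages.
rule "split common syslog header"
when
  regex("^<\\d+>", to_string($message.message)).matches == true
then
  set_fields(
    grok(
      pattern: "^<%{NONNEGINT:syslog_pri}>%{GREEDYDATA:log_rest}$",
      value: to_string($message.message),
      only_named_captures: true
    )
  );
end
```

Later stages then match their cheaper, type-specific patterns against `log_rest` instead of re-scanning the whole message.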

Which is more efficient?
It depends on the log structure. If you have absolutely random formats, version 1 is better; but if the logs are somewhat similar, version 2 is better.

Grok is also slower than regex… I would use regex if possible.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.