I’m trying to use pipelines for the very first time! Fun.
I have an input called vpn-msgs-5004. I can filter on these messages with:
gl2_source_input:60d1d5d5f611a86add34edac
I have a Stream called ASA_AnyConnect. It has one matching rule:
gl2_source_input must match exactly 60d1d5d5f611a86add34edac
The system shows I’m getting about 3,000 messages per second on this stream. No problem so far. The stream is configured to Remove matches from ‘All messages’ stream.
I have a pipeline called ASA_Pipeline. It is connected to the ASA_AnyConnect stream. I have one Stage and it has only one rule. It’s a very simple setup.
But the UI shows there are no messages hitting the ASA_Pipeline and the rule is not getting hit.
What did I miss? I’m sure this is user error, but I’m green on the pipeline system.
Could you post the ASA_Pipeline rule and a typical message it should process? The rest of what you describe seems right to me, unless you have a typo. Before even posting, you could change the condition between the when and the then to always be true (I'm pretty sure you can just put in the word true and nothing else) just to see if the condition is what's blocking…
rule "Drop_Message"
when
true
then
drop_message();
end
The reason this is such a simple (and seemingly nonsensical) rule is:
Inputs auto-start when you restart the system. I have some inputs that I need to run as-needed because they create too much data. If I send those messages to the Drop_Message rule by default, I can save my database from getting accidentally filled up after a system restart.
This is an easy way to get my feet wet and figure out how to do pipelines.
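As an aside, a rule like this can also match the noisy input directly instead of relying on a stream rule alone. This is just a sketch using the input ID from earlier in the thread; to_string(), drop_message(), and $message field access are standard pipeline rule functions. (The pipeline containing the rule still has to be connected to a stream the messages actually pass through.)

rule "Drop_Noisy_Input"
when
  // gl2_source_input holds the ID of the input that received the message
  to_string($message.gl2_source_input) == "60d1d5d5f611a86add34edac"
then
  drop_message();
end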
Interesting - on the off chance it is working but isn’t registering where you want, you can put a debug() function in the rule and watch the Graylog logs to see if it made it there.
rule "Drop_Message"
when
true
then
debug("_*_*_* - Drop Message rule was hit.");
drop_message();
end
You can watch for the debug message to appear in the Graylog logs with:
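For example, by tailing the server log for the marker string from the debug() call. The log path is an assumption (it's the default for DEB/RPM package installs), so adjust it for your setup:

```shell
# Watch the Graylog server log for the marker emitted by the debug() call.
# Log path is an assumption (default for package installs); adjust as needed.
# grep -F treats the pattern literally, so the asterisks need no escaping.
tail -f /var/log/graylog-server/server.log | grep -F '_*_*_*'
```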
I’m stumped. You said you see messages coming into the stream. Double-check the pipeline connection: go into the pipeline, click “Edit connections”, and make sure your stream is listed. Here is an example of a “Linux Events” pipeline connected to a “Linux Stream” stream.
I rebuilt everything with new names and everything from scratch. Still doesn’t work.
Do I need to be running an Enterprise Graylog license? I’m running the free version of Graylog.
Everything is running in Stage 0. I don’t have any pipelines that have anything other than Stage 0.
How do I check the Message Processors Configuration order?
Thanks.
Actually, I think you’re right.
After some more reading, I see that the processor order matters most when fields set by a pipeline are in play. Then I realized @danmassa7 just wants to drop all messages, so I figured the order of the message processors would not help much. Perhaps the stage of the pipeline might be something to look at?
That did it. Success. I moved the Message Filter Chain ahead of the Pipeline Processor, and the messages now hit the pipeline.
WHY!?!
I found this text in the Graylog Streams documentation:
However, if you prefer to use the original stream matching functionality (i.e. stream rules), you can configure the Pipeline Processor to run after the Message Filter Chain (in the Message Processors Configuration section of the System → Configurations page) and connect pipelines to existing streams. This gives you fine-grained control over the extraction, conversion, and enrichment process.
I use Streams to direct my messages to a particular index (a crucial feature), and I also want to use pipelines because they’re a really cool Graylog feature. Why would anyone ever want to run the Pipeline Processor before the Message Filter Chain? But I guess that’s the default order out of the box. Whatever the reason, the default order is completely dysfunctional if you want to connect pipelines to streams.
Thank you everyone for all your help! I now have the Pipeline Processor after the Message Filter Chain and things are working well.
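For anyone who wants to verify the current processor order without clicking through the UI, Graylog’s REST API exposes the Message Processors configuration. This is only a sketch: the host, port, and credentials are placeholders, and the endpoint path is my assumption based on what the System → Configurations page uses, so verify it in your server’s API browser first:

```shell
# Placeholders: host, port, and admin credentials.
# Endpoint path is an assumption; confirm it in your Graylog API browser.
# The response should list processor_order, e.g. Message Filter Chain
# before Pipeline Processor after the fix described above.
curl -u admin:yourpassword -H 'Accept: application/json' \
  'http://graylog.example.com:9000/api/system/messageprocessors/config'
```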
Yay! Finally got there… thanks @gsmith! Interesting that pipelines are apparently useless unless you actively swap the order with the Message Filter Chain… Seems like a bug to me…
EDIT: Looks like @jan made mention of this a while back, but judging from this post, the default config needs to be reconfigured, which most people are unaware of.
To close the loop, here’s what I found with the Bit_Bucket rule.
When I enable the Input for ASA AnyConnect source without the pipeline in place it does the following in this order:
Slams all four CPUs
Output buffer on the node gets maxed out.
Process buffer then gets maxed out.
Disk Journal starts to fill.
Bunch of disk space for Elasticsearch gets chewed up.
The system very soon collapses and stops processing things correctly for any stream.
Because inputs start automatically (I have put in a feature request to stop this) this is a problem. Here’s the interesting fix:
Create a stream that captures that one input and removes it from the All Messages stream.
Create a pipeline with the bit_bucket rule. Attach the pipeline to the stream.
Make sure you have the tweak that makes the Message Filter Chain run before the Pipeline Processor.
Now the results on the system are:
CPU usage goes up about 50 percentage points, but the CPUs are not maxed out. So it takes some CPU to throw out the messages, but not as much.
Output buffer, Process buffer and Disk Journal are fine and don’t fill.