Pipeline Gets No Traffic

Pipeline Receiving No Messages

I’m trying to use pipelines for the very first time! Fun.
I have an input called vpn-msgs-5004. I can filter on these messages with:
gl2_source_input:60d1d5d5f611a86add34edac

I have a Stream called ASA_AnyConnect. It has one matching rule:

gl2_source_input must match exactly 60d1d5d5f611a86add34edac

The system shows I’m getting about 3,000 messages per second on this stream. No problem so far. The stream is configured to Remove matches from ‘All messages’ stream.

I have a pipeline called ASA_Pipeline. It is connected to the ASA_AnyConnect stream. I have one Stage and it has only one rule. It’s a very simple setup.

But the UI shows there are no messages hitting the ASA_Pipeline and the rule is not getting hit.

What did I miss? I’m sure this is user error, but I’m green on the pipeline system.

Thank you for your help.

Environmental information

Operating system information

centos-linux-release-8.3-1.2011.el8.noarch

Package versions

  • Graylog 4.0.7
  • MongoDB 4.2.14
  • Elasticsearch 7.10.2

Could you post the ASA_Pipeline rule and a typical message it should process? The rest that you describe seems right to me unless you have a typo. Before even posting, you can change the condition between the when and then to always be true (pretty sure you can just put in the word true and nothing else), just to see if that might be blocking…

Thank you for the quick response.

rule "Drop_Message"
when
   true
then
   drop_message();
end

The reason this is such a simple (and seemingly nonsensical) rule is:

  1. Inputs auto-start when you restart the system. I have some inputs that I need to run as-needed because they create too much data. If I send those messages to the Drop_Message rule by default, I can save my database from getting accidentally filled up after a system restart.
  2. This is an easy way to get my feet wet and figure out how to do pipelines.

Interesting - on the off chance it is working but isn’t registering where you want, you can put a debug() function in the rule and watch the Graylog logs to see if it made it there.

rule "Drop_Message"
when
   true
then
   debug("_*_*_* - Drop Message rule was hit.");
   drop_message();
end

You can watch for the debug message to appear in the Graylog logs with:

tail -f /var/log/graylog-server/server.log
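server.log gets noisy, so it can help to print only the lines that carry the debug() marker, which is what piping tail -f through grep would do. Below is a minimal Python sketch of that idea, assuming the log path from the command above and the marker text from the debug() call in the rule; the names MARKER, matching_lines and follow are hypothetical, and the sketch does not handle log rotation.

```python
import os
import time
from typing import Iterable, Iterator

# Hypothetical constant: the text passed to debug() in the Drop_Message rule above.
MARKER = "_*_*_* - Drop Message rule was hit."


def matching_lines(lines: Iterable[str], marker: str = MARKER) -> Iterator[str]:
    """Yield only the lines that contain the debug() marker (like piping through grep)."""
    for line in lines:
        if marker in line:
            yield line.rstrip("\n")


def follow(path: str, poll_seconds: float = 0.5) -> Iterator[str]:
    """Rough `tail -f`: yield each complete line appended to the file after we start.

    Log rotation is not handled; restart the script after server.log rotates.
    """
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        handle.seek(0, os.SEEK_END)  # skip existing content and only watch new lines
        pending = ""
        while True:
            chunk = handle.readline()
            if not chunk:
                time.sleep(poll_seconds)  # nothing new yet; wait and poll again
                continue
            pending += chunk
            if pending.endswith("\n"):  # only emit once the writer finished the line
                yield pending
                pending = ""


# Usage on the Graylog host (runs until interrupted with Ctrl+C):
#   for hit in matching_lines(follow("/var/log/graylog-server/server.log")):
#       print(hit)
```

If nothing is printed while messages are flowing into the stream, the rule is not being reached at all, which is the same conclusion as watching the raw log.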

That’s a handy command. Thanks!
But unfortunately it confirms that nothing is hitting this filter.

I’m stumped. You said you see messages coming into the stream, so double-check the pipeline connection: go into the pipeline, click “Edit connections”, and make sure your stream is listed. For example, in my setup the “Linux Events” pipeline is connected to the “Linux Stream” stream.

I’m still assuming this is user error because I’m so green and because this is such a simple pipeline rule. Here are the relevant screenshots.

I really hope this helps. We’re down in the dumb territory and I’ll feel stupid if this is something simple.

Thanks for your help.

Argh. Here is what I did and it worked:

  1. Created a test_input GELF HTTP input
  2. Created a test_index index to receive the messages
  3. Created a testing_stream to route the test_input to the test_index
  4. Created a test_pipeline and attached it to the testing_stream
  5. Watched testing_stream while sending in this command:
curl -v http://127.0.0.1:12201/gelf -d '{"short_message":"Hello there", "host":"example.org", "facility":"test", "_foo":"bar"}'
*   Trying 127.0.0.1:12201...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 12201 (#0)
> POST /gelf HTTP/1.1
> Host: 127.0.0.1:12201
> User-Agent: curl/7.68.0
> Accept: */*
> Content-Length: 86
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 86 out of 86 bytes
* Mark bundle as not supporting multiuse
< HTTP/1.1 202 Accepted
< content-length: 0
< connection: keep-alive
<
* Connection #0 to host 127.0.0.1 left intact

The message showed up in testing_stream.

  6. Created our “Drop_message” rule and attached it to the test_pipeline.
  7. Ran the same test message through. It did not appear in the stream, but I was able to see it in the pipeline view and in the Graylog log.

Maybe rebuild everything from scratch… haha! There has to be one small piece/typo we are both missing…
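The curl test in step 5 can also be scripted, which makes it easy to re-send the same message after each change. This is a minimal sketch using only the Python standard library, assuming a GELF HTTP input listening at http://127.0.0.1:12201/gelf as in the curl example; the helper names build_gelf_payload and send_gelf_http are hypothetical. It adds the GELF 1.1 "version" field, which the curl body omitted, and leaves out "facility", which GELF 1.1 deprecates.

```python
import json
import urllib.request
from typing import Dict, Optional


def build_gelf_payload(short_message: str, host: str,
                       additional: Optional[Dict[str, object]] = None) -> bytes:
    """Build a GELF 1.1 JSON body; additional field names must start with '_'."""
    payload: Dict[str, object] = {
        "version": "1.1",
        "host": host,
        "short_message": short_message,
    }
    for name, value in (additional or {}).items():
        # GELF additional fields are underscore-prefixed, and "_id" is reserved.
        if not name.startswith("_") or name == "_id":
            raise ValueError(f"invalid GELF additional field name: {name!r}")
        payload[name] = value
    return json.dumps(payload).encode("utf-8")


def send_gelf_http(body: bytes, url: str = "http://127.0.0.1:12201/gelf",
                   timeout: float = 5.0) -> int:
    """POST one GELF message to a GELF HTTP input and return the HTTP status code."""
    request = urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Usage, mirroring the curl test (requires the GELF HTTP input to be running):
#   status = send_gelf_http(build_gelf_payload("Hello there", "example.org", {"_foo": "bar"}))
#   print(status)  # the curl run above received 202 Accepted
```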

Hello,

Out of curiosity, what stage is your pipeline in, and have you tried adjusting it?
Did you check your Message Processors Configuration order?

I rebuilt everything with new names and everything from scratch. Still doesn’t work.
Do I need to be running an Enterprise Graylog license? I’m running the free version of Graylog.

Everything is running in Stage 0. I don’t have any pipelines that have anything other than Stage 0.
How do I check the Message Processors Configuration order?
Thanks.

Hello,

On the System → Configurations page, find the Message Processors Configuration section and click the “Update” button; you can adjust the order from there.

It doesn’t seem like there are any dependencies on the Message Filter Chain… But I have been dumb before!

Actually, I think you’re right.
After some more reading, I see that the order matters mainly when fields set by other processors are used in a pipeline. Then I realized @danmassa7 just wants to drop all messages, so the order of the message processors would not help much. Perhaps the stage of the pipeline might be something to look at?

I have never adjusted the Message Processors Configuration. Here is how I am set right now (should be system default).

I really have the most simple pipeline with no complex sequence of stages…

I was just curious if you had tried adjusting those for troubleshooting purposes.

For example, setting the pipeline stage to -1.

Or adjusting the order of the Message Processors to something like this:

  1. Message Filter Chain
  2. Pipeline Processor
  3. GeoIP Resolver
  4. AWS Instance Name Lookup

That did it. Success. I moved the Message Filter Chain ahead of the Pipeline Processor. The messages now hit the pipeline.

WHY!?!

I found this text in the Graylog Streams documentation:

However, if you prefer to use the original stream matching functionality (i.e. stream rules), you can configure the Pipeline Processor to run after the Message Filter Chain (in the Message Processors Configuration section of the System → Configurations page) and connect pipelines to existing streams. This gives you fine-grained control over the extraction, conversion, and enrichment process.
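A toy model makes the "why" concrete. The sketch below is not Graylog's implementation; it only assumes what this thread describes: the Message Filter Chain is where stream rules assign streams (and where "Remove matches from ‘All messages’ stream" takes effect), and the Pipeline Processor runs only the pipelines connected to streams the message is in at that moment. It also assumes, as a simplification, that a message counts only as "All messages" before stream routing. Every class and function name here is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

ALL_MESSAGES = "All messages"  # the default stream


@dataclass
class Message:
    source_input: str
    # Toy assumption: before stream routing, a message only counts as "All messages".
    streams: Set[str] = field(default_factory=lambda: {ALL_MESSAGES})
    dropped: bool = False


@dataclass
class StreamRule:
    stream: str
    source_input: str  # stands in for "gl2_source_input must match exactly <id>"
    remove_from_default: bool = True  # "Remove matches from 'All messages' stream"


def message_filter_chain(msg: Message, rules: List[StreamRule]) -> None:
    """Stream routing: add every stream whose rule matches the message."""
    for rule in rules:
        if msg.source_input == rule.source_input:
            msg.streams.add(rule.stream)
            if rule.remove_from_default:
                msg.streams.discard(ALL_MESSAGES)


def pipeline_processor(msg: Message, connections: Dict[str, str], hits: List[str]) -> None:
    """Run each pipeline whose connected stream the message is in *right now*."""
    for pipeline, stream in connections.items():
        if stream in msg.streams:
            hits.append(pipeline)
            msg.dropped = True  # each pipeline here holds only the drop_message() rule


def process(order: List[str], msg: Message, rules: List[StreamRule],
            connections: Dict[str, str]) -> List[str]:
    """Apply processors in the configured order; return the pipelines that ran."""
    hits: List[str] = []
    for processor in order:
        if msg.dropped:
            break  # a dropped message is not processed any further
        if processor == "Message Filter Chain":
            message_filter_chain(msg, rules)
        elif processor == "Pipeline Processor":
            pipeline_processor(msg, connections, hits)
        # GeoIP Resolver and AWS Instance Name Lookup are irrelevant here and skipped.
    return hits


ASA_INPUT = "60d1d5d5f611a86add34edac"
RULES = [StreamRule("ASA_AnyConnect", ASA_INPUT)]
TO_ASA_STREAM = {"ASA_Pipeline": "ASA_AnyConnect"}  # pipeline -> connected stream

pipeline_first = ["Pipeline Processor", "Message Filter Chain"]
filter_first = ["Message Filter Chain", "Pipeline Processor"]

print(process(pipeline_first, Message(ASA_INPUT), RULES, TO_ASA_STREAM))  # []
print(process(filter_first, Message(ASA_INPUT), RULES, TO_ASA_STREAM))    # ['ASA_Pipeline']
```

With the pipeline running first, the message is still only in "All messages" when pipelines are selected, so a pipeline attached to ASA_AnyConnect never sees it; the stream is assigned afterwards, which matches the "stream has traffic, pipeline has none" symptom. In the same model, a pipeline connected to "All messages" does run under the pipeline-first order, which is presumably the setup the quoted documentation treats as the default.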

I use Streams to direct my messages to a particular index (a crucial feature). And I also want to use pipelines because that’s a really cool feature in Graylog. Why would anyone ever want to run the Pipeline Processor before the Message Filter Chain? But I guess that’s the default order out of the box. Whatever the reason, that default is completely dysfunctional if you want to attach pipelines to streams that are built from stream rules.

Thank you everyone for all your help! I now have the Pipeline Processor after the Message Filter Chain and things are working well.

Yay! Finally got there… thanks @gsmith! Interesting that pipelines are apparently useless unless you actively swap order with the Message Filter Chain… Seems like a bug to me…

Awesome, I wasn’t 100% sure.

EDIT: Looks like @jan mentioned this a while back, but judging from this post, the default config needs to be changed, and most people are unaware of that.

To close the loop, here’s what I found with the Bit_Bucket rule.
When I enable the input for the ASA AnyConnect source without the pipeline in place, the following happens, in this order:

  1. Slams all four CPUs
  2. The output buffer on the node gets maxed out.
  3. The process buffer then gets maxed out.
  4. Disk Journal starts to fill.
  5. A bunch of disk space for Elasticsearch gets chewed up.
  6. The system very soon collapses and stops processing things correctly for any stream.

Because inputs start automatically (I have put in a feature request to stop this), this is a problem. Here’s the interesting fix:

  1. Create a stream that captures that one input and removes it from the All Messages stream.
  2. Create a pipeline with the bit_bucket rule. Attach the pipeline to the stream.
  3. Make sure you have the tweak that makes the Message Filter Chain act before the Pipeline Processor.
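Step 3 is the one that is easy to get backwards. As a small sketch, the processor order can be copied by hand from the Message Processors Configuration section and checked; nothing is fetched from Graylog here, and the function name filter_chain_runs_first and the sample order are hypothetical.

```python
from typing import Sequence


def filter_chain_runs_first(processor_order: Sequence[str]) -> bool:
    """Return True if 'Message Filter Chain' is ordered before 'Pipeline Processor'."""
    names = [name.strip() for name in processor_order]
    missing = {"Message Filter Chain", "Pipeline Processor"} - set(names)
    if missing:
        raise ValueError(f"processor(s) not in the order: {sorted(missing)}")
    return names.index("Message Filter Chain") < names.index("Pipeline Processor")


# Hypothetical order, typed in by hand from System -> Configurations.
current_order = [
    "Message Filter Chain",
    "Pipeline Processor",
    "GeoIP Resolver",
    "AWS Instance Name Lookup",
]
print(filter_chain_runs_first(current_order))  # True
```

If it prints False, stream-connected pipelines such as the bit_bucket one will not receive the stream's messages.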

Now the results on the system are:

  1. CPU usage goes up by about 50 percentage points but does not max out. So it takes some CPU to throw out the messages, but not as much.
  2. Output buffer, Process buffer and Disk Journal are fine and don’t fill.
  3. Disk space for Elasticsearch does not get used.

Thanks again!
