Extractor causes very low outgoing message throughput

1. Describe your incident:
If we create an extractor on one of our inputs, our outgoing message throughput drops by something like 99%. Normally our in and out rates are around 5,000 to 12,000 messages per second. When I create this extractor, the out rate falls to 5-10 messages per second and the process buffer goes to 100%. Uncommitted messages build up, and if I delete the extractor, the out rate jumps from 5-10 to 50k and the backlog is cleared within seconds.

What could cause this?

This is the grok pattern we use:
SrcIP: %{IPV4:ciscoftd_src_ip}, DstIP: %{IPV4:ciscoftd_dst_ip}, SrcPort: %{INT:ciscoftd_src_port}, DstPort: %{INT:ciscoftd_dst_port}

I have also tried with only SrcIP: %{IPV4:ciscoftd_src_ip}, but the result is the same.
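To make concrete what this grok pattern asks the regex engine to do, here is a rough Python equivalent. It is only a sketch: the IPV4 and INT definitions below are simplified stand-ins (the real grok IPV4 pattern is stricter), and the sample log line is made up for illustration, not taken from a real Cisco FTD device.

```python
import re

# Simplified stand-ins for the grok base patterns (assumption: the real
# IPV4 definition also rejects octets above 255 and adds boundary checks).
IPV4 = r"(?:\d{1,3}\.){3}\d{1,3}"
INT = r"[+-]?\d+"

# Same structure as the grok pattern from the post: each
# %{PATTERN:field} becomes a named capture group.
PATTERN = re.compile(
    rf"SrcIP: (?P<ciscoftd_src_ip>{IPV4}), "
    rf"DstIP: (?P<ciscoftd_dst_ip>{IPV4}), "
    rf"SrcPort: (?P<ciscoftd_src_port>{INT}), "
    rf"DstPort: (?P<ciscoftd_dst_port>{INT})"
)


def extract(message: str) -> dict:
    """Return the extracted fields, or an empty dict if nothing matches."""
    # search() is unanchored, so the engine tries every start position.
    match = PATTERN.search(message)
    return match.groupdict() if match else {}


# Hypothetical Cisco FTD-style message, made up for illustration.
sample = "Event: Allow, SrcIP: 10.0.0.5, DstIP: 192.168.1.20, SrcPort: 51515, DstPort: 443"
print(extract(sample))
```

Because the pattern is not anchored, a message that does not contain it still forces the engine to try a match at every position before giving up, and an extractor repeats that work for every message it is applied to.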

2. Describe your environment:

  • OS Information:
    Version: 22.04.3 LTS
    CPU: 12 vCPU
    RAM: 32GB

  • Package Version: 5.2.0

  • Service logs, configurations, and environment variables:

/etc/security/limits.d/elasticsearch.conf
-Xms14g
-Xmx14g

/etc/graylog/server/server.conf
processbuffer_processors = 8
outputbuffer_processors = 4
inputbuffer_ring_size = 65536
inputbuffer_processors = 4

3. What steps have you already taken to try and solve the problem?
I have tried the following:

  • Restarting
  • Increasing the number of processors (we have tried a range of values, such as 8, 10, 12 and 20, for all or some of the buffers, but the result is the same)
  • Increasing RAM and CPU on the VM
  • Disabling Geo and plugins
  • Disabling all pipelines

Greetings! You are correct that extractors can sometimes incur severe performance penalties. Extractors are a legacy feature that predates Processing Pipelines.

I recommend exploring Processing Pipelines and using them in place of extractors. This blog post is a great place to get started: https://graylog.org/post/graylog-parsing-rules-and-ai-oh-my/ . Let us know if you have any specific questions.

I made the switch to pipelines earlier in the year. It wasn’t that hard, performance is much better, and you can do all the enrichment in the same place at the same time.

Thank you both! I tried the pipelines and they work great. However, I now see new fields like the following. Are these coming from the pipeline?

MINUTE
MONTHDAY
MONTHNUM
SECOND
YEAR
ISO8601_TIMEZONE
IPV4
IPORHOST

Those look like unnamed fields captured by a grok pattern, though that is just a guess.
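Here is why unnamed captures would produce fields like MINUTE or MONTHDAY. Grok patterns are built from other patterns, and when a sub-pattern has no field name of its own it can still be captured under the pattern's name. The Python sketch below uses a tiny, hypothetical pattern library (not the real grok definitions) and a hypothetical grok() helper to show the effect. As far as I know, Graylog's grok() pipeline function has an optional only_named_captures parameter that does the filtering step shown here, but check the documentation for your version.

```python
import re

# Tiny, hypothetical subset of a grok pattern library. The real
# TIMESTAMP_ISO8601 is more permissive; only the nesting matters here.
LIBRARY = {
    "YEAR": r"\d{4}",
    "MONTHNUM": r"\d{2}",
    "MONTHDAY": r"\d{2}",
    "HOUR": r"\d{2}",
    "MINUTE": r"\d{2}",
    "SECOND": r"\d{2}",
    "TIMESTAMP_ISO8601": "%{YEAR}-%{MONTHNUM}-%{MONTHDAY}T%{HOUR}:%{MINUTE}:%{SECOND}",
}

# Matches %{NAME} and %{NAME:field}.
TOKEN = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def compile_grok(pattern, explicit):
    """Recursively expand grok tokens into a regex string."""
    def expand(match):
        name, field = match.group(1), match.group(2)
        body = compile_grok(LIBRARY[name], explicit)
        if field:
            explicit.add(field)  # remember fields the user named explicitly
        # Unnamed sub-patterns are captured under their own pattern name.
        return f"(?P<{field or name}>{body})"
    return TOKEN.sub(expand, pattern)


def grok(pattern, value, only_named_captures=False):
    explicit = set()
    regex = re.compile(compile_grok(pattern, explicit))
    match = regex.search(value)
    if match is None:
        return {}
    fields = match.groupdict()
    if only_named_captures:
        # Keep only the captures that were given an explicit field name.
        fields = {k: v for k, v in fields.items() if k in explicit}
    return fields


pattern = "time=%{TIMESTAMP_ISO8601:event_time}"
print(sorted(grok(pattern, "time=2024-01-15T10:30:45")))
print(grok(pattern, "time=2024-01-15T10:30:45", only_named_captures=True))
```

Without the filter, the single named field event_time comes back together with YEAR, MONTHNUM, MONTHDAY, HOUR, MINUTE and SECOND, which matches the kind of extra fields listed above.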

Did you configure the “Condition”?
If you select “Always try to extract”, the extractor will try to apply the grok pattern to every message received by the input.
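A rough Python sketch of the difference between the two condition modes: with a cheap substring check in front, the regex only runs on messages that could possibly match. The marker string "SrcIP:" is an assumption based on the pattern above, the messages are made up, and the IPV4 definition is a simplified stand-in. How much this helps in practice depends on how many of the input's messages are not Cisco FTD events.

```python
import re

# Simplified stand-in for grok's IPV4 (assumption: the real definition
# also rejects octets above 255).
IPV4 = r"(?:\d{1,3}\.){3}\d{1,3}"
SRC_IP = re.compile(rf"SrcIP: (?P<ciscoftd_src_ip>{IPV4})")

regex_calls = 0  # counts how often the regex actually runs


def run_regex(message: str) -> dict:
    global regex_calls
    regex_calls += 1
    match = SRC_IP.search(message)
    return match.groupdict() if match else {}


def extract_always(message: str) -> dict:
    # "Always try to extract": every message goes through the regex.
    return run_regex(message)


def extract_if_contains(message: str, marker: str = "SrcIP:") -> dict:
    # "Only attempt extraction if field contains string": a cheap substring
    # test skips the regex for messages that cannot match anyway.
    if marker not in message:
        return {}
    return run_regex(message)


# Hypothetical mix of messages arriving on one input.
messages = [
    "SrcIP: 10.0.0.5, DstIP: 192.168.1.20, SrcPort: 51515, DstPort: 443",
    "sshd[1234]: Accepted publickey for admin",
    "kernel: eth0: link up",
]

always = [extract_always(m) for m in messages]
always_calls, regex_calls = regex_calls, 0
guarded = [extract_if_contains(m) for m in messages]
guarded_calls = regex_calls

print(always == guarded, always_calls, guarded_calls)  # True 3 1
```

Both modes extract the same fields, but the guarded version runs the regex once instead of three times. The same idea carries over to pipelines, where the rule's when condition can do the cheap check before grok() is called.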