Extractor causes low out message performance

1. Describe your incident:
If we create an extractor on one of our inputs, our outgoing message throughput drops by roughly 99%. Right now our in and out rates are around 5,000 to 12,000 messages. If I create this extractor, our out rate drops to 5-10 messages and our process buffer goes to 100%. Uncommitted messages build up, and if I delete the extractor, the out rate goes from 5-10 to 50k and the backlog is gone in seconds.

What could cause this?

This is the grok pattern we use:
SrcIP: %{IPV4:ciscoftd_src_ip}, DstIP: %{IPV4:ciscoftd_dst_ip}, SrcPort: %{INT:ciscoftd_src_port}, DstPort: %{INT:ciscoftd_dst_port}

I have also tried with just SrcIP: %{IPV4:ciscoftd_src_ip}, but it's the same.

2. Describe your environment:

  • OS Information:
    Version: 22.04.3 LTS
    CPU: 12 vCPU
    RAM: 32GB

  • Package Version: 5.2.0

  • Service logs, configurations, and environment variables:

/etc/security/limits.d/elasticsearch.conf
-Xms14g
-Xmx14g

/etc/graylog/server/server.conf
processbuffer_processors = 8
outputbuffer_processors = 4
inputbuffer_ring_size = 65536
inputbuffer_processors = 4

3. What steps have you already taken to try and solve the problem?
I have tried the following:

  • Restarting
  • Increasing processors (we have tried a range of options, like 8, 10, 12, 20, for each or some of the settings, but it's the same)
  • Increasing RAM and CPU on the VM
  • Disabling Geo and plugins
  • Disabling all pipelines

Greetings! You are correct that extractors can sometimes incur severe performance penalties. Extractors are a legacy feature that predates Processing Pipelines.

I recommend exploring the use of Processing Pipelines and using those in place of extractors. This blog post is a great place to get started: https://graylog.org/post/graylog-parsing-rules-and-ai-oh-my/ . Let us know if you have any specific questions.
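
For reference, here is a minimal sketch of how the grok pattern from the original post might look as a pipeline rule. The rule name and the `contains` guard on the literal "SrcIP:" are assumptions for illustration, not something taken from the poster's setup. The guard is there so the grok only runs on messages that could match at all.

```
rule "parse cisco ftd connection fields"
when
  // Cheap substring check first, so the grok is skipped for unrelated messages
  has_field("message") &&
  contains(value: to_string($message.message), search: "SrcIP:")
then
  // Same pattern as the extractor in the original post
  let parsed = grok(
    pattern: "SrcIP: %{IPV4:ciscoftd_src_ip}, DstIP: %{IPV4:ciscoftd_dst_ip}, SrcPort: %{INT:ciscoftd_src_port}, DstPort: %{INT:ciscoftd_dst_port}",
    value: to_string($message.message)
  );
  // Write every captured value onto the message as a field
  set_fields(parsed);
end
```

The rule still has to be added to a stage of a pipeline, and that pipeline connected to the stream the input feeds, before it does anything.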


I made the switch to pipelines earlier in the year - it wasn’t that hard and performance is so much better, plus you can do all the enrichment right there at the same time.

Thanks, you two. I tried the pipelines and they work great. However, now I see new fields like the following. Are these just from the pipeline?

MINUTE
MONTHDAY
MONTHNUM
SECOND
YEAR
ISO8601_TIMEZONE
IPV4
IPORHOST

Those look like unnamed fields captured from a grok - just as a guess.
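
If that guess is right, the `only_named_captures` option of the `grok` pipeline function should keep them out. Composite patterns such as `TIMESTAMP_ISO8601` and `IPORHOST` are built from smaller ones (YEAR, MONTHNUM, IPV4, and so on), and without the option those inner captures can end up as fields of their own. The actual rule wasn't posted, so the rule name, pattern, and field names below are hypothetical and only sketch the idea.

```
rule "parse example with named captures only"
when
  has_field("message")
then
  // Hypothetical pattern for illustration; substitute the real one
  let parsed = grok(
    pattern: "%{TIMESTAMP_ISO8601:event_time} %{IPORHOST:event_host}",
    value: to_string($message.message),
    // Keep only captures that were given a name (event_time, event_host)
    only_named_captures: true
  );
  set_fields(parsed);
end
```

Fields such as YEAR or IPV4 that were already written to existing messages will stay on them; the option only affects messages processed after the rule is changed.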

Did you configure the “Condition”?
If you select “Always try to extract”, the extractor will try to apply the grok pattern to every log received by the input.
