Why is this extractor using so much CPU time?

FYI, I’m new to Graylog.

I need to parse pfSense logs. The filterlog entries are comma-delimited, so rather than writing a rule for each field, I thought a CSV converter would do the job. The form requires me to extract something, so I just had it copy the whole message to a new field. After applying the extractor, CPU usage shoots up. If I apply two more extractors like this one, my process buffer fills and all ingestion stops. I determined this by deleting all extractors and re-applying them one by one.
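To illustrate what the CSV converter is doing conceptually, here is a rough Python sketch. The sample payload and its field values are made up for illustration; the column names are the ones from my extractor's `column_header`:

```python
import csv

# Hypothetical pfSense filterlog payload (field values invented for illustration).
payload = ("5,,,1000000103,em0,match,block,in,4,0x0,,64,54321,0,DF,6,tcp,"
           "60,203.0.113.7,198.51.100.2,443,50123,0,S,1234567890,,65535,,mss")

# Same column list as the extractor's column_header.
columns = ("rule_number,sub_rule_number,anchor,tracker,if,reason,action,"
           "direction,ip_version,tos,ecn,ttl,id,offset,flags,protocol_id,"
           "protocol,length,source_ip,destination_ip,source_port,"
           "destination_port,data_length,tcp_flags,tcp_seq_num,tcp_ack,"
           "tcp_window,tcp_urg,tcp_opts").split(",")

# One CSV pass yields every field at once -- the same idea as the CSV
# converter, instead of one regex extractor per field.
row = next(csv.reader([payload]))
fields = dict(zip(columns, row))
print(fields["protocol"], fields["source_ip"])  # tcp 203.0.113.7
```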

Is there something wrong with my extractor, or is it just that resource-intensive? I have 2 cores (FX-8350) and 6 GB of RAM allocated; normally, CPU usage stays under 10%. Graylog is 4.0.8 on Ubuntu 18.04.

Here’s the extractor JSON; the condition type is set to “Only attempt extraction if field matches regular expression”:

{
    "extractors": [
        {
            "title": "PFSense: Filterlog TCP",
            "extractor_type": "regex",
            "converters": [
              {
                "type": "csv",
                "config": {
                  "column_header": "rule_number,sub_rule_number,anchor,tracker,if,reason,action,direction,ip_version,tos,ecn,ttl,id,offset,flags,protocol_id,protocol,length,source_ip,destination_ip,source_port,destination_port,data_length,tcp_flags,tcp_seq_num,tcp_ack,tcp_window,tcp_urg,tcp_opts",
                  "trim_leading_whitespace": true
                }
              }
            ],
            "order": 0,
            "cursor_strategy": "copy",
            "source_field": "message",
            "target_field": "pfsense_filterlog_tcp",
            "extractor_config": {
              "regex_value": "(.+)"
            },
            "condition_type": "regex",
            "condition_value": "(?i)^filterlog\\[\\d+]:\\s(?:(?:.+?),){14}tcp.+$"
          }
    ],
    "version": "4.0.8"
  }

I simplified condition_value to (?i)^filterlog.*tcp,.+$ and CPU usage is back to normal.
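The difference between the two condition regexes can be sketched outside Graylog. The original condition stacks 14 lazy `(?:.+?),` groups, so on messages that almost match (plenty of commas, no "tcp" in the right place) the engine has to try combinations of comma positions, which is where the backtracking cost comes from. A minimal Python sketch; the sample log line and its field values are invented for illustration:

```python
import re

# Original condition: 14 nested lazy groups, each anchored to a comma.
# Near-miss messages force combinatorial backtracking over comma positions.
strict = re.compile(r"(?i)^filterlog\[\d+]:\s(?:(?:.+?),){14}tcp.+$")

# Simplified condition: a single linear scan, no nested quantifiers.
simple = re.compile(r"(?i)^filterlog.*tcp,.+$")

# Hypothetical filterlog-style TCP line (values made up for illustration).
tcp_line = ("filterlog[12345]: 5,,,1000000103,em0,match,block,in,4,0x0,,64,"
            "54321,0,DF,6,tcp,60,203.0.113.7,198.51.100.2,443,50123,0,S,"
            "1234567890,,65535,,mss")

# Both conditions accept a well-formed TCP line; they differ in how much
# work they do on lines that don't match.
print(bool(strict.match(tcp_line)))  # True
print(bool(simple.match(tcp_line)))  # True
```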

Hello,

I assume you fixed your issue.

Yes, they can be resource-intensive. For example, if I put an extractor on an input that is collecting 30 GB of logs a day, my CPU usage goes up; it also depends on how you configure your extractor. In the past, I had to increase my CPU from 4 to 12 cores as we were adding clients almost every day.

Using a virtual machine made it easier to add resources than to add more hardware to a server.

I did.

I’m ingesting 1 to 2 GB a day: a firewall and a few servers.

The best way to find out why is probably htop or top. To be honest, Elasticsearch is a resource hog.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.