FYI, I’m new to Graylog.
I need to parse pfSense logs. The filterlog entries are comma-delimited. Rather than writing a rule for each field, I thought a CSV converter would do. The extractor form requires me to extract something first, so I just had it copy the whole message into a new field. After applying the extractor, CPU usage shoots up. If I apply 2 more extractors like this one, my process buffer fills and all data stops ingesting. I determined this by deleting all extractors and adding them back one by one.
Is there something wrong with my extractor, or is it really that resource-intensive? I have 2 cores (FX-8350) and 6 GB RAM allocated. Normally, CPU usage is under 10%. Graylog is 4.0.8 on Ubuntu 18.04.
Here’s the extractor JSON. The condition type is set to “Only attempt extraction if field matches regular expression”:
{
  "extractors": [
    {
      "title": "PFSense: Filterlog TCP",
      "extractor_type": "regex",
      "converters": [
        {
          "type": "csv",
          "config": {
            "column_header": "rule_number,sub_rule_number,anchor,tracker,if,reason,action,direction,ip_version,tos,ecn,ttl,id,offset,flags,protocol_id,protocol,length,source_ip,destination_ip,source_port,destination_port,data_length,tcp_flags,tcp_seq_num,tcp_ack,tcp_window,tcp_urg,tcp_opts",
            "trim_leading_whitespace": true
          }
        }
      ],
      "order": 0,
      "cursor_strategy": "copy",
      "source_field": "message",
      "target_field": "pfsense_filterlog_tcp",
      "extractor_config": {
        "regex_value": "(.+)"
      },
      "condition_type": "regex",
      "condition_value": "(?i)^filterlog\\[\\d+]:\\s(?:(?:.+?),){14}tcp.+$"
    }
  ],
  "version": "4.0.8"
}
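
To make it easier to reason about what this extractor does to a single message, here is a rough Python sketch of the same three steps: the condition check, the `(.+)` capture, and the CSV split against the column header. It is only an approximation. Python's `re` and `csv` modules stand in for Graylog's regex engine and CSV converter, which may behave differently, and the sample line is made up for illustration (it is not taken from my logs).

```python
import csv
import re

# Column names copied from the extractor's "column_header" setting.
COLUMNS = (
    "rule_number,sub_rule_number,anchor,tracker,if,reason,action,direction,"
    "ip_version,tos,ecn,ttl,id,offset,flags,protocol_id,protocol,length,"
    "source_ip,destination_ip,source_port,destination_port,data_length,"
    "tcp_flags,tcp_seq_num,tcp_ack,tcp_window,tcp_urg,tcp_opts"
).split(",")

# "condition_value" and "regex_value" from the JSON, with JSON escaping removed.
CONDITION = re.compile(r"(?i)^filterlog\[\d+]:\s(?:(?:.+?),){14}tcp.+$")
EXTRACT = re.compile(r"(.+)")

# Hypothetical sample message, invented for this sketch (documentation IPs).
SAMPLE = (
    "filterlog[12345]: 5,,,1000000103,em0,match,block,in,4,0x0,,64,54321,0,"
    "DF,6,tcp,60,192.0.2.10,198.51.100.20,51234,443,0,S,123456789,,64240,,mss"
)


def extract(message):
    """Return a column -> value dict, or None if the condition does not match."""
    if not CONDITION.search(message):
        return None
    captured = EXTRACT.search(message).group(1)
    # skipinitialspace is the closest csv-module analogue (an assumption)
    # to the converter's "trim_leading_whitespace" option.
    values = next(csv.reader([captured], skipinitialspace=True))
    return dict(zip(COLUMNS, values))


fields = extract(SAMPLE)
print(fields["protocol"], fields["source_ip"], fields["destination_port"])
```

In this sketch the first column (`rule_number`) still carries the `filterlog[12345]: ` prefix, because `(.+)` captures the entire message before the CSV split runs.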