How to extract several fields via a regex extractor?

1. Describe your incident:

I would like to write a regex extractor that will extract several fields from the message

3. What steps have you already taken to try and solve the problem?

Consider the following message that arrives through a Raw TCP input:

[07:42:15] [INF] [11] Emby.Server.Implementations.ScheduledTasks.TaskManager: Webhook Item Added Notifier Completed after 0 minute(s) and 0 seconds

I wrote the following regex to extract all relevant fields


This only extracts the first group. This is also hinted at by the description under the regex field:

I tried named and anonymous groups with the same result: only one field is extracted. (The regex above is an example with one named group and the rest anonymous.)

4. How can the community help?

Is there a way to extract several fields at once? If not - do I have to create a separate extractor for each field? (I do not think so)


You can use Grok patterns to extract several fields at the same time.


The best way to parse multiple fields is to use pipeline rules. I prefer regex over Grok, but both are supported in pipelines, along with many other methods, such as key-value pairs, JSON parsing, CEF, or GELF parsing, just to name a few.
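
For the Emby line in the question, a Grok-based pipeline rule might look roughly like this. It is only a sketch: the rule name and the field names (log_time, level, thread_id, component, log_text) are made up for illustration, and it assumes the stock Grok patterns TIME, WORD, INT, NOTSPACE and GREEDYDATA are installed on your Graylog server.

```
rule "Emby log line - grok sketch"
when
  has_field("message")
then
  // Literal brackets are escaped; backslashes are doubled inside the rule string.
  let fields = grok(
    pattern: "\\[%{TIME:log_time}\\] \\[%{WORD:level}\\] \\[%{INT:thread_id}\\] %{NOTSPACE:component}: %{GREEDYDATA:log_text}",
    value: to_string($message.message),
    only_named_captures: true
  );
  // set_fields copies every named capture onto the message in one call.
  set_fields(fields);
end
```

With only_named_captures set to true, only the five named fields should be added, not the intermediate pattern names. Test it in the rule simulator before attaching it to a stream.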

Pipelines give you a lot more control over the messages, and they have the advantage that they aren’t going to be deprecated, so all your hard work will be preserved for the foreseeable future.

They can be a little intimidating at first, but IMO the power they offer more than offsets the extra effort to learn how to use them. There are several good posts on pipelines, and an entire library of pipeline function examples is available on GitHub (graylog-plugin-pipeline-processor/plugin/src/test/resources/org/graylog/plugins/pipelineprocessor at master in Graylog2/graylog-plugin-pipeline-processor).

Here’s an example of a regex rule that would do something similar to what you describe.

DESCRIPTION: Parsing Rule for Cisco Meraki Security Gateway Device IDS-ALERTS events.

rule "Cisco Meraki Security Gateway Parser 01.03 - IDS ALERTS"
// Sample messages (truncated):
//<134>1 1377449842.514782056 MX84 ids-alerts signature=129:4:1 priority=3 timestamp=1377449842.512569 direction=ingress protocol=tcp/ip src=
//<134>1 1377448470.246576346 MX84 ids-alerts signature=119:15:1 priority=2 timestamp=1377448470.238064 direction=egress protocol=tcp/ip src=
//To use Sample message for simulations, copy all but the \ characters.
when
  has_field("message")
then
  // Backslashes are doubled because the pattern is a rule-language string literal.
  // The dots in the IP address are escaped so they match only a literal ".".
  let result = regex("^<\\d+>.+?(\\d+)\\.(\\d+) (\\S+) (\\S+) signature=(\\d+:\\d+:\\d+) priority=(\\d) timestamp=(.+) direction=(\\w+) protocol=(.+) src=(\\d+\\.\\d+\\.\\d+\\.\\d+):(\\d+)$", to_string($message.message));
  // Capture groups are indexed from "0" in the order they appear in the pattern.
  set_field("flow_start_time", result["0"]);
  set_field("flow_stop_time", result["1"]);
  set_field("device", result["2"]);
  set_field("event_type", result["3"]);
  set_field("signature", result["4"]);
  set_field("priority", result["5"]);
  set_field("device_timestamp", result["6"]);
  set_field("direction", result["7"]);
  set_field("protocol", result["8"]);
  set_field("src_ip", result["9"]);
  set_field("src_port", result["10"]);
end

Hope this helps.
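
The same approach applied to the Emby message from the original question could look something like the rule below. Treat it as a sketch under assumptions: the rule name and field names are invented for illustration, and it relies on the optional group_names argument of regex() to label the capture groups, so check it against your Graylog version in the simulator.

```
rule "Emby log line - regex sketch"
when
  has_field("message")
then
  // Groups in order: time, level, thread id, component, free text.
  // Backslashes are doubled inside the rule string.
  let m = regex(
    pattern: "^\\[(\\d{2}:\\d{2}:\\d{2})\\] \\[(\\w+)\\] \\[(\\d+)\\] ([^:]+): (.*)$",
    value: to_string($message.message),
    group_names: ["log_time", "level", "thread_id", "component", "log_text"]
  );
  // Each capture group becomes its own field on the message.
  set_field("log_time", m["log_time"]);
  set_field("level", m["level"]);
  set_field("thread_id", m["thread_id"]);
  set_field("component", m["component"]);
  set_field("log_text", m["log_text"]);
end
```

Without group_names, the groups would be read as m["0"] through m["4"], as in the Meraki rule. Either way a single rule sets all five fields, so there is no need for one extractor per field.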


How do you parse CEF in pipelines? I am only aware of the input doing the magic.

I thought you could collect it on a raw input and apply a pipeline using parse_CEF function to parse the message.

But when I went to the docs to find the exact syntax, I could not find the function. I wonder if it’s been deprecated. Or maybe I just imagined it.

I will look into that.



Are you referring to this?

That’s the one. Looks like we left it out of the docs. Thanks @gsmith


Oh yes, it does - thank you. It actually brings my parsing closer to “code” so I feel more comfortable with this than with more magical solutions.
I will read on that as soon as my ISP kindly restores my fiber access :face_with_raised_eyebrow:
