We’re in the process of adding our sources to Graylog, and I’m trying to configure a pipeline to extract fields from incoming Exchange and IIS log messages, which are stored in CSV format and fed through a Filebeat collector.
I have the messages entering Graylog without issue, and I’ve configured tags on the logs so we can distinguish the types, but I’m not able to get the pipeline to extract the fields. Here’s the logic we’ve been using:
I’ve then got two pipeline rules, one to recognise the tag and the other to run the grok pattern (I also tried combining them into a single rule, sketched below after the two rules, but that didn’t work either).
Stage 0 rule:
rule "Exch_W3SVC_Logs_Tags"
when
  contains(to_string($message.fields_log_type), "Exch_IIS", true)
then
end
Stage 1 rule:
rule "Exch_W3SVC_Logs_Extract"
when
  true
then
  let mess = to_string($message.message);
  let parsed = grok(pattern: "%{EXCH_W3SVC1_LOG}", value: mess, only_named_captures: true);
  set_fields(parsed);
end
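The single-rule version I tried was essentially the two combined, something like:

rule "Exch_W3SVC_Logs_Combined"
when
  // same tag check as the stage 0 rule
  contains(to_string($message.fields_log_type), "Exch_IIS", true)
then
  let mess = to_string($message.message);
  let parsed = grok(pattern: "%{EXCH_W3SVC1_LOG}", value: mess, only_named_captures: true);
  set_fields(parsed);
end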
Thus far the rules don’t seem to be running against incoming messages. Any thoughts would be appreciated.
One thing to note - if you have grok’ed in a field name with a space (or some other odd character) in it, set_fields() will fail without logging why (I put in a change request for that)…this is where debug() was helpful! I have been setting up Exchange and IIS to use split() rather than grok where possible; it isn’t as neat as grok, but it reads well. Here is our IIS rule:
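Trimmed to a sketch here - the indexes assume the default W3C #Fields order, so adjust them to whatever your log header actually emits:

rule "IIS_W3SVC_Logs_Split"
when
  has_field("message")
then
  // W3C IIS logs are space-delimited, so split on single spaces
  let parts = split(" ", to_string($message.message));
  // indexes follow the #Fields: header line of the log
  set_field("Client_IP", parts[8]);
  set_field("Method", parts[3]);
  set_field("URI_Stem", parts[4]);
  set_field("HTTP_Status", parts[11]);
end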
Hmm, do you mean that using a space as a delimiter in a grok pattern won’t work, or that having a space in the field name is what causes the issue?
I’ve tested the grok pattern using the “test with sample data” function, which seemed to work fine. Here’s the pattern, for reference:
(?<Timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}) %{DATA:Server_IP} %{DATA:Method} %{DATA:URI_Stem} %{DATA:URI_Query} %{DATA:Server_Port} %{DATA:Client_Username} %{DATA:Client_IP} %{DATA:Client_UserAgent} %{DATA:Referrer} %{DATA:HTTP_Status} %{DATA:Protocol_Substatus} %{DATA:Win32_Status} %{DATA:Time_Taken} %{DATA:X-Forwarder-For}$
We’ve temporarily set up input extractors, but at the moment we’ve configured multiple kinds of logs to go to the same input, so I’m wondering whether input extractors are the better method or whether pipelines would scale better?
There is a series of special characters that set_fields() will silently die on if they appear in the field name it’s trying to create - none of which are in your grok pattern. Extractors work fine, and when I asked about the preference in the forum, the answer was ambivalent. While troubleshooting a different issue I moved everything from extractors to pipelines; we have a small environment, so it wasn’t an issue.
So ignore me and go with shoothub’s advice: check the message processing order and use debug() to see in the server logs what is happening in the pipeline.
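For example, a minimal way to wire debug() into the extract rule (the output lands in the Graylog server log, e.g. /var/log/graylog-server/server.log on package installs):

rule "Exch_W3SVC_Logs_Extract"
when
  true
then
  let mess = to_string($message.message);
  // confirm the rule fires at all and show the raw message
  debug(concat("raw message: ", mess));
  let parsed = grok(pattern: "%{EXCH_W3SVC1_LOG}", value: mess, only_named_captures: true);
  // dump the parsed field map before set_fields() tries to apply it
  debug(parsed);
  set_fields(parsed);
end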