What's the process for filing certain messages into a different stream?


1. Describe your incident:
I can’t easily graph IP addresses in my Nginx and Apache logs because Filebeat doesn’t parse any of the message content. I want to direct some logs into different streams so I can create fields to filter and graph on, but I’m not sure of the correct way to do it.

2. Describe your environment:

  • OS Information:
    Linux Ubuntu 18.04

  • Package Version:
    4.2.5+59802bf

  • Service logs, configurations, and environment variables:
I have Filebeat installed on all my hosts, which send logs to the local Beats input “Filebeats” on my Graylog host. All these messages land in the All Messages stream.

3. What steps have you already taken to try and solve the problem?
I’ve read up on Pipelines, Extractors, Inputs, Stages, and Rules. Some posts I ran across said not to go with Pipelines due to deprecation. This is how I set up my rules many years ago, but I’m not sure what the correct way is now.

4. How can the community help?
Where should I look for updated information on organizing my logs so I can easily graph or search IP addresses or Apache request content? The regex pipeline rules were super hard to debug, so I’m wondering: has this process gotten easier, or do I need to go to the Marketplace instead, download an Apache extractor, and create an input somehow?


I had thought extractors were going to be deprecated (@gsmith), but as far as I am aware (I’m not a Graylog employee) neither one is. Since you are familiar with pipelines and rules, it’s likely best to continue there.

You can use https://regex101.com/ to help with building regex (it isn’t an exact match for Graylog’s regex flavor, but close enough) and http://grokdebug.herokuapp.com/ for GROK if you are so inclined to go that way.

You can use the debug() function in your pipeline to help figure out what is going on in there. If you post up specific questions and your pipeline rules, I am sure one of us would check it out with you…
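A minimal sketch of that (the rule name and the when condition here are just placeholders):

rule "debug what the pipeline sees"
when
    has_field("message")
then
    // debug() writes its argument to the Graylog server log (server.log),
    // so you can compare what the rule actually sees against what your regex expects
    debug(concat("pipeline saw: ", to_string($message.message)));
end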

Pipelines and Extractors are never as fast as having the data arrive already parsed.

Perhaps a Filebeat module would work for you:

Apache module: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-module-apache.html

NGINX module: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-module-nginx.html
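
For example, getting the NGINX module going is roughly this (a sketch; the paths shown are Filebeat’s defaults and may need adjusting for your hosts):

# enable the module, then review its config
filebeat modules enable nginx

# modules.d/nginx.yml
- module: nginx
  access:
    enabled: true
    var.paths: ["/var/log/nginx/access.log*"]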

Just another option that might be available to you.


Thanks for the link! Given this Nginx log entry:

192.168.1.1 - - [20/Jan/2022:11:39:54 -0600] "GET /search/script.php/?page_number_9=1 HTTP/1.1" 200 718196 "-" "curl/7.58.0"

I can capture the IP, timestamp, request method, request, version, result code, size and agent with this mess:
^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s-\s-\s\[(\d{1,2}/[a-zA-Z]{3}/\d{4}:\d{2}:\d{2}:\d{2}).*]\s\"([A-Z]+)\s(.*)(HTTP/\d\.\d)\"\s(\d{3})\s(\d{1,8})\s\"-\"\s\"(.*)\"$

All the “\” characters need to be doubled in the rule, correct? I always thought this was a bit wonky, but that seemed to be the case.

Is there a better way to write the pipeline rule below? I’m also wondering if the when block is correct with regard to escaping the leading double quote before the request method.

rule "Extract Nginx Log"
when
  contains (to_string($message.message), "\"GET ") OR
  contains (to_string($message.message), "\"PUT ") OR
  contains (to_string($message.message), "\"OPTIONS ") OR
  contains (to_string($message.message), "\"POST ") OR
  contains (to_string($message.message), "\"DELETE ")
 
then
  let capture = regex ("^(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})\\s-\\s-\\s\\[(\\d{1,2}/[a-zA-Z]{3}/\\d{4}:\\d{2}:\\d{2}:\\d{2}).*]\\s\\"([A-Z]+)\\s(.*)(HTTP/\\d\\.\\d)\\"\\s(\\d{3})\\s(\\d{1,8})\\s\\"-\\"\\s\\"(.*)\\"$", to_string($message.message));

  // 192.168.1.1 - - [20/Jan/2022:11:39:54 -0600] "GET /search/script.php/?page_number_9=1 HTTP/1.1" 200 718196 "-" "curl/7.58.0"
  //group1	192.168.1.1
  //group2	20/Jan/2022:11:39:54
  //group3	GET
  //group4	/search/script.php/?page_number_9=1 
  //group5	HTTP/1.1
  //group6	200
  //group7	718196
  //group8	curl/7.58.0

  set_field("source_ip",        capture["0"]);
  set_field("timestamp",        capture["1"]);
  set_field("request_method",   capture["2"]);
  set_field("request",          capture["3"]);
  set_field("protocol_version", capture["4"]);
  set_field("response_code",    capture["5"]);
  set_field("request_size",     capture["6"]);
  set_field("http_agent",       capture["7"]);
  
end

Thanks - I’m grateful for the ability to split messages up into fields to customize things, but writing/debugging pipeline rules is a bit like eating gravel. I am using Filebeat on all the clients but have no idea what the capabilities are there. I’ll check out your links. Maybe there is an easier way to capture the groups I want with filebeat instead.

If it works, it’s good. That said, it’s always best to optimize for processing speed, so any time you can narrow a search it will help, particularly with larger-volume ingestion. I am not familiar with Nginx, but I would end up parsing with GROK rather than straight regex. I found an interesting post that will get you the GROK part here… As for the Graylog integration, the extra escape trips up a LOT of people. The way to handle that with GROK is to set up the pattern you plan to use under System → Grok Patterns; the extra escape is not required in there. Then you can reference it with a single GROK pattern in your pipeline rules, using set_fields() to dump the results into their constituent field names.

Here is an example rule where BAR_1_START is defined in more depth over in System → Grok Patterns:

rule "bar-1-start"
when
    has_field("bar_event_1")                            &&
    contains(to_string($message.message),"bar session started ")
then

    let barLine = grok("%{BAR_1_START}",to_string($message.message), true); 
    set_fields(barLine);
    set_field(  field: "action",
                value: "we have started"
            );
end

The GROK pattern in System → Grok Patterns looks like this:

Name:    BAR_1_START
Pattern: ^(?:.* - )%{DATA:action}%{IP:local_ip}[,]%{SPACE}%{WORD}%{SPACE}%{HOSTNAME:sending_hostname}

Thanks for the Grok links, and for clearing some of this up for me. I’ll check it out. It looks a lot easier to work with than regex.

Update: just wanted to make a note for anyone else running into this. I needed to rearrange the order of the Message Processors before the custom fields from my pipeline rule showed up in the Elasticsearch query.
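
For anyone else: the usual fix is to make sure the Pipeline Processor runs after the Message Filter Chain under System → Configurations → Message Processors Configuration:

1. Message Filter Chain (runs first, so Beats fields and stream routing exist)
2. Pipeline Processor (runs after, so pipeline rules see the routed messages)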

Also had to change the regex a little (I need to experiment more with Grok). Here’s the working one since I can’t update my original post:
let capture = regex ("^(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}).*\\[(\\d{1,2}/[a-zA-Z]{3}/\\d{4}:\\d{2}:\\d{2}:\\d{2}).*\\s\"([A-Z]+)\\s(.*)(HTTP/\\d\\.\\d)\"\\s(\\d{3})\\s(\\d{1,8})\\s.*\"(.*)\"", to_string($message.message));
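
If anyone wants to go the Grok route instead, a rough equivalent might look like the rule below. This is just a sketch: it assumes the stock COMBINEDAPACHELOG pattern (from the standard Logstash pattern set, which Nginx’s default access log format also matches) has been added under System → Grok Patterns, and the field names it produces will differ from the ones I set above.

rule "Extract Nginx Log (grok)"
when
    has_field("message")
then
    // COMBINEDAPACHELOG must exist under System → Grok Patterns;
    // "true" returns only the named captures (clientip, verb, request, response, ...)
    let parsed = grok("%{COMBINEDAPACHELOG}", to_string($message.message), true);
    set_fields(parsed);
end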
