Best practice in ingesting logs

We’ve started moving our logs, mostly syslog, to Graylog. It’s working great, and we want to take it further and add more structure to our log data.

Most of it is syslog, and we’re just shipping everything to Graylog until we get a sense of what we can and need to store. It all goes to a single syslog input and then to the default stream.

Apache logs are sent by Filebeat to a single Beats input. Then we have a NetFlow input for flow data from routers and a GELF UDP input for our in-house Java application.

Now, let’s say I want to create a dashboard for mail servers. Logs from our Postfix mail servers are coming into the syslog input today and are sent to the default stream. Where would it be best to extract data from the Postfix logs? On the host before sending to Graylog, in an extractor (apparently being deprecated in favor of pipelines), or in a pipeline? If the answer is a pipeline, should I first route the messages to a dedicated stream and then run the pipeline there?

I know there are many ways to skin this cat, but I want to learn from others about the drawbacks and benefits of the different approaches.


Thank you, Einar, for your interesting post! It’s a great candidate for our Daily Challenge.

Hey Grayloggers, can you help Einar answer these challenges?

I want to create a dashboard for mail servers. Logs from our Postfix mail servers are coming into the syslog input today and are sent to the default stream.

  • Where would it be best to extract data from the Postfix logs?

  • On the host before sending to Graylog, in an extractor (apparently being deprecated in favor of pipelines), or in a pipeline?

  • If the answer is a pipeline, should I first route the messages to a dedicated stream and then run the pipeline there?


I’ve created a pipeline based on whyscream/postfix-grok-patterns on GitHub (Logstash configuration and grok patterns for parsing Postfix logging) that extracts data and routes the messages to a dedicated Postfix stream.

It works well, but I’m still interested in hearing about different solutions.
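
For reference, a simplified sketch of the kind of rule I mean is below. The pattern name, field name, and stream ID are placeholders, so adjust them to your setup; it assumes the grok patterns from whyscream/postfix-grok-patterns have already been imported into Graylog’s grok pattern store and that the syslog input sets an application_name field.

rule "Parse Postfix logs"
when
	// only touch messages produced by postfix; application_name is set by Graylog's syslog inputs
	has_field("application_name") AND contains(to_string($message.application_name), "postfix")
then
	// POSTFIX_SMTPD is one of the patterns from whyscream/postfix-grok-patterns;
	// pick the pattern(s) that match the postfix processes you care about
	let parsed = grok(pattern: "%{POSTFIX_SMTPD}", value: to_string($message.message), only_named_captures: true);
	set_fields(parsed);
	// placeholder ID: replace with the ID of your dedicated Postfix stream
	route_to_stream(id: "000000000000000000000000");
end

Since the raw syslog messages land on the default stream first, the pipeline is connected there and routes the matched messages on to the dedicated Postfix stream, which keeps the extraction in one place.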

Hello,

The easiest way is if your device is already sending logs to your GELF UDP input: then you can use the fields it creates and add/configure widgets on your dashboard directly. You can also use streams to filter down what you need even further based on those fields.

Here is an example from my Windows dashboard using GELF TCP (this works as long as you have the fields).

Below is an example of detecting user logons after hours or on weekends, using a GELF TCP/TLS input.
I created a stream to filter down what I needed, so I don’t have to run this pipeline against “All Messages”. It made things a lot simpler for us.

(Screenshots of the stream and extractor configuration.)

Pipeline

rule "Between 6 AM and 6 PM"
when
	( to_long(to_date($message.timestamp, "American/Chicago").hourOfDay) >= 0 AND to_long(to_date($message.timestamp, "American/Chicago").hourOfDay) <= 6 ) OR
	( to_long(to_date($message.timestamp, "American/Chicago").hourOfDay) >= 18 AND to_long(to_date($message.timestamp, "American/Chicago").hourOfDay) <= 0 )
then
	set_field("trigger_workhours_off", true);
end
rule "Off Work Weekend"
when
	// from Monday (1) to Sunday (7)
	to_long(to_date($message.timestamp, "American/Chicago").dayOfWeek) == 7 OR
	to_long(to_date($message.timestamp, "American/Chicago").dayOfWeek) == 6
then
	set_field("trigger_workhours_off", true);
end
Rule "Route to stream"
when
    has_field("trigger_workhours_off")
then
    route_to_stream(id:"5d8acba383d72e04cba96317");
end

At the end of the pipeline I have it routing to a stream called “Windows: User Logged in After Work Hours”, and that is the stream I point my Event Definitions at. Some pipelines are simple and it may be easier to just create a pipeline; others may be a little more complicated, depending on what you want to do.

So I look at the resources it will use over time and at how simple I can make it.
Hope that helps.

