Group unknown/suspicious logs into one stream using rules or pipelines

Hello,

I’m new to Graylog and I’m trying to better understand how the software works so that I can evaluate the open source version for my use case. My goal is to compare its features against other open source and paid software for receiving and analyzing logs, as well as sending alerts and reports.

The project I’m working on requires receiving Syslog UDP logs from a variety of sources, such as Linux servers and network equipment, and then classifying them into two groups:

  1. those whose format and meaning we already know, and which therefore don’t need to be analyzed manually;
  2. those that are unknown to the system administrators and should be collected in one group for the administrators to review daily. These are the potentially suspicious logs that may or may not reflect server malfunctions, attacks, etc.

To achieve the second grouping, our idea is to apply some kind of “negative filter” to the logs, so that logs matching the well-known syntax conditions for group 1 do not show up in group 2. What I’m looking for is the best way to apply this sort of negative filter.

With my basic knowledge of Graylog’s features, my first approach was to create different Streams with their own rules and use the option “Remove matches from ‘All messages’ stream”, so that in the end only the unknown messages remain in the generic ‘All messages’ stream. For example, I would create a stream exclusively for my mail servers and apply stream rules such as:

  • field source must match exactly the IP of the mail server 1 OR the mail server 2…
    AND
  • field message must match the regular expression 1 OR the regular expression 2…

So I would have a stream with all the expected logs from my mail servers, and the same could be done with the other types of servers and network equipment. As all the messages redirected to these streams would be removed from the main stream, the “All Messages” stream would only have the Group 2 logs, the ones that are unknown and that need to be manually analysed every day.

The problem with this approach is that Graylog doesn’t seem to support complex boolean conditions or groups of conditions in the Stream Rules feature. I can only say that a message must match all of the rules or at least one of the rules, so it’s not possible to route to one stream the logs that come from a given list of IPs and that also match one of a given list of regular expressions. Please correct me if I’m wrong.
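To make the limitation concrete, here is a rough Python model of the condition (this is not Graylog syntax, only a sketch of the boolean logic). The IP addresses, the patterns and the function names are all hypothetical. It compares a flat “match all” and a flat “match any” evaluation of the four rules with the grouped condition I actually want.

```python
import re

# Hypothetical values, for illustration only.
MAIL_SOURCES = ["192.0.2.10", "192.0.2.11"]
KNOWN_PATTERNS = [re.compile(r"connect from "), re.compile(r"status=sent")]


def flat_rules(msg):
    """The four independent rules a single stream would hold."""
    source_rules = [msg["source"] == ip for ip in MAIL_SOURCES]
    regex_rules = [bool(p.search(msg["message"])) for p in KNOWN_PATTERNS]
    return source_rules + regex_rules


def match_all(msg):
    # "A message must match all of the following rules"
    return all(flat_rules(msg))


def match_any(msg):
    # "A message must match at least one of the following rules"
    return any(flat_rules(msg))


def match_grouped(msg):
    # The wanted condition: (source 1 OR source 2) AND (regex 1 OR regex 2)
    from_mail_server = msg["source"] in MAIL_SOURCES
    known_format = any(p.search(msg["message"]) for p in KNOWN_PATTERNS)
    return from_mail_server and known_format


known = {"source": "192.0.2.10", "message": "connect from unknown[198.51.100.7]"}
odd = {"source": "192.0.2.10", "message": "segfault at 0 ip 0000"}
```

With these sample messages, `match_all(known)` is false (a message cannot come from both IPs at once) and `match_any(odd)` is true (the source rule alone is enough), so neither flat mode gives the same result as `match_grouped`.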

So I started digging a bit further to find other Graylog features that could help me with this “two-stage” filter, and I found Pipelines. That led me to the following idea:

The message enters Graylog through the Syslog UDP Input. Then it’s captured by a pipeline. This pipeline would distribute the messages to their respective streams according to the source of the log. And then, in each stream, I could configure a set of rules using the option “A message must match at least one of the following rules”, to finally have all the well-known messages from a specific kind of service inside one stream.
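As a sketch of the two stages I have in mind, again in plain Python rather than Graylog pipeline syntax: stage 1 picks a candidate stream from the source, and stage 2 keeps the message there only if it matches at least one known pattern for that stream. The stream names, addresses and patterns are made up, and the sketch says nothing about how Graylog itself orders pipelines and stream rules.

```python
import re

# Hypothetical mapping from log source to stream name.
STREAM_BY_SOURCE = {
    "192.0.2.10": "mail",
    "192.0.2.11": "mail",
    "192.0.2.20": "switches",
}

# Hypothetical "well-known" message patterns per stream.
KNOWN_BY_STREAM = {
    "mail": [re.compile(r"^postfix/\w+\[\d+\]: ")],
    "switches": [re.compile(r"%LINK-3-UPDOWN")],
}

UNKNOWN = "unknown"  # stands in for whatever is left in 'All messages'


def classify(msg):
    """Return the name of the stream the message should end up in."""
    # Stage 1: route by source.
    stream = STREAM_BY_SOURCE.get(msg["source"])
    if stream is None:
        return UNKNOWN
    # Stage 2: "must match at least one of the following rules".
    if any(p.search(msg["message"]) for p in KNOWN_BY_STREAM[stream]):
        return stream
    return UNKNOWN
```

A mail server message that matches none of the mail patterns falls back to `UNKNOWN`, which is the group the administrators would review every day.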

So at the end, my main question is: would this idea be possible in Graylog? Is this the best way of achieving my goal using the software? I’m always open to suggestions and also tips on how to program pipelines in this context!

I’d like to thank you in advance,

Felipe Silveira

So at the end, my main question is: would this idea be possible in Graylog? Is this the best way of achieving my goal using the software? I’m always open to suggestions and also tips on how to program pipelines in this context!

I would go this way.

  1. Filter the logs into streams. Whether you do this with stream rules or a pipeline makes no difference, but a pipeline can express more complex rules.
  2. Run a processing pipeline on each stream that adds a field known_log:true if the message is a known log.

That way, all the logs you find useful are grouped into streams, and the search query decides whether you see known or unknown ones: NOT _exists_:known_log returns only the unknown logs, and leaving that term out returns all logs.
You could also do it the other way around: tag every log with an “unknown” field first and remove it later with other rules. Use whichever fits your local logic better.

That might not be the perfect solution, but it is what I would do.
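The tagging idea can be modelled roughly like this in Python. This is only a sketch of the logic, not a Graylog pipeline rule. The field name known_log comes from the steps above, while the patterns and function names are invented for the example.

```python
import re

# Hypothetical patterns describing well-known log lines.
KNOWN_PATTERNS = [
    re.compile(r"session opened for user \w+"),
    re.compile(r"status=sent"),
]


def tag_known(msg):
    """Return a copy of the message, adding known_log=True if it is recognised."""
    tagged = dict(msg)
    if any(p.search(msg["message"]) for p in KNOWN_PATTERNS):
        tagged["known_log"] = True
    return tagged


def unknown_only(messages):
    """Same idea as searching for NOT _exists_:known_log."""
    return [m for m in messages if "known_log" not in m]


logs = [
    {"source": "192.0.2.10", "message": "session opened for user root"},
    {"source": "192.0.2.10", "message": "unexpected EOF in prefork child"},
]
tagged_logs = [tag_known(m) for m in logs]
to_review = unknown_only(tagged_logs)
```

Here `to_review` holds only the second message. The inverse variant would set an “unknown” field on every message first and delete it wherever a known pattern matches.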

