Syslog message not being parsed when using Graylog Sidecar with filebeat input

1. Describe your incident:
I have deployed graylog-sidecar onto multiple servers and configured a Beats input as well as a Filebeat configuration in the Sidecars section of Graylog. This is all working fine in terms of ingesting the log data into Graylog. However, the actual syslog messages are not being parsed into fields.

Maybe I’ve made some basic mistake in the Filebeat collector configuration, or maybe this method simply does not support parsing syslog. But having done quite a bit of reading on this forum, in the Graylog docs, and on the web in general, it is still unclear to me what I need to do to remedy the situation.

  • OS Information:
    Debian 10 / Linux 4.19.0-20-amd64

  • Package Version:
    Graylog 4.2.7+879e651

  • Filebeat collector configuration:

fields_under_root: true
fields.collector_node_id: ${sidecar.nodeName}
fields.gl2_source_collector: ${sidecar.nodeId}

filebeat.inputs:
- type: filestream
  paths:
    - /var/log/syslog
  fields:
    syslog: true
- type: filestream
  paths:
    - /var/log/haproxy.log
  fields:
    haproxy: true
- type: filestream
  paths:
    - /var/log/apache2/access.log
  fields:
    apache: true
    apache_access: true
- type: filestream
  paths:
    - /var/log/apache2/error.log
  fields:
    apache: true
    apache_error: true
- type: filestream
  paths:
    - /var/log/glusterfs/glusterd.log
  fields:
    glusterfs: true
- type: filestream
  paths:
    - /var/log/mysql/error.log
  fields:
    mariadb: true
- type: filestream
  paths:
    - /var/log/mongodb/mongod.log
  fields:
    mongodb: true
- type: filestream
  paths:
    - /var/log/kibana/kibana.log
  fields:
    kibana: true
output.logstash:
   hosts: ["192.168.(redacted):5141"]
   ssl.verification_mode: full
path:
  data: /var/lib/graylog-sidecar/collectors/filebeat/data
  logs: /var/lib/graylog-sidecar/collectors/filebeat/log

3. What steps have you already taken to try and solve the problem?

A lot of googling :slight_smile: and searching Graylog docs and forum.

I’ve tried amending the Filebeat collector config in various ways and have looked into extractors (which, as I understand it, should only be needed for non-standard log sources that cannot be parsed by the collector itself?).

4. How can the community help?

Clarifying any misunderstandings of mine and showing me the right way forward to solve this issue in the best-practice way.

Please post a sample syslog message. Since messages are being ingested, it sounds like there might be a problem with the message content.

Can this one be used (some potentially sensitive details redacted):

Jul 21 14:25:22 (hostname redacted) bash[26880]: ssh: connect to host 172.(redacted).178 port 2032: Connection timed out

Here’s another example, this time snipped directly from Graylog:

Here’s a pastebin link containing my graylog sidecar config, in case that might be of any use.

re: pastebin - you can post code directly to the forum and use the </> tool above to make code readable. I took the liberty of modifying the Collector Configuration in the original post with the </> so you can see what it looks like. We have a security block on pastebin here so I can’t see your sidecar.yml. Not sure I need to, though.

It looks as though you are describing how some fields are extracted by default by Beats/Graylog Beats inputs; I think there are some built-in tools in there… but I wouldn’t expect that when using filebeat to pull syslog. The expectation is that syslog will be pushed to a Graylog syslog input, and that input would do some formatting.

For the most part you would work with extractors and/or the pipeline to pull out the fields you want; there is very little that the collectors and inputs actually do for you.
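For instance, a minimal pipeline rule could grok a standard syslog line into fields. This is just a sketch: it keys off the boolean "syslog" field set in the collector config in the original post, the output field names are my own invention, and it assumes the stock grok base patterns are loaded in Graylog:

```
rule "parse syslog line"
when
  has_field("syslog")
then
  // Extract timestamp, host, process, optional PID, and the rest of the line.
  let parsed = grok(
    pattern: "%{SYSLOGTIMESTAMP:log_timestamp} %{HOSTNAME:log_host} %{DATA:process}(?:\\[%{POSINT:pid}\\])?: %{GREEDYDATA:log_message}",
    value: to_string($message.message),
    only_named_captures: true
  );
  set_fields(parsed);
end
```

The rule would then go in a pipeline connected to the stream that the Beats input feeds.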

Okay, so I should instead configure a syslog input in the filebeat collector configuration?
If so, would you have an example config that I could tweak to my needs?

I did initially try setting up just a simple rsyslog input (meaning no graylog-sidecar/filebeat), but after a bad experience with one or two servers crashing due to excessive writing to syslog once the local rsyslog config was enabled, I went with Graylog Sidecar + Filebeat instead - which seems to be the log collection method recommended by Graylog as well:

“The most recommended way to pick a log file from Windows or Linux systems is filebeat. This collector is built to collect log files and ship them to a central location.”

I can’t say with certainty which is the better/more efficient method, but it seems to me that using something built into Linux (syslog/rsyslog) rather than something third party (beats) would logically be more efficient… though I have zero evidence to back that up. So… personally I would find and fix the crashing issues with rsyslog rather than try to have beats jump in over it, particularly since that is having an issue too. Sorry, not a helpful solution to the specific issue… :frowning:

Okay, that’s reasonable logic; however, I would still prefer to stay with Graylog Sidecar / Beats, since I have put a lot of time into getting that working in the first place :slight_smile:

In any case, thank you for the input.
Now I will wait and see if anyone else is going to chime in…

Maybe @gsmith has some advice! :wink:

@foss4ever

Correct me if I’m wrong; what I understand from this post is: you have multiple log files, and from what I have noticed, they are different types of log files? FileBeat is sending them to a Beats input (port 5044)?

I haven’t worked with FileBeat that much, so I don’t know a lot about the configurations I could make, BUT here is an example of my Beats setup. I have Windows & Linux sending logs to the same Beats INPUT; one is using FileBeat, the other is using Winlogbeat.

FileBeat Port 5044
NOTE: If you notice, there are a lot of redundant fields created. This means more disk space is being used.

Winlogbeat Port 5044 & my apologies, this is very small - I was trying to get all the fields into the screenshot.

As you can see, I have the same outcome as yourself.

The input can only do so much, this depends on the log message.

With that being said, here is another example I made for you. I’m using a Raw/Plaintext input (i.e. I left the message whole so you can see what I did) and created the fields I needed.

I need to ask a couple of questions.

  • What fields do you want to see?
  • What are you trying to achieve?

Thanks for helping out again (you also were helpful with an SSL/TLS question of mine) :slight_smile:

I’ll go through your questions below (some potentially sensitive bits are left out):

Correct me if I’m wrong; what I understand from this post is: you have multiple log files, and from what I have noticed, they are different types of log files? FileBeat is sending them to a Beats input (port 5044)?

I have currently two Beats inputs - one for collecting all Debian OS and application logs (TCP port 5141), for which I shared the collector config in my initial post, and another one for collecting the Debian audit log (TCP port 5142) with a separate Filebeat collector config as well.

(…) here is another example I made for you. I’m using Raw/plaintext Input (i.e. I left the message whole so you can see what I did) and created my own fields needed.

Now, this is not a question you’re posing, but I wanted to address it anyway, since it sounds quite interesting: could I get rid of, or at least reduce, all those default and rather useless “filebeat_” fields, such as filebeat_agent_version, filebeat_ecs_version, etc.?
Would that be possible while still using a Beats input with the filebeat collector, or only by creating a new Raw/Plaintext TCP/UDP input (assuming no Graylog Sidecar involved)?

What fields do you want to see?

This is a bit hard to explain in writing, but I’ll give it a try.

What log message fields I want to see will vary between types of logs - it would be quite different, for instance, between Apache error log messages and nftables messages in the syslog. But it might also differ within the same log: the messages being written to syslog are quite varied, and I would probably want to extract different parts into fields for this handful of examples:

Jul 28 11:40:42 Proc01 php: Ods\Lib\Classes\EventPredictionCleardown::process calling provideForSubscriptions without a prediction…

Jul 28 11:40:43 MongoDB01 kernel: [4150099.261185] [nftables] Inbound traffic dropped: IN=ens192 OUT= MACSRC=00:0e MACDST=ff:ff:ff:ff:ff:ff MACPROTO=0800 SRC=10..250 DST=10..255 LEN=78 TOS=0x00 PREC=0x00 TTL=64 ID=15835 DF PROTO=UDP SPT=137 DPT=137 LEN=58

Jul 28 11:51:10 Proc01 bash[15652]: +++ ssh -p 2032 -i /etc/ssh/ssh_host_ed25519_key -o StrictHostKeyChecking=no -o ConnectTimeout=3 -o ConnectionAttempts=1 -l 172..187 ‘cat /opt//services/configuration-service/etc/config-id.json’

Jul 28 12:20:20 MongoDB02 mongod {“t”:{“$date”:“2022-07-28T12:20:20.534+02:00”},“s”:“I”, “c”:“STORAGE”, “id”:22430, “ctx”:“WTCheckpointThread”,“msg”:“WiredTiger message”,“attr”:{“message”:“[16700], WT_SESSION.checkpoint: [WT_VERB_CHECKPOINT_PROGRESS] saving checkpoint snapshot min: 2486270, snapshot max: 2486270 snapshot count: 0, oldest timestamp: (1659003613, 1) , meta checkpoint timestamp: (1659003618, 1) base write gen: 6403763”}}

Another example would be the Apache error log where I have a web application firewall, ModSecurity, writing messages when it sees suspicious requests or blocks something.
These messages have a bunch of information, where I would want to extract just some bits such as “client”, “msg”, “uri”, “severity”, “hostname” into separate fields - example message below:

[Thu Jul 28 12:37:06.698125 2022] [:error] [pid 29651] [client 77..119:0] [client 77..119] ModSecurity: Warning. Operator EQ matched 0 at REQUEST_HEADERS. [file “/etc/modsecurity/rules/REQUEST-920-PROTOCOL-ENFORCEMENT.conf”] [line “702”] [id “920340”] [msg “Request Containing Content, but Missing Content-Type header”] [severity “NOTICE”] [ver “OWASP_CRS/3.3.2”] [tag “application-multi”] [tag “language-multi”] [tag “platform-multi”] [tag “attack-protocol”] [tag “paranoia-level/1”] [tag “OWASP_CRS”] [tag “capec/1000/210/272”] [hostname “-admin..com”] [uri “/api/v1//file”] [unique_id “YuJm0pbVhI2YNefQValLfQAAAAY”]

On the other hand, there are all the other log entries in Apache2 error log that are not related to ModSecurity, for which there will be no such parts to extract into fields:

[Thu Jul 28 10:11:43.913293 2022] [php7:notice] [pid 11061] [client 18..222:0] SFactory getElementXML Unsupported version ‘’

[Thu Jul 28 11:11:40.994935 2022] [php7:notice] [pid 15724] [client 77..130:0] Array\n(\n)\n

But maybe this won’t really present a problem and will just result in Graylog doing a tiny bit of extra work crunching through all those Apache error log lines when it could just focus on the ones containing the term “ModSecurity”?

What are you trying to achieve?

So, in essence, what I’m trying to achieve is to have Graylog put the information I need from each type of log message into separate fields - ideally allowing me to define the name for each field as well.

I guess what would be really useful is some kind of template that could recognise the various types of log messages coming in on a Beats input (say, apache access log messages vs. mysql error log messages, vs. …) and then automatically create fields in Graylog only for the relevant/useful parts of each message - assuming there is a general consensus on which those might be.

At the bottom of your Beats Input, you can remove the prefix:
(screenshot of the input’s prefix setting)


I was looking through the filebeat documentation and noted that the newer filestream type (vs. log) requires a unique ID…


Each filestream input must have a unique ID. Omitting or changing the filestream ID may cause data duplication. Without a unique ID, filestream is unable to correctly track the state of files.

so

filebeat.inputs:
- type: filestream
  id: id-syslog
  paths:
    - /var/log/syslog
  fields:
    syslog: true
- type: filestream
  id: id-haproxy
  paths:
    - /var/log/haproxy.log
  fields:
    haproxy: true
...

Also - most of your paths need unique handling so they are separated, but the apache can be combined:

...
- type: filestream
  id: id-apache
  paths:
    - /var/log/apache2/access.log
    - /var/log/apache2/error.log
  fields:
    apache: true
    apache_access: true
...

Thanks for that. I’ve now turned off the “Beats type as prefix” option on the input, as well as added IDs to the filebeat collector config.
I would combine the apache filestreams for the error and access logs, but wouldn’t I then lose the ability to separate the two by fields?

yea, I must not have had my glasses on… forget that part…

Hello,

First of all thank you for clarifying what you want :+1:

I get it: multiple types of logs, and you want to separate the different parts of the messages into fields, while also keeping down the amount of disk space used up by the excessive number of fields being generated by FileBeat.

Here are a couple of suggestions that I know of.

You can use FileBeat, but as you can see, what you see is what you get. As stated above, this really depends on the messages being shipped. You can create a pipeline and drop the fields that are not wanted, BUT now you’re doing twice as much work (FileBeat, drop fields, then add the fields you want) when you could have been using a Syslog UDP input with a couple of extractors and been done.
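A drop rule of that sort might look like this (a sketch only; the exact “filebeat_” field names depend on what your input actually produces, so check a sample message first):

```
rule "drop redundant filebeat fields"
when
  has_field("filebeat_agent_version")
then
  // Remove Beats metadata fields we never search on.
  remove_field("filebeat_agent_version");
  remove_field("filebeat_ecs_version");
end
```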

Which brings me to alternative sources. You’ve seen my post above and what I can do with a Raw/Plaintext UDP input. This can also be done with a Syslog TCP input if needed (keeping it simple), and yes… you can still use Graylog Sidecar :slight_smile:. It would mean you might need to use something else besides FileBeat.

In the following example, I’m using Graylog Sidecar with Nxlog.

Nxlog configuration: this is simple but gets the job done. Notice I’m collecting the files I want and then sending them out via UDP (this can also be done with GELF, TCP/TLS, etc…).

Nxlog configuration
[root@graylog journal]# cat /etc/nxlog.conf
########################################
# Global directives   #
########################################
User nxlog
Group nxlog

LogFile /var/log/nxlog/nxlog.log
LogLevel INFO

########################################
# Modules  ### This is for different file types.
########################################

<Extension _syslog>
    Module      xm_syslog
</Extension>

########################################
# INPUTs   ###  Location of  logs  needed
########################################

<Input  in>
    Module       im_file
    FILE         "/var/log/messages"
    SavePos       TRUE
    ReadFromLast  TRUE
    PollInterval  1
    #Exec  $Message = $raw_event;
 </Input>
<Input nxlog>
    Module       im_file
    FILE         "/var/log/nxlog/nxlog.log"
    SavePos       TRUE
    ReadFromLast  TRUE
    PollInterval  1
    #Exec  $Message = $raw_event;
 </Input>

<Input access>
    Module       im_file
    FILE         "/var/log/graylog-server/restaccess.log"
    SavePos       TRUE
    ReadFromLast  TRUE
    PollInterval  1
    #Exec  $Message = $raw_event;
 </Input>
<Input graylog>
    Module       im_file
    FILE         "/var/log/graylog-server/server.log"
    SavePos       TRUE
    ReadFromLast  TRUE
    PollInterval  1
    #Exec  $Message = $raw_event;
 </Input>
<Input audit>
    Module       im_file
    FILE         "/var/log/audit/audit.log"
    SavePos       TRUE
    ReadFromLast  TRUE
    PollInterval  1
    #Exec  $Message = $raw_event;
 </Input>
<Input secure>
    Module       im_file
    FILE         "/var/log/secure"
    SavePos       TRUE
    ReadFromLast  TRUE
    PollInterval  1
    #Exec  $Message = $raw_event;
 </Input>
########################################
# OUTPUTs   ###  Location of  logs  are going
########################################
<Output out>
    Module      om_udp
    Host        graylog.domain.com
    Port        51411      
    Exec $Hostname = hostname_fqdn();  
    Exec $ShortMessage = $raw_event;   
</Output>


########################################
# Routes                               #
########################################
<Route>
    Path       in => out
</Route>
<Route>
    Path       access => out
</Route>
<Route>
    Path       audit => out
</Route>
<Route>
    Path       secure => out
</Route>
<Route>
    Path       graylog  => out
</Route>
<Route>
    Path       nxlog => out
</Route>

Results: notice the field in the red box called SourceModuleName, which matches one of the inputs in my nxlog configuration file :wink:

So this is what can be done with that field called SourceModuleName.
The widget displays all the log file sources being sent.
The log files shown below are “Graylog access file, Apache file/s, audit files, Zabbix files, etc…”

(screenshot of the widget)

This is a plain canvas. Now all I have to do is create the fields needed, either with pipelines, extractors, or GROK patterns.
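For example, for the ModSecurity lines shown earlier, a pipeline rule along these lines could pull just the wanted bracketed values into fields. The field names here are my own invention, and it assumes straight quotes in the actual log line and that every bracket is present - treat it as a starting point:

```
rule "extract modsecurity fields"
when
  has_field("apache_error") && contains(to_string($message.message), "ModSecurity")
then
  let m = to_string($message.message);
  // Pull selected [key "value"] pairs out of the error-log line.
  set_field("modsec_client",   regex("\\[client ([^\\]]+)\\]", m)["0"]);
  set_field("modsec_msg",      regex("\\[msg \"([^\"]+)\"\\]", m)["0"]);
  set_field("modsec_uri",      regex("\\[uri \"([^\"]+)\"\\]", m)["0"]);
  set_field("modsec_severity", regex("\\[severity \"([^\"]+)\"\\]", m)["0"]);
  set_field("modsec_hostname", regex("\\[hostname \"([^\"]+)\"\\]", m)["0"]);
end
```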

What I believe needs to happen is to find common ground among all these different types of logs.
Once the logs arrive using the above example, you should have 4-5 fields. From there you can use one pipeline with a rule to create any and all fields you want. Done.
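As a sketch of that idea, a rule (or one per source) can key off SourceModuleName and tag or parse accordingly - the field name comes from the nxlog example above, the output field is my own invention:

```
rule "tag audit messages"
when
  to_string($message.SourceModuleName) == "audit"
then
  // Mark the message so dashboards and later stages can filter on it.
  set_field("log_type", "audit");
end
```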

“Simplicity boils down to two steps: Identify the essential. Eliminate the rest.”

Maybe someone else has a better idea. I’m sorry I don’t have a direct answer for you, but the question here is really how you want to go about doing this.

Ok, sounds like I might want to consider nxlog.

Would you be able to share with me the collector and client configs needed for this (connections would need to be encrypted)?
Also, with nxlog, would I still be able to configure multiple log ingestions in one and the same nxlog collector?

By the way, I have actually been able to extract fields from the audit log filebeat input, just by setting up a simple extractor based on a random message from the log input:

Now, I don’t fully comprehend how this extractor works under the hood, but it does manage to grab each and every part of the log message and separate them into fields.
This isn’t ideal, since there are a lot of details recorded by auditd that I don’t need, but at least it works in basic terms (and without configuring any sort of pipelines).

So, really, if there were a way to set up the extractor to put into fields just the parts that I need, then I think that would achieve my goal.
Maybe I need to do some more reading on extractors…

Yes, here is a lab configuration.

nxlog_config
## This is a sample configuration file. See the nxlog reference manual about the
## configuration options. It should be installed locally under
## /usr/share/doc/nxlog-ce/ and is also available online at
## http://nxlog.org/docs

########################################
#Global directives         ############           
########################################

User nxlog
Group nxlog


########################################
#Log Directory             #############
########################################

LogFile /var/log/nxlog/nxlog.log
LogLevel INFO

########################################
#Modules                   ############           
########################################

<Extension _syslog>
    Module      xm_syslog
</Extension>

<Extension _gelf>
    Module      xm_gelf
</Extension>
<Input in>
    Module       im_file
    FILE         "/var/log/messages"
    SavePos       TRUE
    ReadFromLast  TRUE
    PollInterval  1
    #Exec  $Message = $raw_event;
 </Input>
<Input access>
    Module       im_file
    FILE         "/var/log/graylog-server/restaccess.log"
    SavePos       TRUE
    ReadFromLast  TRUE
    PollInterval  1
    #Exec  $Message = $raw_event;
 </Input>
<Input graylog>
    Module       im_file
    FILE         "/var/log/graylog-server/server.log"
    SavePos       TRUE
    ReadFromLast  TRUE
    PollInterval  1
    #Exec  $Message = $raw_event;
 </Input>
<Input audit>
    Module       im_file
    FILE         "/var/log/audit/audit.log"
    SavePos       TRUE
    ReadFromLast  TRUE
    PollInterval  1
    #Exec  $Message = $raw_event;
 </Input>
<Input secure>
    Module       im_file
    FILE         "/var/log/secure"
    SavePos       TRUE
    ReadFromLast  TRUE
    PollInterval  1
    #Exec  $Message = $raw_event;
 </Input>

<Output out>
    Module      om_ssl
    Host        graylog.domain.com
    Port        44444
    OutputType  GELF_TCP
    CertFile        /var/lib/nxlog/cert/graylog3-certificate.pem
    CertKeyFile     /var/lib/nxlog/cert/graylog3-key.pem
    CAFile          /var/lib/nxlog/cert/cert3.pem
    KeyPass         secret
    AllowUntrusted  false
    Exec $Hostname = hostname_fqdn();   
    Exec $ShortMessage = $raw_event;
    
</Output>



########################################
# Routes                               #
########################################
<Route>
    Path      in => out
</Route>
<Route>
    Path      access => out
</Route>
<Route>
    Path      audit => out
</Route>
<Route>
    Path      secure => out
</Route>
<Route>
    Path      graylog  => out
</Route>

Not only did I show that in my post above, but also in this post :+1:

I agree… check out the Graylog documentation and scan the forum for pipelines, extractors, lookup tables, etc…
With extractors you can do multiple configurations. Keep in mind, depending on how big this environment is, extractors may increase your resource usage.

As with extractors, I have an aversion to nxlog; I prefer beats and pipelines. @gsmith compliments my skills for now, but he has already picked up beats and pipelines… which will shorten my forum replies to: “yes… that.”

OK, I thought you were actually using rsyslog yourself, as per your earlier comments. But if you’re indeed using Beats/Filebeat just like myself, then I’m curious why you haven’t faced issues similar to mine, with the log messages not being separated into fields…

I haven’t looked much at pipelines, but I did find this potentially useful resource:

Btw, am I right in assuming that I would not be using both pipelines/rules and extractors at the same time? A bit too many new concepts to grasp ^^