Thanks for helping out again (you were also helpful with an SSL/TLS question of mine).
I’ll go through your questions below (some potentially sensitive bits are left out):
Correct me if I’m wrong, but what I understand from this post is that you have multiple log files, and from what I have noticed there are different types of log files? Filebeat is sending them to a Beats input (port 5044)?
I currently have two Beats inputs: one for collecting all Debian OS and application logs (TCP port 5141), for which I shared the collector config in my initial post, and another for collecting the Debian audit log (TCP port 5142), which has its own separate Filebeat collector config.
(…) here is another example I made for you. I’m using Raw/plaintext Input (i.e. I left the message whole so you can see what I did) and created my own fields needed.
Now, this is not a question you’re posing, but I wanted to address it anyway, since I would be quite interested in getting rid of, or at least reducing, all those default and rather useless “filebeat_” fields, such as filebeat_agent_version, filebeat_ecs_version, etc.
Would that be possible while still using a Beats input with a Filebeat collector, or only by creating a new raw/plaintext TCP/UDP input (assuming no Graylog Sidecar is involved)?
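To make the result I’m after a bit more concrete, here is a rough Python sketch. It is not Graylog code, just an illustration of the filtering I have in mind; I assume the real work would be done by something on the Graylog or Filebeat side that I have not looked into yet. The version values, the allow-list and the "filebeat_log_file_path" field are my own placeholders.

```python
from typing import Any, Dict

# Hypothetical allow-list: Beats metadata fields I would still want to keep.
KEEP = {"filebeat_log_file_path"}


def drop_beats_metadata(fields: Dict[str, Any], prefix: str = "filebeat_") -> Dict[str, Any]:
    """Return a copy of `fields` without the Beats metadata fields.

    Fields whose names start with `prefix` are dropped unless listed in KEEP.
    """
    return {
        name: value
        for name, value in fields.items()
        if not name.startswith(prefix) or name in KEEP
    }


# Placeholder message; the version numbers are made up for the example.
message = {
    "source": "Proc01",
    "message": "example message text",
    "filebeat_agent_version": "7.17.0",
    "filebeat_ecs_version": "1.12.0",
    "filebeat_log_file_path": "/var/log/syslog",
}

cleaned = drop_beats_metadata(message)
print(sorted(cleaned))  # ['filebeat_log_file_path', 'message', 'source']
```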
What fields do you want to see?
This is a bit hard to explain in writing, but I’ll give it a try.
Which log message fields I want to see will vary between types of logs, so they would be quite different between, for instance, Apache error log messages and nftables messages in the syslog. But they might also differ within the same log: the messages written to syslog are quite varied, and I would probably want to extract different parts into fields for each of these examples:
Jul 28 11:40:42 Proc01 php: Ods\Lib\Classes\EventPredictionCleardown::process calling provideForSubscriptions without a prediction…
Jul 28 11:40:43 MongoDB01 kernel: [4150099.261185] [nftables] Inbound traffic dropped: IN=ens192 OUT= MACSRC=00:0e MACDST=ff:ff:ff:ff:ff:ff MACPROTO=0800 SRC=10..250 DST=10..255 LEN=78 TOS=0x00 PREC=0x00 TTL=64 ID=15835 DF PROTO=UDP SPT=137 DPT=137 LEN=58
Jul 28 11:51:10 Proc01 bash[15652]: +++ ssh -p 2032 -i /etc/ssh/ssh_host_ed25519_key -o StrictHostKeyChecking=no -o ConnectTimeout=3 -o ConnectionAttempts=1 -l 172..187 'cat /opt//services/configuration-service/etc/config-id.json'
Jul 28 12:20:20 MongoDB02 mongod {"t":{"$date":"2022-07-28T12:20:20.534+02:00"},"s":"I", "c":"STORAGE", "id":22430, "ctx":"WTCheckpointThread","msg":"WiredTiger message","attr":{"message":"[16700], WT_SESSION.checkpoint: [WT_VERB_CHECKPOINT_PROGRESS] saving checkpoint snapshot min: 2486270, snapshot max: 2486270 snapshot count: 0, oldest timestamp: (1659003613, 1) , meta checkpoint timestamp: (1659003618, 1) base write gen: 6403763"}}
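As an illustration of what I mean by "different parts for different messages", here is a small Python sketch (again, not Graylog code). It splits off the common syslog prefix and only looks for KEY=value pairs when the message comes from the kernel and carries the [nftables] tag. The "nft_" field prefix is just a name I made up, and the sample line is the redacted one from above.

```python
import re
from typing import Dict, Optional

# Classic syslog prefix: "Jul 28 11:40:43 host program[pid]: message".
# The [pid] part and the colon are optional ("kernel:", "bash[15652]:", "mongod").
SYSLOG_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<program>[^\s:\[]+)(?:\[(?P<pid>\d+)\])?:? "
    r"(?P<message>.*)$"
)

# Upper-case KEY=value pairs; the value may be empty, as in "OUT=".
KV_RE = re.compile(r"\b([A-Z]+)=(\S*)")


def parse_syslog(line: str) -> Optional[Dict[str, str]]:
    """Split a syslog line into fields; add nft_* fields for nftables messages."""
    match = SYSLOG_RE.match(line)
    if match is None:
        return None
    fields = {k: v for k, v in match.groupdict().items() if v is not None}
    if fields["program"] == "kernel" and "[nftables]" in fields["message"]:
        for key, value in KV_RE.findall(fields["message"]):
            # LEN occurs twice in the sample; setdefault keeps the first one.
            fields.setdefault("nft_" + key.lower(), value)
    return fields


line = (
    "Jul 28 11:40:43 MongoDB01 kernel: [4150099.261185] [nftables] "
    "Inbound traffic dropped: IN=ens192 OUT= MACSRC=00:0e "
    "MACDST=ff:ff:ff:ff:ff:ff MACPROTO=0800 SRC=10..250 DST=10..255 "
    "LEN=78 TOS=0x00 PREC=0x00 TTL=64 ID=15835 DF PROTO=UDP SPT=137 "
    "DPT=137 LEN=58"
)
fields = parse_syslog(line)
print(fields["host"], fields["nft_proto"], fields["nft_dpt"])
```

For the php and bash lines the same function would only return the timestamp, host, program, pid (if present) and message, which is roughly the behaviour I would hope for.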
Another example would be the Apache error log where I have a web application firewall, ModSecurity, writing messages when it sees suspicious requests or blocks something.
These messages have a bunch of information, where I would want to extract just some bits such as “client”, “msg”, “uri”, “severity”, “hostname” into separate fields - example message below:
[Thu Jul 28 12:37:06.698125 2022] [:error] [pid 29651] [client 77..119:0] [client 77..119] ModSecurity: Warning. Operator EQ matched 0 at REQUEST_HEADERS. [file "/etc/modsecurity/rules/REQUEST-920-PROTOCOL-ENFORCEMENT.conf"] [line "702"] [id "920340"] [msg "Request Containing Content, but Missing Content-Type header"] [severity "NOTICE"] [ver "OWASP_CRS/3.3.2"] [tag "application-multi"] [tag "language-multi"] [tag "platform-multi"] [tag "attack-protocol"] [tag "paranoia-level/1"] [tag "OWASP_CRS"] [tag "capec/1000/210/272"] [hostname "-admin..com"] [uri "/api/v1//file"] [unique_id "YuJm0pbVhI2YNefQValLfQAAAAY"]
On the other hand, there are all the other entries in the Apache2 error log that are not related to ModSecurity, for which there will be no such parts to extract into fields:
[Thu Jul 28 10:11:43.913293 2022] [php7:notice] [pid 11061] [client 18..222:0] SFactory getElementXML Unsupported version ''
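Here is a rough Python sketch of the extraction I have in mind for such a line, just to show the idea outside of Graylog. It assumes the log uses plain straight quotes and IPv4 client addresses. The "modsec_" field names are my own invention, and the address and hostname in the shortened sample are placeholders, since the real ones are redacted in this post.

```python
import re
from typing import Dict

# Bracketed pairs of the form [key "value"], as ModSecurity writes them.
PAIR_RE = re.compile(r'\[(\w+) "([^"]*)"\]')
# Apache's own [client ADDRESS:PORT] prefix is unquoted; capture the address only.
CLIENT_RE = re.compile(r"\[client ([^\]\s:]+)(?::\d+)?\]")

# The bits I actually care about; everything else (tags, file, line) is ignored.
WANTED = ("msg", "uri", "severity", "hostname")


def extract_modsecurity(line: str) -> Dict[str, str]:
    """Return the wanted ModSecurity fields, or {} for any other log line."""
    if "ModSecurity" not in line:
        return {}
    fields: Dict[str, str] = {}
    client = CLIENT_RE.search(line)
    if client:
        fields["modsec_client"] = client.group(1)
    for key, value in PAIR_RE.findall(line):
        if key in WANTED:
            fields.setdefault("modsec_" + key, value)
    return fields


# Shortened sample with placeholder address and hostname.
sample = (
    "[Thu Jul 28 12:37:06.698125 2022] [:error] [pid 29651] "
    "[client 203.0.113.7:0] [client 203.0.113.7] ModSecurity: Warning. "
    "Operator EQ matched 0 at REQUEST_HEADERS. "
    '[file "/etc/modsecurity/rules/REQUEST-920-PROTOCOL-ENFORCEMENT.conf"] '
    '[line "702"] [id "920340"] '
    '[msg "Request Containing Content, but Missing Content-Type header"] '
    '[severity "NOTICE"] [tag "OWASP_CRS"] '
    '[hostname "admin.example.com"] [uri "/api/v1/file"] '
    '[unique_id "YuJm0pbVhI2YNefQValLfQAAAAY"]'
)

modsec = extract_modsecurity(sample)
for name in sorted(modsec):
    print(name, "=", modsec[name])
```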
[Thu Jul 28 11:11:40.994935 2022] [php7:notice] [pid 15724] [client 77..130:0] Array\n(\n)\n
But maybe this won’t really be a problem and will just result in Graylog doing a tiny bit of extra work crunching through all those Apache error log lines, when it could just focus on the ones containing the term “ModSecurity”?
What are you trying to achieve?
So, in essence, what I’m trying to achieve is to have Graylog put the information I need from each type of log message into separate fields - ideally allowing me to define the name for each field as well.
I guess what would be really useful is some kind of template that could recognise various types of incoming log messages on a Beats input (say, Apache access log messages vs. MySQL error log messages vs. …) and then automatically create fields in Graylog only for the relevant/useful parts of each message - assuming there is a general consensus on which parts those are.
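To show what I picture when I say "template", here is a minimal Python sketch. It is purely my own illustration and not an existing Graylog feature: each template has a cheap marker string to recognise the log type and a regular expression whose named groups become the field names. The template list, the field names and the sample addresses are all hypothetical.

```python
import re
from typing import Dict, List, NamedTuple


class Template(NamedTuple):
    log_type: str        # label stored alongside the extracted fields
    marker: str          # cheap substring test, checked before the regex runs
    pattern: re.Pattern  # named groups become field names


# Hypothetical templates; the first one whose marker is found wins.
TEMPLATES: List[Template] = [
    Template(
        "modsecurity",
        "ModSecurity:",
        re.compile(r'\[severity "(?P<severity>[^"]*)"\].*\[uri "(?P<uri>[^"]*)"\]'),
    ),
    Template(
        "nftables",
        "[nftables]",
        re.compile(r"\bSRC=(?P<src>\S+) DST=(?P<dst>\S+).*?PROTO=(?P<proto>\S+)"),
    ),
]


def apply_templates(line: str) -> Dict[str, str]:
    """Classify a log line and extract only the fields its template names."""
    for template in TEMPLATES:
        if template.marker in line:
            match = template.pattern.search(line)
            fields = match.groupdict() if match else {}
            return {"log_type": template.log_type, **fields}
    return {"log_type": "unknown"}


# Sample lines with placeholder addresses.
nft_line = (
    "Jul 28 11:40:43 MongoDB01 kernel: [4150099.261185] [nftables] "
    "Inbound traffic dropped: IN=ens192 OUT= SRC=192.0.2.250 DST=192.0.2.255 "
    "LEN=78 TTL=64 PROTO=UDP SPT=137 DPT=137"
)
php_line = (
    "[Thu Jul 28 11:11:40.994935 2022] [php7:notice] [pid 15724] "
    "[client 203.0.113.9:0] Array"
)

print(apply_templates(nft_line))
print(apply_templates(php_line))
```

Lines that match no marker are only labelled "unknown" and never reach a regular expression, which is the "focus on the relevant lines" behaviour I was wondering about above.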