Hello dear Graylog Community,
I’ve already read a lot of very helpful things on these forums, but I haven’t found anything that works for this particular issue. Thanks for all your help so far.
1. Describe your incident:
I am struggling to process the logs I receive from our Fortimail servers.
At first I created a Syslog TCP input, since that’s the format Fortimail sends. That didn’t work, as my indices got flooded with useless fields. The field names were names of people, e.g. “smith”, or parts of URLs.
Now I’ve created a Raw/Plaintext TCP input to avoid generating those fields. That works, but the logs no longer arrive as individual messages: several entries that would have been separate messages on the Syslog input now arrive concatenated in one very long string.
I couldn’t figure out how to split that huge string into individual messages so that I can extract the data I want with my pipeline, or alternatively how to avoid being flooded with useless automatically generated fields.
I have tried different solutions, like the one in this thread https://community.graylog.org/t/fortimail-msg-field-pipeline-fix/20631 but that didn’t work, since I have multiple “messages” inside of one message.
Raw message when receiving on the raw/plaintext input looks like this:
937 <22>date=2025-04-24 time=14:10:40.785 device_id=FEVM02TM23000692 eventtime=1745496640 tz="+0200" log_id=0200004538 type=statistics pri=information session_id="53OCAeWi004537-53OCAeWk004537" client_name="URL" client_ip="IP" client_cc="DE" dst_ip="IP" from="FROM" hfrom="HFROM" to="RECIPIENT" polid="3:1:9:SYSTEM" domain="DOMAIN" mailer="mta" resolved="OK" src_type="ext" direction="in" virus="" disposition="Modify Subject" classifier="Data Loss Prevention" message_length="41648" subject="SUBJECT" message_id="MESSAGE_ID" recv_time="" notif_delay="0" scan_time="0.075582" xfer_time="0.033849" srcfolder="" read_status=""934 <22>date=2025-04-24 time=14:10:40.934 device_id=FEVM02TM23000692 eventtime=1745496640 tz="+0200" log_id=0200004543 type=statistics pri=information session_id="53OCAeMS004542-53OCAeMU004542" client_name="URL" client_ip="IP" client_cc="DE" dst_ip="IP" from="FROM" hfrom="HFROM" to="RECIPIENT" polid="3:1:9:SYSTEM" domain="DOMAIN" mailer="mta" resolved="OK" src_type="ext" direction="in" virus="" disposition="Modify Subject" classifier="Data Loss Prevention" message_length="41631" subject="SUBJECT" message_id="MESSAGE_ID" recv_time="" notif_delay="0" scan_time="0.077121" xfer_time="0.020375" srcfolder="" read_status=""826 <22>date=2025-04-24 time=14:10:41.342 device_id=FEVM02TM23000692 eventtime=1745496641 tz="+0200" log_id=0200004546 type=statistics pri=information session_id="53OCA8Od004481-53OCA8Of004481" client_name="URL" client_ip="IP" client_cc="DE" dst_ip="IP" from="FROM" hfrom="HFROM" to="RECIPIENT" polid="3:1:9:SYSTEM" domain="DOMAIN" mailer="mta" resolved="OK" src_type="ext" direction="in" virus="" disposition="Modify Subject;Defer Disposition" classifier="Data Loss Prevention" message_length="108008" subject="SUBJECT" message_id="MESSAGE_ID" recv_time="" notif_delay="0" scan_time="0.195089" xfer_time="0.006664" srcfolder="" read_status=""
Large parts are omitted because of length; the real message contains about 4-5 times this much text.
Each actual message starts with three digits and ends with read_status="".
A regex along these lines detects two consecutive messages, provided the quantifiers are set to non-greedy:
[0-9]{3} <..>.*read_status=\"\"[0-9]{3} <..>.*read_status=\"\"
Or halved to detect a single message:
[0-9]{3} <..>.*read_status=\"\"
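To show what I mean by splitting, here is a minimal Python sketch of the single-message regex applied to a concatenated string. It is not Graylog pipeline code, only an illustration of the intended behaviour. The function name `split_frames` and the shortened sample string are made up for this example. The leading digits look like a syslog octet-count prefix (RFC 6587), but that is an assumption and the sketch does not rely on it.

```python
import re

# Same idea as the single-message regex above, with a non-greedy ".*?"
# so each match stops at the first read_status="" it encounters.
# Group 1: the three-digit prefix, group 2: the log line itself.
FRAME_RE = re.compile(r'([0-9]{3}) (<..>.*?read_status="")', re.DOTALL)


def split_frames(blob):
    """Return the individual log lines without their numeric prefix."""
    return [match.group(2) for match in FRAME_RE.finditer(blob)]


if __name__ == "__main__":
    # Shortened, hypothetical sample in the same shape as the raw message.
    sample = (
        '937 <22>date=2025-04-24 time=14:10:40.785 subject="SUBJECT" read_status=""'
        '934 <22>date=2025-04-24 time=14:10:40.934 subject="SUBJECT" read_status=""'
    )
    for line in split_frames(sample):
        print(line)
```

One limitation of this approach: a subject or other quoted value that itself contains read_status="" would cut a message short.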
2. Describe your environment:
- OS Information: Ubuntu 24.04.2 LTS (GNU/Linux 6.8.0-58-generic x86_64)
- Package Version: Graylog 6.1.10+a308be3 on myserver (Eclipse Adoptium 17.0.14 on Linux 6.8.0-57-generic)
- Service logs, configurations, and environment variables: Not sure if I can provide something useful here.
3. What steps have you already taken to try and solve the problem?
Tried various solutions I found on Google or these forums. Tried building Extractors, tried building pipeline rules.
4. How can the community help?
Please help me figure out how to best process this data.
Do I use the raw input and split the messages? If so, how do I split them and then process them afterwards?
Or do I use the Syslog input? If so, how do I prevent the useless fields from being generated?
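To make the "process them afterwards" part concrete, this is roughly the result I am after once a single message has been isolated, sketched in Python rather than as a Graylog pipeline rule. Only an explicit allow-list of keys is kept, so no stray fields can appear. The names `parse_fields` and `WANTED`, and the choice of keys in the allow-list, are hypothetical and only serve the illustration.

```python
import re

# Matches key=value pairs where the value is either quoted (possibly empty)
# or an unquoted run of non-whitespace characters.
PAIR_RE = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')

# Hypothetical allow-list: only these keys become fields.
WANTED = {"date", "time", "log_id", "from", "to", "subject", "disposition"}


def parse_fields(line, wanted=WANTED):
    """Extract the allow-listed key=value pairs from one log line."""
    fields = {}
    for match in PAIR_RE.finditer(line):
        key = match.group(1)
        quoted, bare = match.group(2), match.group(3)
        value = quoted if quoted is not None else bare
        if key in wanted:
            fields[key] = value
    return fields


if __name__ == "__main__":
    # Shortened, hypothetical sample line.
    line = (
        '<22>date=2025-04-24 time=14:10:40.785 log_id=0200004538 '
        'from="FROM" virus="" disposition="Modify Subject" '
        'subject="SUBJECT" read_status=""'
    )
    print(parse_fields(line))
```

The sketch assumes quoted values never contain a double quote themselves; a subject with embedded quotes would need more careful handling.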
I hope I have described the issue clearly enough to be useful.
Thanks a lot for your help in the many other topics so far and have a great day.