Trouble processing Fortimail Logs

Hello dear Graylog Community,

I’ve already read a lot of very helpful things on these forums, but I haven’t found anything that works for this particular issue. Thanks for all your help so far. :slight_smile:

1. Describe your incident:
I’m struggling to process the logs I receive from our Fortimail servers.
At first I created a TCP Syslog input, since that’s the format Fortimail sends. That didn’t work: my indices got flooded with useless fields whose names were things like people’s names (e.g. “smith”) or parts of URLs.

Now I’ve created a TCP raw/plaintext input to avoid the generation of those fields. That works fine; however, the logs are no longer received as individual messages. Instead, several messages that would have been separate with the Syslog input now arrive all in one very long string.

I couldn’t figure out how to split that huge string into individual messages so I can extract the data I want with my pipeline, or alternatively how to avoid getting flooded with the automatically generated useless fields.

I have tried different solutions, like the one in this thread https://community.graylog.org/t/fortimail-msg-field-pipeline-fix/20631 but that didn’t work, since I have multiple “messages” inside of one message.

A raw message received on the raw/plaintext input looks like this:

937 <22>date=2025-04-24 time=14:10:40.785 device_id=FEVM02TM23000692 eventtime=1745496640 tz="+0200" log_id=0200004538 type=statistics pri=information session_id="53OCAeWi004537-53OCAeWk004537" client_name="URL" client_ip="IP" client_cc="DE" dst_ip="IP" from="FROM" hfrom="HFROM" to="RECIPIENT" polid="3:1:9:SYSTEM" domain="DOMAIN" mailer="mta" resolved="OK" src_type="ext" direction="in" virus="" disposition="Modify Subject" classifier="Data Loss Prevention" message_length="41648" subject="SUBJECT" message_id="MESSAGE_ID" recv_time="" notif_delay="0" scan_time="0.075582" xfer_time="0.033849" srcfolder="" read_status=""934 <22>date=2025-04-24 time=14:10:40.934 device_id=FEVM02TM23000692 eventtime=1745496640 tz="+0200" log_id=0200004543 type=statistics pri=information session_id="53OCAeMS004542-53OCAeMU004542" client_name="URL" client_ip="IP" client_cc="DE" dst_ip="IP" from="FROM" hfrom="HFROM" to="RECIPIENT" polid="3:1:9:SYSTEM" domain="DOMAIN" mailer="mta" resolved="OK" src_type="ext" direction="in" virus="" disposition="Modify Subject" classifier="Data Loss Prevention" message_length="41631" subject="SUBJECT" message_id="MESSAGE_ID" recv_time="" notif_delay="0" scan_time="0.077121" xfer_time="0.020375" srcfolder="" read_status=""826 <22>date=2025-04-24 time=14:10:41.342 device_id=FEVM02TM23000692 eventtime=1745496641 tz="+0200" log_id=0200004546 type=statistics pri=information session_id="53OCA8Od004481-53OCA8Of004481" client_name="URL" client_ip="IP" client_cc="DE" dst_ip="IP" from="FROM" hfrom="HFROM" to="RECIPIENT" polid="3:1:9:SYSTEM" domain="DOMAIN" mailer="mta" resolved="OK" src_type="ext" direction="in" virus="" disposition="Modify Subject;Defer Disposition" classifier="Data Loss Prevention" message_length="108008" subject="SUBJECT" message_id="MESSAGE_ID" recv_time="" notif_delay="0" scan_time="0.195089" xfer_time="0.006664" srcfolder="" read_status=""

Large parts are omitted because of length; this goes on for about 4-5 times this much text.

Each actual message starts with three digits and ends with read_status="".

A regex along these lines can detect two consecutive messages, provided the quantifiers are made non-greedy:

[0-9]{3} <..>.*read_status=\"\"[0-9]{3} <..>.*read_status=\"\"

Or halved to detect a single message:

[0-9]{3} <..>.*read_status=\"\"
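For illustration, here is a rough, untested sketch of how I imagine the single-message regex (non-greedy) could be combined with key_value() in a pipeline rule to isolate one record and turn its key=value pairs into fields. The rule name and the capture group are just placeholders, and key_value() splits on whitespace, so quoted values that contain spaces (e.g. subject or disposition) would need extra handling:

rule "Extract first Fortimail record (sketch)"
when
  // Only act if the message contains at least one complete record.
  regex("[0-9]{3} <..>.*?read_status=\"\"", to_string($message.message)).matches == true
then
  // Non-greedy capture of the first record, starting at date= and ending at read_status="".
  let first = regex("[0-9]{3} <..>(.*?read_status=\"\")", to_string($message.message));
  // Parse the key=value pairs of that single record and write them as message fields.
  let fields = key_value(value: to_string(first["0"]), trim_value_chars: "\"");
  set_fields(fields);
end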

2. Describe your environment:

  • OS Information: Ubuntu 24.04.2 LTS (GNU/Linux 6.8.0-58-generic x86_64)

  • Package Version: Graylog 6.1.10+a308be3 on myserver (Eclipse Adoptium 17.0.14 on Linux 6.8.0-57-generic)

  • Service logs, configurations, and environment variables: Not sure if I can provide something useful here.

3. What steps have you already taken to try and solve the problem?

I tried various solutions I found on Google or on these forums, including building extractors and building pipeline rules.

4. How can the community help?

Please help me figure out how to best process this data.
Do I use the raw input and split the messages? If so, how do I split them and then process them afterwards?
Or do I use the Syslog input? If so, how do I prevent the useless fields from being generated?

Hope I could describe the issue in a useful manner.
Thanks a lot for your help in the many other topics so far and have a great day.

You have a few options. Fortimail looks to support output in CSV format. You could then use a raw input and attach a pipeline rule that parses the CSV.
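For example, a rule along these lines could be a starting point. This is only an untested sketch: the column names and their order are placeholders that would have to match the CSV layout Fortimail actually writes, and the simple split() below does not handle quoted values that contain commas. Recent Graylog versions also ship a csv_to_map() pipeline function that is worth checking in the function reference for this.

rule "Parse Fortimail CSV record (sketch)"
when
  has_field("message")
then
  // Split one CSV record into its columns.
  let col = split(",", to_string($message.message));
  // Map a few columns to named fields; names and positions are placeholders.
  set_field("date", to_string(col[0]));
  set_field("time", to_string(col[1]));
  set_field("device_id", to_string(col[2]));
end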

The other option, which I’ve tried, is to use Fortianalyzer to forward Fortimail logs to Graylog in CEF format.

Hi,
thank you for your reply, and sorry for my late response.

My colleague told me we don’t have Fortianalyzer, so I couldn’t test that.

I did try the approach with the CSV format and a raw input. Unfortunately I still run into the issue that with a raw input I get multiple sets of data in one “pile” that Graylog treats as a single message. So when I try to process it with a pipeline I only get one set of values (one date, one time, one device_id, etc.) out of the 5-15 or so sets of data in that one message. It’s better than nothing, but I’m still losing most of the data that way.

How do I split up this “pile” of data into the individual messages?

I am able to “split” the messages with a pipeline rule; however, I only ever get the first message and everything else seemingly just disappears (it’s not in the stream I got it from and also not in the Default Stream), hence my question about splitting.

The rules quick reference for create_message in the pipeline rule editor says:
“Creates a new message which will be evaluated by the entire processing pipeline. Any omitted parameters (message, source, timestamp) will inherit their values from the currently processed message. The timestamp will inherit the current timestamp.”

Crucially, it says the new message “will be evaluated by the entire processing pipeline”, which doesn’t seem to actually happen. My current rule looks like this:

rule "Splitting Fortimail Messages"
when
  // Checks whether two or more records are contained in the same "message".
  regex(
    "[1-9][0-9][0-9]\\s<..>.*read_status=\"\"[1-9][0-9][0-9]\\s<..>.*read_status=\"\"",
    to_string($message.message)
  ).matches == true
then
  // Creates an array with 2 elements. Element [0] is the first record of the chain; element [1] contains all remaining records unchanged.
  let splitmsg = split("read_status=\"\"", to_string($message.message), 2);
  // Re-appends the read_status="" part to element [0], which the split cuts off. This cutting only happens once.
  let concatmsg = concat(to_string(splitmsg[0]), "read_status=\"\"");
  // Replaces the "message" field with the new, single record.
  set_field("message", to_string(concatmsg));
  // Creates a new message whose "message" field contains the remaining, still-combined records. The same timestamp is carried over and the new message runs through the pipeline from the start.
  create_message(to_string(splitmsg[1]), to_string($message.source), to_date($message.timestamp));
end

So I would expect this rule to split off the first “message” (which it does) and create a new message containing all of the remaining “messages” in its message field; that new message should then go back to the very start of the pipeline and get split again, until it only contains a single “message”.

Am I missing something? Are messages created with create_message not put back through the pipeline, and the documentation is wrong? Is there a way to put the new message back into the pipeline?
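In case they are not, one variant I could imagine trying instead is to clone the current message rather than create a new one. This is only an untested sketch: I’m assuming that messages produced by clone_message() are processed by the pipeline again and that set_field() accepts a message parameter for targeting the clone.

rule "Splitting Fortimail Messages (clone variant, sketch)"
when
  // Same check as above: two or more records in the same "message".
  regex(
    "[1-9][0-9][0-9]\\s<..>.*read_status=\"\"[1-9][0-9][0-9]\\s<..>.*read_status=\"\"",
    to_string($message.message)
  ).matches == true
then
  let splitmsg = split("read_status=\"\"", to_string($message.message), 2);
  // The clone carries everything after the first record.
  let remainder = clone_message();
  set_field(field: "message", value: to_string(splitmsg[1]), message: remainder);
  // The current message keeps only the first record.
  set_field("message", concat(to_string(splitmsg[0]), "read_status=\"\""));
end

Since the split with a limit of 2 always removes exactly one record per pass, this should terminate once only a single record is left.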