Graylog Exchange Message Tracking Log Extractor


(theresa) #1

Hi graylog community,

Since there doesn’t seem to be an extractor for the Exchange message tracking log, I thought I’d have a go at it.
Please bear in mind that I’m new to regex, so it may be too complicated or total nonsense.

An exchange message tracking log usually looks like this:

2016-04-02T16:06:58.552Z,1.2.3.4,client.fqdn.net,2.3.4.5,server-fqdn,08D31F74F20E83B8;2016-04-02T16:06:58.334Z;0,client-fqdn\Client Proxy client-fqdn,SMTP,RECEIVE,70132520976434,<b5ab16e8f5ee4eab8e89f68d147ad0f4@client.fqdn.net>,5478464c-b318-4880-66a6-08d35b10d1a3,,,1487,1,,,Client submission probe,HealthMailbox36b8315e56974a65af058f1f72987168@domain.com,HealthMailbox36b8315e56974a65af058f1f72987168@domain.com,00I: ,Originating,,127.0.0.1,1.2.3.4,S:FirstForestHop=client.fqdn.net;S:ProxiedClientIPAddress=127.0.0.1;S:ProxiedClientHostname=SmtpClientSubmissionProbe;S:DeliveryPriority=Normal;S:AccountForest=fqdn.net;S:IsProbe=true;S:PersistProbeTrace=False;S:ProbeType=OnPremisesSmtpClientSubmission;S:Mailbox=d9f070e0-eda9-4c5c-ab42-722cf42f4b62

This is what I’ve come up with so far:
(timestamp=)(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\s{4}),(clientIP=)(\d{1,}.\d{1,}.\d{1,}.\d{1,}),(clientHostname=)(\w{1,}),(serverIP=)(\d{1,}.\d{1,}.\d{1,}.\d{1,}),(serverHostname=)(\w{1,}),(sourceContext=)(\w{1,});(connectorID=)(\d),(source=)(\w{1,}),(eventID=)(\w{1,}\s{1}\w{1}),(internalMessageID=)(\d{1,}),(messageID=)(\w{1,}\p\w{1,}.\w{1,}.\w{1,}),(networkMessageID=)(\d{1,}-\d{1,}),(recipientAddress=)(\w\d{1,}\p\w{1,}),(recipientStatus=)(\w\d{1,}\p\w{1,}),(totalBytes=)(\d{1,}\w{1,}),(recipientCount))(\w{1,}),(relatedRecipientAddress=)(\d{1,}.\d{1,}.\d{1,}.\d{1,}),(reference=)(\*?)?$

Windows events are so horrible to parse…
Does what I came up with make any sense at all?
I haven’t extracted all fields yet; I just wanted to make sure I’m not heading in the wrong direction…

Does something like “*?” as a wildcard work?

If I come up with a working solution I will put it online on github, because it could be useful to someone else as well.

cheers,
theresa


(nomoresecrets) #2

As the Exchange message tracking log is a CSV file, you probably don’t need a massive load of regular expressions. I’d suggest simply using “,” as the field separator and distinguishing between string and numeric/integer values.

I am using nxlog with the “xm_csv” module to transfer the logs from the exchange server to our Graylog GELF inputs.
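
To illustrate the CSV approach outside of nxlog, here is a minimal Python sketch that splits a shortened copy of the sample line from the first post with a CSV parser and converts one numeric column. The column names are the ones used elsewhere in this thread; the helper name parse_tracking_line and the choice of which column to convert are assumptions made for the example, not part of nxlog or Graylog.

```python
import csv

# Column names as used elsewhere in this thread; only the first twelve
# are listed to keep the sketch short.
FIELDS = [
    "date-time", "client-ip", "client-hostname", "server-ip",
    "server-hostname", "source-context", "connector-id", "source",
    "event-id", "internal-message-id", "message-id", "network-message-id",
]

# Columns converted to integers (assumption: always numeric or empty).
INTEGER_FIELDS = {"internal-message-id"}

# Shortened copy of the sample line from the first post. Raw strings are
# used because the connector ID contains a backslash.
SAMPLE = (
    r"2016-04-02T16:06:58.552Z,1.2.3.4,client.fqdn.net,2.3.4.5,server-fqdn,"
    r"08D31F74F20E83B8;2016-04-02T16:06:58.334Z;0,"
    r"client-fqdn\Client Proxy client-fqdn,SMTP,RECEIVE,70132520976434,"
    r"<b5ab16e8f5ee4eab8e89f68d147ad0f4@client.fqdn.net>,"
    r"5478464c-b318-4880-66a6-08d35b10d1a3"
)


def parse_tracking_line(line, fields=FIELDS):
    """Split one CSV line and pair each value with its column name."""
    row = next(csv.reader([line]))
    record = dict(zip(fields, row))
    for name in INTEGER_FIELDS:
        if record.get(name):
            record[name] = int(record[name])
    return record


record = parse_tracking_line(SAMPLE)
print(record["client-ip"], record["event-id"], record["internal-message-id"])
```

A CSV parser also copes with quoted values that contain a comma, which a hand-written regular expression would have to handle explicitly.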


(theresa) #3

thanks nomoresecrets.

Would you mind posting your Windows nxlog.conf? Are you then able to filter for specific fields of the message tracking log in Graylog?
How would you search for a specific field like “recipient email address” or “sender email address”?


(Stephen) #4

I’m not familiar with Exchange tracking logs, but for a delimited log that is always in that format, I’d agree that building a grok expression would be your best bet.

To start on the log string you have, you can do something like:

%{TIMESTAMP_ISO8601:timestamp},%{IP:clientIP},%{HOSTNAME:clientHostname},%{IP:serverIP},%{HOSTNAME:serverHostname}

Also use Named Captures Only, which populates just the fields that you specify. For the beginning of that tracking log, the grok pattern above gives you:

{
  "timestamp": [
    [
      "2016-04-02T16:06:58.552Z"
    ]
  ],
  "clientIP": [
    [
      "1.2.3.4"
    ]
  ],
  "clientHostname": [
    [
      "client.fqdn.net"
    ]
  ],
  "serverIP": [
    [
      "2.3.4.5"
    ]
  ],
  "serverHostname": [
    [
      "server-fqdn"
    ]
  ]
}

Given that, you will be able to search by those fields in Graylog. If your search contains serverHostname:server-fqdn, that message will show up along with any others that have that hostname as their serverHostname.

Two other tips:

  1. If there are string patterns for which no predefined grok pattern exists, you can add your own pattern to Graylog, built from a combination of existing patterns and regular expressions.
    TIMESTAMP_ISO8601, for example, is a grok pattern that itself contains other grok patterns:
    TIMESTAMP_ISO8601 = %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?

  2. If your log contains a field identifier together with its value, such as:
    S:DeliveryPriority=Normal
    your grok pattern can include the identifier literally, so that it looks like:
    S:DeliveryPriority=%{WORD:sDeliveryPriority}
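
As a rough illustration of the second tip outside of Graylog, the Python sketch below pulls the same value out of a shortened copy of the custom-data field from the sample log line. The named group stands in for the %{WORD:sDeliveryPriority} capture, and the generic split into a dictionary is an extra assumption for the example, not something grok does by itself.

```python
import re

# Shortened copy of the custom-data field from the sample log line.
CUSTOM_DATA = (
    "S:FirstForestHop=client.fqdn.net;"
    "S:ProxiedClientIPAddress=127.0.0.1;"
    "S:DeliveryPriority=Normal;"
    "S:IsProbe=true"
)

# One key: the literal identifier followed by a named capture, which is
# roughly what S:DeliveryPriority=%{WORD:sDeliveryPriority} expresses.
match = re.search(r"S:DeliveryPriority=(?P<sDeliveryPriority>\w+)", CUSTOM_DATA)
priority = match.group("sDeliveryPriority") if match else None

# All keys: split on ";" and then on the first "=" of each item.
pairs = dict(
    item.split("=", 1) for item in CUSTOM_DATA.split(";") if "=" in item
)

print(priority)
print(pairs["S:IsProbe"])
```

The single-key form is the closer analogue of the grok pattern; the dictionary form is only practical where the parsing happens in a scripting step before the data reaches Graylog.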


(theresa) #5

Thank you, Stephen!

I just found this blog. Would something similar also work in Graylog, considering that it uses the same grok patterns?

There’s also a dashboard in the JSON format… could this also work in Graylog, or does Graylog use a different format for its dashboards?
https://gist.githubusercontent.com/elijahpaul/4b9cd98715c0ba2a75de/raw/9324f7c637e486fb53c8e66e1552595b73b5e636/exchange_msg_trak_dash_v1.json

cheers,
theresa


(Stephen) #6

@micsnare, the link shows the steps to set up those Exchange tracking logs on an ELK deployment, so it doesn’t translate exactly to Graylog. The Logstash filter configuration that you see there, with all the columns and mutators, is essentially what you would be doing by setting up the grok expression in Graylog.

The JSON file is a Kibana dashboard file, so it will not work as a Graylog dashboard.


(theresa) #7

Thank you Stephen.

Just to be on the safe side, I have now configured the following grok pattern in Graylog (System -> Grok Patterns)

	(%{TIMESTAMP_ISO8601:date-time})?,(%{IPORHOST:client-ip})?,(%{IPORHOST:client-hostname})?,(%{IPORHOST:server-ip})?,(%{IPORHOST:server-hostname})?,(%{GREEDYDATA:source-context})?,(%{GREEDYDATA:connector-id})?,(%{WORD:source})?,(%{WORD:event-id})?,(%{NUMBER:internal-message-id})?,(%{GREEDYDATA:message-id})?,(%{GREEDYDATA:network-message-id})?,(%{GREEDYDATA:recipient-address})?,(%{GREEDYDATA:recipient-status})?,(%{NUMBER:total-bytes})?,(%{NUMBER:recipient-count})?,(%{GREEDYDATA:related-recipient-address})?,(%{GREEDYDATA:reference})?,(%{GREEDYDATA:message-subject})?,(%{GREEDYDATA:sender-address})?,(%{GREEDYDATA:return-path})?,(%{GREEDYDATA:message-info})?,(%{WORD:directionality})?,(%{GREEDYDATA:tenant-id})?,(%{IPORHOST:original-client-ip})?,(%{IPORHOST:original-server-ip})?,(%{GREEDYDATA:custom-data})?,(%{GREEDYDATA:transport-traffic-type})?,(%{GREEDYDATA:log-id})?,(%{GREEDYDATA:schema-version})?

Does it matter how I name this grok pattern, or could I just assign the name “exchange” to it?

Would I need to configure anything apart from that as well?

I think once I have the data searchable I will look into creating dashboards from there.

Thanks again for the help! :)


(Stephen) #8

Are you ingesting these logs in graylog already? You will need to configure that pattern as an extractor on the input where you are receiving the exchange logs.

I’d suggest not saving this pattern in Graylog (System -> Grok Patterns). Instead, once you are receiving those messages, configure the pattern through System -> Inputs -> Manage Extractors (on the input configured for your Exchange logs). That way you will be able to load a sample message from the input, configure the pattern, and correct any mistakes before saving the pattern to run on that input.


#9

hi,

You have several GREEDYDATA patterns there, so you will probably end up with a very slow extractor. You could try to make it more efficient by replacing the GREEDYDATA parts with something like (?<field_name>[^,]*) (to extract everything up to the next comma), or by defining a NONCOMMA grok pattern…

See
https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
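
To show what the [^,]* idea does, here is a small Python sketch that builds one named group per column and matches the start of the sample line from the first post. Python group names cannot contain hyphens, so underscores are used here. In grok, the equivalent would presumably be a custom pattern whose body is [^,]* (the NONCOMMA suggested above), referenced as for example %{NONCOMMA:client_ip}; that definition is an assumption, not something posted in this thread.

```python
import re

# Underscore names, because Python does not allow hyphens in group names.
FIELDS = ["date_time", "client_ip", "client_hostname", "server_ip", "server_hostname"]

# One "(?P<name>[^,]*)" per column, joined by literal commas. Each group
# stops at the next comma, so the regex engine never has to backtrack
# across field boundaries the way a chain of greedy .* groups does.
PATTERN = re.compile("^" + ",".join(f"(?P<{name}>[^,]*)" for name in FIELDS))

# Beginning of the sample line from the first post.
SAMPLE = (
    "2016-04-02T16:06:58.552Z,1.2.3.4,client.fqdn.net,2.3.4.5,server-fqdn,"
    "08D31F74F20E83B8;2016-04-02T16:06:58.334Z;0"
)

match = PATTERN.match(SAMPLE)
fields = match.groupdict() if match else {}
print(fields)
```

One caveat: a quoted value that itself contains a comma would be cut at that comma, so this form only suits columns that never contain one.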


(nomoresecrets) #10

Here’s my nxlog config.

define ROOT C:\Program Files (x86)\nxlog

Moduledir %ROOT%\modules
CacheDir %ROOT%\data
Pidfile %ROOT%\data\nxlog.pid
SpoolDir %ROOT%\data
LogFile %ROOT%\data\nxlog.log

<Extension gelf>
    Module      xm_gelf
</Extension>

##########################
## sending exchange message log
########################## 

define BASEDIR_MSGTRK C:\Program Files\Microsoft\Exchange Server\V14\TransportRoles\Logs\MessageTracking
define BASEDIR_RCVCON C:\Program Files\Microsoft\Exchange Server\V14\TransportRoles\Logs\ProtocolLog\SmtpReceive

<Extension csv_msgtrk>
   Module	   xm_csv
   Fields      $date_time, $client_ip, $client_hostname, $server_ip, $server_hostname, $source_context, $connector_id, $exchange_source, $event_id, $internal_message_id, $message_id, $recipient_address, $recipient_status, $total_bytes, $recipient_count, $related_recipient_address, $reference, $message_subject, $sender_address, $return_path, $message_info, $directionality, $tenant_id, $original_client_ip, $original_server_ip, $custom_data
   FieldTypes  string, string, string, string, string, string, string, string, string, integer, string, string, string, integer, integer, string, string, string, string, string, string, string, string, string, string, string
   Delimiter   ,
</Extension>

<Extension csv_rcvcon>
   Module	   xm_csv
   Fields      $date_time, $connector_id, $session_id, $sequence_number, $local_endpoint, $remote_endpoint, $event, $data, $context
   FieldTypes  string, string, string, integer, string, string, string, string, string
   Delimiter   ,
   EscapeChar  \n
</Extension>

<Input in_exchange_msgtrk>  
   Module     im_file
   File       '%BASEDIR_MSGTRK%\MSGTRK????????*-*.LOG' # Exports all logs in Directory
   SavePos    TRUE
   Exec       if $raw_event =~ /HealthMailbox/ drop();
   Exec       if $raw_event =~ /^#/ drop();
   Exec       csv_msgtrk->parse_csv();
   Exec       $Hostname = hostname_fqdn();
</Input>

<Input in_exchange_rcvcon>  
   Module     im_file
   File       '%BASEDIR_RCVCON%\RECV*.LOG' # Exports all logs in Directory
   SavePos    TRUE
   Exec       if $raw_event =~ /^#/ drop();
   Exec       csv_rcvcon->parse_csv();
   Exec       $Hostname = hostname_fqdn();
</Input>

<Output out_exchange_msgtrk>  
   Module     om_udp
   Host       graylog-input.mydomain.com
   Port       12203
   OutputType GELF
   Exec       $SourceName = 'exchange_msgtrk_log';
   Exec       $sender_address = lc($sender_address);
   Exec       $recipient_address = lc($recipient_address);
   Exec       $return_path = lc($return_path);
</Output>

<Output out_exchange_rcvcon>  
   Module     om_udp
   Host       graylog-input.mydomain.com
   Port       12203
   OutputType GELF
   Exec       $SourceName = 'exchange_rcvcon_log';
   Exec       $sender_address = lc($sender_address);
   Exec       $recipient_address = lc($recipient_address);
   Exec       $return_path = lc($return_path);
</Output>

<Route exchange_msgtrk>  
    Path      in_exchange_msgtrk => out_exchange_msgtrk
</Route> 

<Route exchange_rcvcon> 
	Path      in_exchange_rcvcon => out_exchange_rcvcon
</Route> 

#11

Hi, not sure if this will help, but this is how I’m doing it.

I’m using Filebeat to ship the logs. I add a “type” of ex-msg-trk-transport to the transport logs and ex-msg-trk-mailbox to the mailbox logs, and then use these pipeline rules:

rule "type ex-msg-trk-mailbox"
when
has_field("type") && to_string($message.type) == "ex-msg-trk-mailbox"
then
// grok the message field
let message_field = to_string($message.message);
let action = grok(pattern: "(%{TIMESTAMP_ISO8601:date-time})?,(%{IPORHOST:client-ip})?,(%{IPORHOST:client-hostname})?,(%{IPORHOST:server-ip})?,(%{IPORHOST:server-hostname})?,(%{GREEDYDATA:source-context})?,(%{GREEDYDATA:connector-id})?,(%{WORD:source-component})?,(%{WORD:event-id})?,(%{NUMBER:internal-message-id})?,(%{GREEDYDATA:message-id})?,(%{GREEDYDATA:recipient-address})?,(%{GREEDYDATA:recipient-status})?,(%{NUMBER:total-bytes})?,(%{NUMBER:recipient-count})?,(%{GREEDYDATA:related-recipient-address})?,(%{GREEDYDATA:reference})?,(%{GREEDYDATA:message-subject})?,(%{GREEDYDATA:sender-address})?,(%{GREEDYDATA:return-path})?,(%{GREEDYDATA:message-info})?,(%{WORD:directionality})?,(%{GREEDYDATA:tenant-id})?,(%{IPORHOST:original-client-ip})?,(%{IPORHOST:original-server-ip})?,(%{GREEDYDATA:custom-data})?", value: message_field, only_named_captures: true);
set_fields(action);
end

rule "type ex-msg-trk-transport"
when
has_field("type") && to_string($message.type) == "ex-msg-trk-transport"
then
// grok the message field
let message_field = to_string($message.message);
let action = grok(pattern: "(%{TIMESTAMP_ISO8601:date-time})?,(%{IPORHOST:client-ip})?,(%{IPORHOST:client-hostname})?,(%{IPORHOST:server-ip})?,(%{IPORHOST:server-hostname})?,(%{GREEDYDATA:source-context})?,(%{GREEDYDATA:connector-id})?,(%{WORD:source-component})?,(%{WORD:event-id})?,(%{NUMBER:internal-message-id})?,(%{GREEDYDATA:message-id})?,(%{GREEDYDATA:recipient-address})?,(%{GREEDYDATA:recipient-status})?,(%{NUMBER:total-bytes})?,(%{NUMBER:recipient-count})?,(%{GREEDYDATA:related-recipient-address})?,(%{GREEDYDATA:reference})?,(%{GREEDYDATA:message-subject})?,(%{GREEDYDATA:sender-address})?,(%{GREEDYDATA:return-path})?,(%{GREEDYDATA:message-info})?,(%{WORD:directionality})?,(%{GREEDYDATA:tenant-id})?,(%{IPORHOST:original-client-ip})?,(%{IPORHOST:original-server-ip})?,(%{GREEDYDATA:custom-data})?", value: message_field, only_named_captures: true);
set_fields(action);
end

