General Question Making Sense of Data

First, apologies if this is not the correct venue. Second, apologies for not using the prescribed format; since this is not a tech issue, I suspect the aforementioned format is largely inapplicable.

I’ve been using Graylog Community for several months now and, somewhat surprisingly, have never had to manually parse a mess of data. However, I just installed Sysmon for Linux on a server, and the resulting info that Graylog ingests via syslog is just vomit soup. Each message contains “tags”, but being the noob I am, I am not sure how to parse these messages into something readable.

So I am requesting to be pointed in the right direction. Are we talking GROK? Pipelines? RegEx? I unfortunately don’t know what I don’t know.

Thank you!

Can you give a couple of disparate examples of messages, with descriptions of what you want out of them? It will help when thinking about how to handle them. You can also filter out or drop messages with filebeat/nxlog before they get to Graylog (you would have to post examples of your sidecar configuration for help on that). You can drop messages in Graylog too, but it’s usually more efficient to do it at the client. I mention dropping because there is usually both good and useless stuff in vomit soup. :smiley:

Thank you for the prompt response! And yes, I was thinking of something like filebeat/nxlog because Winlogbeat on the Windows side makes the corresponding vomit soup much tastier. Here is an example:

<Event><System><Provider Name="Linux-Sysmon" Guid="{ff032593-a8d3-4f13-b0d6-01fc615a0f97}"/><EventID>1</EventID><Version>5</Version><Level>4</Level><Task>1</Task><Opcode>0</Opcode><Keywords>0x8000000000000000</Keywords><TimeCreated SystemTime="2022-08-22T19:40:57.649402000Z"/><EventRecordID>105680</EventRecordID><Correlation/><Execution ProcessID="1872975" ThreadID="1872975"/><Channel>Linux-Sysmon/Operational</Channel><Computer>ryzen7-3700x</Computer><Security UserId="0"/></System><EventData><Data Name="RuleName">TechniqueID=T1059.004,TechniqueName=Command and Scriptin</Data><Data Name="UtcTime">2022-08-22 19:40:57.653</Data><Data Name="ProcessGuid">{c4966c14-dbc9-6303-add2-c78dcf550000}</Data><Data Name="ProcessId">2235174</Data><Data Name="Image">/usr/bin/bash</Data><Data Name="FileVersion">-</Data><Data Name="Description">-</Data><Data Name="Product">-</Data><Data Name="Company">-</Data><Data Name="OriginalFileName">-</Data><Data Name="CommandLine">/bin/bash /home/matthew/.config/polybar/modules/caffeine.sh</Data><Data Name="CurrentDirectory">/home/matthew</Data><Data Name="User">matthew</Data><Data Name="LogonGuid">{c4966c14-0000-0000-e803-000000000000}</Data><Data Name="LogonId">1000</Data><Data Name="TerminalSessionId">3</Data><Data Name="IntegrityLevel">no level</Data><Data Name="Hashes">-</Data><Data Name="ParentProcessGuid">{c4966c14-dbc9-6303-bdea-45c987550000}</Data><Data Name="ParentProcessId">2235173</Data><Data Name="ParentImage">/usr/bin/dash</Data><Data Name="ParentCommandLine">/bin/sh</Data><Data Name="ParentUser">matthew</Data></EventData></Event>

I know there is probably a very simple - albeit time consuming - solution.

I have been hunting around for a bit. I am not sure what format that is, which makes it hard to find something that would naturally pull it apart for you. Barring finding something in the Graylog Marketplace or on the internet, I would likely end up going through it with GROK. I will look more later and update if I find anything. :smiley:

Hello @accidentaladmin

That looks like XML.

@tmacgbay I cleaned it up a bit.

<Event>
<System>
<Provider Name="Linux-Sysmon" Guid="{ff032593-a8d3-4f13-b0d6-01fc615a0f97}"/>
<EventID>1</EventID><Version>5</Version>
<Level>4</Level><Task>1</Task>
<Opcode>0</Opcode><Keywords>0x8000000000000000</Keywords>
<TimeCreated SystemTime="2022-08-22T19:40:57.649402000Z"/>
<EventRecordID>105680</EventRecordID><Correlation/>
<Execution ProcessID="1872975" ThreadID="1872975"/>
<Channel>Linux-Sysmon/Operational</Channel>
<Computer>ryzen7-3700x</Computer>
<Security UserId="0"/></System>
<EventData>
<Data Name="RuleName">TechniqueID=T1059.004,TechniqueName=Command and Scriptin</Data>
<Data Name="UtcTime">2022-08-22 19:40:57.653</Data>
<Data Name="ProcessGuid">{c4966c14-dbc9-6303-add2-c78dcf550000}</Data>
<Data Name="ProcessId">2235174</Data><Data Name="Image">/usr/bin/bash</Data>
<Data Name="FileVersion">-</Data>
<Data Name="Description">-</Data>
<Data Name="Product">-</Data>
<Data Name="Company">-</Data>
<Data Name="OriginalFileName">-</Data>
<Data Name="CommandLine">/bin/bash /home/matthew/.config/polybar/modules/caffeine.sh</Data>
<Data Name="CurrentDirectory">/home/matthew</Data>
<Data Name="User">matthew</Data><Data Name="LogonGuid">{c4966c14-0000-0000-e803-000000000000}</Data>
<Data Name="LogonId">1000</Data>
<Data Name="TerminalSessionId">3</Data><Data Name="IntegrityLevel">no level</Data><Data Name="Hashes">-</Data>
<Data Name="ParentProcessGuid">{c4966c14-dbc9-6303-bdea-45c987550000}</Data>
<Data Name="ParentProcessId">2235173</Data>
<Data Name="ParentImage">/usr/bin/dash</Data>
<Data Name="ParentCommandLine">/bin/sh</Data><Data Name="ParentUser">matthew</Data>
</EventData>
</Event>
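
Since the payload is XML, any XML parser can walk it. Here is a minimal sketch in Python using the standard library's xml.etree.ElementTree, run against a shortened copy of the <System> section above. The function name flatten_system and the Tag_Attribute naming scheme are my own assumptions for illustration, not something Graylog or Sysmon prescribes.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the <System> section from the sample event above.
SAMPLE = (
    '<Event><System>'
    '<Provider Name="Linux-Sysmon" Guid="{ff032593-a8d3-4f13-b0d6-01fc615a0f97}"/>'
    '<EventID>1</EventID>'
    '<TimeCreated SystemTime="2022-08-22T19:40:57.649402000Z"/>'
    '<Channel>Linux-Sysmon/Operational</Channel>'
    '<Computer>ryzen7-3700x</Computer>'
    '<Security UserId="0"/>'
    '</System><EventData/></Event>'
)


def flatten_system(xml_text):
    """Return the children of <System> as a flat dict of strings.

    Hypothetical helper: element text becomes {tag: text} and each
    attribute becomes {tag_attribute: value}.
    """
    root = ET.fromstring(xml_text)
    fields = {}
    system = root.find("System")
    if system is None:
        return fields
    for child in system:
        # Element text, e.g. <EventID>1</EventID> -> {"EventID": "1"}
        if child.text and child.text.strip():
            fields[child.tag] = child.text.strip()
        # Attributes, e.g. <Security UserId="0"/> -> {"Security_UserId": "0"}
        for attr, value in child.attrib.items():
            fields[f"{child.tag}_{attr}"] = value
    return fields


system_fields = flatten_system(SAMPLE)
for name, value in system_fields.items():
    print(f"{name} = {value}")
```

This only shows that the structure is regular enough to flatten mechanically; inside Graylog the same job would fall to the log shipper, an extractor, or a pipeline.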

Windows Event Viewer.


Edit: @accidentaladmin can I ask how you send these logs? Meaning, what type of log shipper are you using?
For example, NXLog has extension modules for handling specific log formats.

<Extension _syslog>
    Module      xm_syslog
</Extension>

<Extension _gelf>
    Module      xm_gelf
</Extension>
<Extension _json>
    Module      xm_json
</Extension>

The output section then sends the logs in the chosen format, here JSON:

<Output out>
    Module    om_tcp
    Host      192.168.1.1
    Port      1514
    Exec      to_json();
</Output>

By default it writes to syslog, so rsyslog is taking care of getting the logs over to Graylog.

Hello

If you're receiving logs as shown above and this needs to be corrected, you have a couple of choices.

Change the input or the log shipper, and/or reconfigure rsyslog. For troubleshooting, you could create a new Raw/Plaintext UDP input just for the sysmon logs.

As shown above, Nxlog can be configured before the logs are shipped. I’m sure the other log shippers can do this also for unique log types.

Edit: I forgot to mention, with those types of logs (e.g., JSON, XML), you might need a pipeline or extractor to separate all the fields.


I ended up using Filebeat with its XML decoder, configured via Sidecar (btw, what a great piece of software). It did a pretty okay job. It turns this:

message
Aug 23 08:48:38 ryzen7-3700x sysmon: <Event><System><Provider Name="Linux-Sysmon" Guid="{ff032593-a8d3-4f13-b0d6-01fc615a0f97}"/><EventID>1</EventID><Version>5</Version><Level>4</Level><Task>1</Task><Opcode>0</Opcode><Keywords>0x8000000000000000</Keywords><TimeCreated SystemTime="2022-08-23T12:48:38.907000000Z"/><EventRecordID>508415</EventRecordID><Correlation/><Execution ProcessID="1872975" ThreadID="1872975"/><Channel>Linux-Sysmon/Operational</Channel><Computer>ryzen7-3700x</Computer><Security UserId="0"/></System><EventData><Data Name="RuleName">TechniqueID=T1059.004,TechniqueName=Command and Scriptin</Data><Data Name="UtcTime">2022-08-23 12:48:38.911</Data><Data Name="ProcessGuid">{c4966c14-cca6-6304-7d57-84df0b560000}</Data><Data Name="ProcessId">3347101</Data><Data Name="Image">/usr/bin/bash</Data><Data Name="FileVersion">-</Data><Data Name="Description">-</Data><Data Name="Product">-</Data><Data Name="Company">-</Data><Data Name="OriginalFileName">-</Data><Data Name="CommandLine">/bin/bash /home/matthew/.config/polybar/modules/uname.sh</Data><Data Name="CurrentDirectory">/home/matthew</Data><Data Name="User">matthew</Data><Data Name="LogonGuid">{c4966c14-0000-0000-e803-000000000000}</Data><Data Name="LogonId">1000</Data><Data Name="TerminalSessionId">3</Data><Data Name="IntegrityLevel">no level</Data><Data Name="Hashes">-</Data><Data Name="ParentProcessGuid">{c4966c14-cca6-6304-bdea-b93c9d550000}</Data><Data Name="ParentProcessId">3347100</Data><Data Name="ParentImage">/usr/bin/dash</Data><Data Name="ParentCommandLine">/bin/sh</Data><Data Name="ParentUser">matthew</Data></EventData></Event>

Into this:

beats_type
filebeat
filebeat_@metadata_beat
filebeat
filebeat_@metadata_type
_doc
filebeat_@metadata_version
8.3.3
filebeat_@timestamp
2022-08-23 08:48:39.492 -04:00
filebeat_agent_ephemeral_id
90be81d5-f87d-4948-b5e3-a0b8645997ef
filebeat_agent_id
e6c60bff-211a-4868-bbde-c755a175d6a6
filebeat_agent_name
ryzen7-3700x
filebeat_agent_type
filebeat
filebeat_agent_version
8.3.3
filebeat_collector_node_id
ryzen7-3700x
filebeat_ecs_version
8.0.0
filebeat_event_eventdata_data
[]
filebeat_event_eventdata_data_0_name
RuleName
filebeat_event_eventdata_data_10_name
CommandLine
filebeat_event_eventdata_data_11_name
CurrentDirectory
filebeat_event_eventdata_data_12_name
User
filebeat_event_eventdata_data_13_name
LogonGuid
filebeat_event_eventdata_data_14_name
LogonId
filebeat_event_eventdata_data_15_name
TerminalSessionId
filebeat_event_eventdata_data_16_name
IntegrityLevel
filebeat_event_eventdata_data_17_name
Hashes
filebeat_event_eventdata_data_18_name
ParentProcessGuid
filebeat_event_eventdata_data_19_name
ParentProcessId
filebeat_event_eventdata_data_1_name
UtcTime
filebeat_event_eventdata_data_20_name
ParentImage
filebeat_event_eventdata_data_21_name
ParentCommandLine
filebeat_event_eventdata_data_22_name
ParentUser
filebeat_event_eventdata_data_2_name
ProcessGuid
filebeat_event_eventdata_data_3_name
ProcessId
filebeat_event_eventdata_data_4_name
Image
filebeat_event_eventdata_data_5_name
FileVersion
filebeat_event_eventdata_data_6_name
Description
filebeat_event_eventdata_data_7_name
Product
filebeat_event_eventdata_data_8_name
Company
filebeat_event_eventdata_data_9_name
OriginalFileName
filebeat_event_system_channel
Linux-Sysmon/Operational
filebeat_event_system_computer
ryzen7-3700x
filebeat_event_system_eventid
1
filebeat_event_system_eventrecordid
508415
filebeat_event_system_execution_processid
1872975
filebeat_event_system_execution_threadid
1872975
filebeat_event_system_keywords
0x8000000000000000
filebeat_event_system_level
4
filebeat_event_system_opcode
0
filebeat_event_system_provider_guid
{ff032593-a8d3-4f13-b0d6-01fc615a0f97}
filebeat_event_system_provider_name
Linux-Sysmon
filebeat_event_system_security_userid
0
filebeat_event_system_task
1
filebeat_event_system_timecreated_systemtime
2022-08-23 08:48:38.907 -04:00
filebeat_event_system_version
5
filebeat_host_name
ryzen7-3700x
filebeat_input_type
filestream
filebeat_log_file_path
/var/log/user.log
filebeat_log_offset
1023961222

So, it still needs some work (the “filebeat_event_eventdata_data_x_name” fields only hold the field titles and do not contain the associated data), but it is better than it was.
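
For anyone who wants to see what the missing pairing would look like, here is a rough sketch in Python that matches each Data element's Name attribute with its text value. It runs on a shortened copy of the syslog line above. The helper name eventdata_fields is hypothetical, and this is a stand-alone illustration, not something Filebeat or Graylog does for you.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the syslog line shown above.
LINE = (
    'Aug 23 08:48:38 ryzen7-3700x sysmon: '
    '<Event><System><EventID>1</EventID></System>'
    '<EventData>'
    '<Data Name="Image">/usr/bin/bash</Data>'
    '<Data Name="CommandLine">/bin/bash /home/matthew/.config/polybar/modules/uname.sh</Data>'
    '<Data Name="User">matthew</Data>'
    '<Data Name="Hashes">-</Data>'
    '</EventData></Event>'
)


def eventdata_fields(line):
    """Map each <Data Name="..."> to its text (hypothetical helper)."""
    # Skip the syslog prefix by starting at the opening <Event> tag.
    start = line.find("<Event>")
    if start == -1:
        return {}
    root = ET.fromstring(line[start:])
    fields = {}
    for data in root.iterfind("EventData/Data"):
        name = data.get("Name")
        if name:
            # An empty element has text None; store it as an empty string.
            fields[name] = data.text or ""
    return fields


event_fields = eventdata_fields(LINE)
for name, value in event_fields.items():
    print(f"{name} = {value}")
```

The point is that the field name lives in an attribute and the value in the element text, so whatever does the parsing has to join the two.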


Great!! Can you post your configuration for future searchers?

You can always drop fields on the Beats side with the drop_fields processor (it has an ignore_missing option), or later in a pipeline with the remove_field() function.
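
To decide what is worth dropping, it helps to see which fields carry only a placeholder. In the sample event, several fields (FileVersion, Description, Product, Company, OriginalFileName, Hashes) are just "-". A small Python sketch of that filtering idea follows; the helper name and the choice of "-" and the empty string as placeholders are my assumptions based on that one sample.

```python
# Values treated as "no data". Assumed from the sample event, where
# unused fields are filled with a single dash.
PLACEHOLDERS = {"", "-"}


def drop_placeholder_fields(fields):
    """Return a copy of fields without placeholder-only values."""
    return {
        name: value
        for name, value in fields.items()
        if value.strip() not in PLACEHOLDERS
    }


# A few name/value pairs taken from the sample event.
sample_fields = {
    "Image": "/usr/bin/bash",
    "FileVersion": "-",
    "Description": "-",
    "Product": "-",
    "Company": "-",
    "User": "matthew",
    "Hashes": "-",
}

kept = drop_placeholder_fields(sample_fields)
print(kept)
```

In practice the same rule would be expressed in the shipper or in a pipeline rule rather than in a separate script.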

Also, you can remove the annoying leading “filebeat” prefix by checking the “Do not add Beats type as prefix” box in your Beats input.

Of course!
Here is my Sidecar config for Filebeat (yes, it's overkill haha):

# Needed for Graylog
fields_under_root: true
fields.collector_node_id: ${sidecar.nodeName}
fields.gl2_source_collector: ${sidecar.nodeId}

filebeat.inputs:
- type: filestream
  id: sysmon
  paths:
    - /var/log/syslog
    - /var/log/user.log
  processors:
    - decode_xml:
        field: message
        target_field: ""
        overwrite_keys: false

- type: filestream 
  id: logs
  paths:
    - /var/log/alternatives.log
    - /var/log/auth.log
    - /var/log/boot.log
    - /var/log/daemon.log
    - /var/log/dpkg.log
    - /var/log/fail2ban.log
    - /var/log/fontconfig.log
    - /var/log/kern.log
    - /var/log/mail.log
    - /var/log/ufw.log
    - /var/log/xdm.log
    - /var/log/Xorg.0.log
    - /var/log/apt/*.log
    - /var/log/cups/*.log
    - /var/log/exim4/*.log
    - /var/log/filebeat/*.log
    - /var/log/graylog-sidecar/*.log
    - /var/log/installer/*.log
    - /var/log/journal/*.log
    - /var/log/libvirt/*.log
    - /var/log/lightdm/*.log
    - /var/log/nala/*.log
    - /var/log/netdata/*.log
    - /var/log/private/*.log
    - /var/log/runit/*.log
    - /var/log/sysstat/*.log
    - /var/log/timeshift/*.log
    - /var/log/unattended-upgrades/*.log
  processors:
    - syslog:
        field: message
        format: auto
        overwrite_keys: false
  
output.logstash:
   hosts: ["192.168.xx.xxx:xxxx"]
path:
  data: /var/lib/graylog-sidecar/collectors/filebeat/data
  logs: /var/lib/graylog-sidecar/collectors/filebeat/log

This is a good example of learning something new for me! You can also tell the decode_xml processor to ignore missing fields and decoding failures:

processors:
  - decode_xml:
      field: example
      target_field: xml
      ignore_missing: true
      ignore_failure: true

(From: Elastic Decode XML page)


Awesome, I agree the GL Sidecar with Filebeat is pretty nice.
