Difficulties to apply extractors using regex

Hello Community,

I’m having a hard time getting my regex rules to work in Graylog, more specifically in the extractors feature, like the examples below, I’m trying to get specific data inside of the message field, but, the more awkward to me is that my regex seems to work on the regex simulator like “https://regex101.com/”, but not work inside Graylog extractors.

Example of a regex that I’m using

^.*user_name=([\w\s]+)

I’m using dummy values for privacy reasons but the words are representing as the real data is.

I’m trying to understand the correct semantics and parameters supported by this regex implementation inside Graylog, but I failing hard to achieve this, if it’s possible, could you guys explain a little bit where am I doing wrong? is the Graylog that do not supports some of the parameters?

As you can see too, the message field contains some other data that I could get, do you know if it’s possible to get that using regex? because in my understanding of the documentation, the regex in Graylog will only get the first group match, am I right?

I have searched a lot but I could not find more examples of regex being used for this purpose (as an extractor), I have seen some examples, but used for pipelines and in grok patterns, but I have the feeling that inside the extractor the regex does not work in the same way, like if it’s not supported all the parameters or maybe it’s just lack of knowledge about this functionality.

If you have examples or tips, I will be very appreciative.

Thanks in advance.

Hello,

I might be able to help. Just so you know I’m no expert but I can get the job done.
As for REGEX there are different ways to execute that type of command, this is depending on what you using JAVA, etc…
Below I can give you a log and show different ways to grab what you need and create a field/s.

Example of REGEX after a "equal sign"

LOG:

Apr  6 17:56:03 nextcloud-web1 kernel: DROP OUTPUT: IN= OUT=eth0 SRC=10.10.10.10 DST=10.200.6.17 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=3972 PROTO=TCP SPT=10050 DPT=33874 WINDOW=29200 RES=0x00 ACK SYN URGP=0

REGEX

WINDOW=(\d+)

Results:

Example 2 getting string after colon.

LOG:

2022-03-23T16:53:18.474-05:00 INFO  [SessionCreator] Invalid credentials in session create request. Actor: "urn:graylog:user:greg.smith"

REGEX:

^.*urn:graylog:user:(.+)"$

Results:

Hope that helps

1 Like

I have time to throw some more examples at you :smiley:

Example REGEX Getting the UUID from Graylogs rest access log.

LOG:

2022-04-06 22:17:04,795 DEBUG: org.graylog2.rest.accesslog - 10.10.10.10 5c6749e483d72e66f4815937 [-] "PUT api/sidecars/f637ecec-d3d7-4efc-82e7-12d9ee5f7c53" Graylog Collector v1.1.0 202 -1

REGEX

(?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9])\s(\w+.\w+)

Results

Here is a Widget I made from the above example. The Rest access logs and converting it into a human readable name instead of showing the UUID. To apply this You will need a pipeline to match the user with the UUID.

Example 2 REGEX Getting string between quotes and after equal sign. Some stuff I had to cut out since it was personal, but I think you get it.

LOG

<189>date=2022-04-06 time=22:06:21 devname=“server101” devid=“FGT60D4613052014” logid=“0001000014” type=“traffic” subtype=“local” level=“notice” vd=“domain” eventtime=1649300781 srcip=1.1.1.1 srcport=23606 srcintf=unknown-0 srcintfrole=“undefined” dstip=10.10.10.10 dstport=389 dstintf=“VLAN115” dstintfrole=“lan” sessionid=8675309 proto=6 action=“timeout” policyid=42 service=“Internet-Locator-Service” dstcountry=“Reserved” srccountry=“Reserved” trandisp=“noop” app=“Internet-Locator-Service” duration=12 sentbyte=60 rcvdbyte=88 sentpkt=1 rcvdpkt=1 appcat=“unscanned”

REGEX

dstintf="(.*?)\"

Results

1 Like

Hi @gsmith ,

Wow, I don’t know what to say, thank you so much for all these well-explained examples, I’m reading carefully and starting to perform some tests following your explanations.
As soon as I have some feedback I will publish it here.

Thank you very much again!

1 Like

Hi @gsmith,

I noticed something interesting, I tried to use the first regex that you explained to me, and does not work again, but I had a snap and tried the same regex body to extract the same field but and this time, it was a message coming from Syslog, and to my surprise, it worked as it should.

Maybe I forgot to mention, and I apologize for that, but the message log that I’m trying to extract, is coming from a Sidecar/Filebeat input( and to be more detailed, it’s coming from a Powershell script that dumps MSSQL data into a text file, and the Sidecar/Filebeat get it).

I’m supposing right now that all these troubles that I was facing it’s connected to the fact that these logs are coming from a not formatted source, and it is possible to this be the cause? and if it is, do you know how can I solve this? to my ignorance I had the understanding that this is the function of the shipper, to get unformatted data and present it in an accessible way to the log manager, maybe there is a previous step that I’m forgetting, I really don’t know.

Well, at least you shed some light on this, I was going mad to none of my regex was working, neither were the simple ones.

What are your thoughts about it? what I should do?

Thanks in advance.

Hello,

By chance does you MSSQL logs look like this?

MSSQL_Logs
2022-03-08 20:14:20.16 Server      SQL Server detected 1 sockets with 4 cores per socket and 4 logical processors per socket, 4 total logical processors; using 4 logical processors based on SQL Server licensing. This is an informational message; no user action is required.
2022-03-08 20:14:20.16 Server      SQL Server is starting at normal priority base (=7). This is an informational message only. No user action is required.
2022-03-08 20:14:20.17 Server      Detected 4094 MB of RAM. This is an informational message; no user action is required.
2022-03-08 20:14:20.20 Server      Using conventional memory in the memory manager.
2022-03-08 20:14:21.17 Server      Default collation: SQL_Latin1_General_CP1_CI_AS (us_english 1033)
2022-03-08 20:14:22.45 Server      Buffer pool extension is already disabled. No action is necessary. 
2022-03-08 20:14:23.89 Server      InitializeExternalUserGroupSid failed. Implied authentication will be disabled.
2022-03-08 20:14:23.89 Server      Implied authentication manager initialization failed. Implied authentication will be disabled.
2022-03-08 20:14:24.71 Server      The maximum number of dedicated administrator connections for this instance is '1'
2022-03-08 20:14:24.73 Server      This instance of SQL Server last reported using a process ID of 3168 at 3/8/2022 8:13:38 PM (local) 3/9/2022 2:13:38 AM (UTC). This is an informational message only; no user action is required.
2022-03-08 20:14:24.78 Server      Node configuration: node 0: CPU mask: 0x000000000000000f:0 Active CPU mask: 0x000000000000000f:0. This message provides a description of the NUMA configuration for this computer. This is an informational message only. No user action is required.
2022-03-08 20:14:24.81 Server      Using dynamic lock allocation.  Initial allocation of 2500 Lock blocks and 5000 Lock Owner blocks per node.  This is an informational message only.  No user action is required.
2022-03-08 20:14:24.84 Server      Database Instant File Initialization: enabled. For security and performance considerations see the topic 'Database Instant File Initialization' in SQL Server Books Online. This is an informational message only. No user action is required.
2022-03-08 20:14:24.85 Server      LogPool::SetArtificialMemoryLimit (536870912 Bytes)
2022-03-08 20:14:24.96 Server      Query Store settings initialized with enabled = 1, 
2022-03-08 20:14:25.08 Server      In-Memory OLTP initialized on lowend machine.
2022-03-08 20:14:25.21 spid5s      Starting up database 'master'.
2022-03-08 20:14:25.29 Server      CLR version v4.0.30319 loaded.
2022-03-08 20:14:25.57 Server      Common language runtime (CLR) functionality initialized using CLR version v4.0.30319 from C:\Windows\Microsoft.NET\Framework64\v4.0.30319\.
2022-03-08 20:14:26.17 spid5s      SQL Server Audit is starting the audits. This is an informational message. No user action is required.
2022-03-08 20:14:26.18 spid5s      SQL Server Audit has started the audits. This is an informational message. No user action is required.
2022-03-08 20:14:26.43 spid5s      SQL Trace ID 1 was started by login "sa".
2022-03-08 20:14:26.46 spid5s      Server name is 'VEEAM\VEEAMSQL2016'. This is an informational message only. No user action is required.
2022-03-08 20:14:26.52 spid6s      Starting up database 'mssqlsystemresource'.
2022-03-08 20:14:26.52 spid13s     Starting up database 'msdb'.
2022-03-08 20:14:26.52 spid14s     Starting up database 'VeeamBackup'.
2022-03-08 20:14:26.53 spid6s      The resource database build version is 13.00.5026. This is an informational message only. No user action is required.
2022-03-08 20:14:26.56 spid10s     The certificate [Cert Hash(sha1) "E398492E1714243805DA1F4DB294590659FCDAE7"] was successfully loaded for encryption.
2022-03-08 20:14:26.59 spid10s     Server is listening on [ 'any' <ipv6> 49994].
2022-03-08 20:14:26.61 spid10s     Server is listening on [ 'any' <ipv4> 49994].
2022-03-08 20:14:26.61 spid10s     Server local connection provider is ready to accept connection on [ \\.\pipe\SQLLocal\VEEAMSQL2016 ].
2022-03-08 20:14:26.61 spid10s     Server named pipe provider is ready to accept connection on [ \\.\pipe\MSSQL$VEEAMSQL2016\sql\query ].
2022-03-08 20:14:26.61 spid10s     Dedicated administrator connection support was not started because it is disabled on this edition of SQL Server. If you want to use a dedicated administrator connection, restart SQL Server using the trace flag 7806. This is an informational message only. No user action is required.
2022-03-08 20:14:26.62 spid10s     SQL Server is now ready for client connections. This is an informational message; no user action is required.
2022-03-08 20:14:26.62 Server      SQL Server is attempting to register a Service Principal Name (SPN) for the SQL Server service. Kerberos authentication will not be possible until a SPN is registered for the SQL Server service. This is an informational message. No user action is required.
2022-03-08 20:14:26.64 Server      The SQL Server Network Interface library could not register the Service Principal Name (SPN) [ MSSQLSvc/veeam:VEEAMSQL2016 ] for the SQL Server service. Windows return code: 0xffffffff, state: 63. Failure to register a SPN might cause integrated authentication to use NTLM instead of Kerberos. This is an informational message. Further action is only required if Kerberos authentication is required by authentication policies and if the SPN has not been manually registered.
2022-03-08 20:14:26.65 Server      The SQL Server Network Interface library could not register the Service Principal Name (SPN) [ MSSQLSvc/veeam:49994 ] for the SQL Server service.

If so this maybe tricky, perhaps looking at a Multiline for FileBeat here.

Curious about about this statement,

when I installed MSSQL on Windows Server 2016 during our instalation SQL Server setup it creates log files in a dated and time-stamped folder within %programfiles%\Microsoft SQL Server\ nnn \Setup Bootstrap\Log by default, where nnn are numbers that correspond to the version of SQL that’s being installed.

Are you using PowerShell to group single log lines then send them in text file? Or are these the logs that your sending?

Hello @gsmith

Let me explain better to you.

Curious about this statement,

I don’t know if you remember, I have opened a topic before.

https://community.graylog.org/t/trying-to-get-data-events-directly-from-a-third-party-application-database/22594/2

And you even helped me to understand better my choices, anyway, the “logs” that I’m talking about are literally data that I extract from MSSQL tables, long story short, I need to get better security event data from a DLP solution and the only way that I was able to do that it’s done this “dirt way” let’s say that.

Following the ideas that you and many have explained to me, I started to write a Powershell script to dump that database data into a file and let Filebeat ship to Graylog.

$SQLServer = "1.1.1.1\server"
$db = "dummy_data"
$user = "dummyuser"
$pwd ="dummypass"

  $selectdata = "SELECT TOP 5
       [date_time]
      ,[pc_name]
      ,[user_name]
      ,[file_name]
      ,[operation]
      ,[action]
      ,[source_type]
      ,[destination_type]
      ,[policy_name]
      ,[file_extension]
      ,[file_type]
      ,[file_size]
      ,[source_path]
      ,[destination_path]
      ,[application_name]
      ,[data_categories]
      ,[to_safe_zone]
      ,[is_tagged]
      ,[sensitive_content]
      ,[suspicous]
      ,[id_user]
      ,[id_pc]
      ,[id_application]
      ,[category_extension]
      ,[id_policy]
      ,[policy_bound]
      ,[module]
      ,[policy_restriction]
  FROM [dummy_data].[pbi].[data_security_view] ORDER BY date_time DESC"
 
$dump = Invoke-Sqlcmd -ServerInstance $SQLServer -Username $user -Password $pwd -Database $db -Query $selectdata 

for ($count=0; $count -lt $dump.Count; $count++)
{
     echo "$("pc_name="+$dump[$count].pc_name) $("user_name="+$dump[$count].user_name) $("file_name="+$dump[$count].file_name) $("operation="+$dump[$count].operation)"  >> "C:\Program Files\Management Console\Logs\dummy_logs.log"
     
}

I know this is not pretty, but it’s a start, so, as you can see in my ugly code after I perform a select I dump the output into a text file, and the file looks like this.

So, it’s basically that.

Oh yeah, Thanks for enlightening me. Ok I understand now.

My Apologies I’m starting to get old :laughing:

As for the logs with FileBeat probably need to use the multiline.

Also, Can you give me a mockup log from that text file. I can copy it and ingest it in my Lab server perhaps come up with a solution for you. I believe you stated that your using FileBeat?

Oh yeah, Thanks for enlightening me. Ok I understand now.
My Apologies I’m starting to get old :laughing:

Oh, I’m so sorry if I sounded rude, English it’s my second language, so, I’m still learning the small details of polite communication, forgive me, and without your help, I still would be lost, so, thank you very much!

As for the logs with FileBeat probably need to use the multiline.

I’m not having trouble with the multiline because my log file it is saving line by line, in other words, each line it’s a single event.

Also, Can you give me a mockup log from that text file. I can copy it and ingest it in my Lab server perhaps come up with a solution for you. I believe you stated that your using FileBeat?

Oh yeah, for sure, I will send to you as a particular message, thanks again!
And yeah, I’m using Filebeat as the sipper for the Graylog Sidecar.

Oh your good :+1: I normally look into members previous post/s but this time I forgot.

Hi @gsmith,

I didn’t find a way to upload the file, so, you can grab it here.
https://drive.google.com/file/d/1c19q__m7Ln5w4sjj7dWmAxJgIu3DfB4J/view?usp=sharing

Ps: I almost forgot, this is my sidecar/filebeat configuration.

# Needed for Graylog
fields_under_root: true
fields.collector_node_id: ${sidecar.nodeName}
fields.gl2_source_collector: ${sidecar.nodeId}

output.logstash:
   hosts: ["1.1.1.1:5044"]
path:
  data: C:\Program Files\Graylog\sidecar\cache\filebeat\data
  logs: C:\Program Files\Graylog\sidecar\logs
  
tags:
 - windows
 - SQL Extracted Logs
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - C:\Program Files\Management Console\Logs\dummy_logs.log
  #json.message_key: log
  #json.keys_under_root: true 
  #json.add_error_key: true

  
  #multiline.type: pattern
  #multiline.pattern: '{'
  #multiline.negate: true
  #multiline.match: after
  #multiline.flush_pattern: '}'

Thanks in advance.

1 Like

Ok I think I might have something give me a sec,

What fields did you want?

I want to extract all the fields, in my environment, I couldn’t extract any of the fields.

Hello

Ok so this is what I have for you, I only did one so I’ll let you do the others :wink:
I’m going to show steps /w Screenshots

Steps Executed:

Find MSSQL logs. The navigate to the Field “message”. At the end of the message there is an arrow, Click it.

Choose Regular expression.

image

REGEX

pc_name=(\w+)

Results:


Hope that helps :smiley:

Here is a better example of that extractor.

@lmattos90

You should be all set, I was going to ask and just noticed this. The pc_name= Is there always “-” after?

Example
pc_name=greg-smith

If so, Might need to come up with something different.

@gsmith,

I’m suspecting that it’s my environment that has some problems (I swear to you that I’m doing the same as you).

You should be all set, I was going to ask and just noticed this. The pc_name = Is there always “-” after?

Yes, I need to match any of these characters that could be used in the computer names.

Make sure your typing it in and there are no spaces.

By chance are you using FQDN for the pc_name=?

1 Like

@lmattos90
If that not working try this then.

pc_name=(?=\S*['-])([a-zA-Z'-]+)

(?=                 # Look-Ahead
  \S                # <not a whitespace character>
  *                 # (zero or more)(greedy)
  ['-]              # Character in ['-] Character Class
)                   # End of Look-Ahead
(                   # Capturing Group (1)
  [a-zA-Z'-]        # Character in [a-zA-Z'-] Character Class
  +                 # (one or more)(greedy)
)                   # End of Capturing Group (1)
1 Like

Yeah, the issue persists when I do the test.

I assure you that I’m typing just the way you told me, something it’s not right with this environment of mine.

Ps: and when I test the regex using https://regex101.com/, there always work.