Query language - clustered matches?

Summary: I’m looking to make a sort of “group by” query against my logs. How can I do this?

I have a distributed application, in which nodes communicate with each other using a unique session ID. I’d like to present a view that shows the logs clustered by session ID. I have extracted the session ID into a field (and also have fields for the various operations that can take place - for example, an is_auth field to say whether it’s an authentication or not). I’m after something like this:

session: deadbeef
2.3.4.5 [timestamp] Authenticated with node 1.2.3.4 session=deadbeef
1.2.3.4 [timestamp] Node 2.3.4.5 authenticated with us session=deadbeef

session: beefcafe
5.6.7.8 [timestamp] Authenticated with node 2.3.4.5 session=beefcafe
2.3.4.5 [timestamp] Node 5.6.7.8 authenticated with us session=beefcafe

Once the above is done, I’d also like to (ideally) be able to filter which sessions are shown in the list, perhaps by the log text (e.g. only show sessions where 2.3.4.5 authenticated with someone else). I’m happy to leave that gymnastics for another time though - the list above is the thing I really need!

I’m currently using Graylog Open 5.0.10 on Ubuntu 22. I’ve a dozen or so client machines pushing logs to it via rsyslog.

Any help with my search query skills (or any other aspect of Graylog that could do this) would be much appreciated!

Hey @coofercat

The way to go is a pipeline. This isn’t exact, but you get the idea:

rule "stage 1"
when
  contains(to_string($message.message), "2.3.4.5")
then
  set_field("deadbeef", "session=deadbeef");
end

The idea here is to create the fields you need; then the rest will work out.

Thanks @gsmith - although I’m not sure I understand how this helps me perform a search. Is there a way to list by field?

This feels like I need some clever fields, but even once I’ve got them, I’m not sure how to query the database to pull out “groups” of log lines based on fields or field values. What am I missing?

Hey @coofercat

Yes, this is what I was referring to. Once a field is generated with the information that is needed, you can query that field.

If you could give some more information on exactly what you’re looking for, that would be helpful.

The pipeline above will search the message field for “2.3.4.5”, and if it’s found, it will create a field called deadbeef, which can then be queried. You can build on that: for example, if there’s a connection/authentication, send it to a stream, etc., or create another field such as “No_Connection” or whatever you need.

I can see from what you’re saying that you’d be able to search for a session ID and show all the lines that relate to it - searching by field rather than by text.

As I said though, I’m really after a “group by” sort of query, where it would start by pulling out all the unique session IDs, and then for each one, list all the log lines which have that session ID in them. For example:

session: 1234
10:23:13 (some log line from whatever server) session=1234
10:23:15 (another log line, perhaps from another server) session=1234
10:24:03 (more log lines from other servers, or maybe the same ones as above) session=1234

session: 2345
10:23:45 (something) session=2345
10:23:50 (some more) session=2345
...

The point being that rather than listing log lines purely by date, we’d list them grouped by session ID (with the sessions themselves sorted by date), even though the log lines belonging to one session may be interleaved in time with those of other sessions.

You wouldn’t write the query I need like this in SQL, but logically, you could imagine it as this pseudocode:

for SESSIONS in get_log_messages(pattern="_exists_:session"):
  print("Session: $SESSIONS.session")
  for MATCH in get_log_messages(pattern="session=$SESSIONS.session"):
    print($MATCH.message)
  endfor
endfor

I can do both parts of this query in Query Language easily enough (_exists_:session and "session=1234"), but I can’t figure out whether I can do both at once!
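To make the pseudocode concrete: this is just the grouping logic sketched in Python, outside of Graylog entirely - the sample lines and the function name are made up for illustration, assuming the raw messages have been exported as plain text:

```python
import re
from collections import defaultdict

# Hypothetical sample lines in the format shown above; in practice these
# would come from a Graylog export rather than a hard-coded list.
log_lines = [
    "10:23:13 (some log line from whatever server) session=1234",
    "10:23:45 (something) session=2345",
    "10:23:15 (another log line, perhaps from another server) session=1234",
    "10:23:50 (some more) session=2345",
    "10:24:03 (more log lines from other servers) session=1234",
]

def group_by_session(lines):
    """Group raw log lines by their session= token, preserving line order."""
    groups = defaultdict(list)
    for line in lines:
        match = re.search(r"session=(\w+)", line)
        if match:
            groups[match.group(1)].append(line)
    return groups

for session_id, lines in group_by_session(log_lines).items():
    print(f"session: {session_id}")
    for line in lines:
        print(f"  {line}")
```

Each session’s lines stay in their original order, and sessions appear in the order they were first seen - which is the “group by” shape described above, just not expressed in the query language.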

Hey @coofercat

I haven’t done SQL queries with the GL web UI - mainly Pipelines/Extractors, etc… You can query using regex in the search function.

The general query format is:

field_name:/regular expression/

Example:

srcIP:/127\..+\..+\..+/

Here is another example searching for ERROR

Not sure if that will help.

Thanks - sorry, SQL was a bit of a ‘side subject’ - I don’t know how to do it in query language either (hence the question).

Taking your search though, you are nicely filtering for things that are errors - can you now also search for things that are warnings, such that all the errors are in one section and all the warnings are in another? I.e.:

Error
log line 1
log line 4
log line 8

Warning
log line 2
log line 3
log line 5

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.