Handling syslog messages that are split by the source application

Hello!

We have one specific application which sends raw log messages with JSON payloads to Graylog, and we have a JSON extractor set up on the input to parse the fields in each message. This works fine for most of the messages, but this particular application splits the log message if it exceeds 1024 characters total (including the header, timestamp, and body).

When it does this, it truncates the first part with a literal “…” and then prefixes the next part with “…” as well. It will do this right in the middle of the JSON, cutting off a key or value with reckless abandon. A simple example, not using 1024 characters but just to give an idea:

Message 1:
<LOG>Oct 31 12:50:35 {"id":"12345","name":"Dave","service":"Goog...

Message 2:
<LOG>Oct 31 12:50:35 ...le Workspace","action":"login"}

When this happens, the JSON extractor fails and we’re left with the raw messages. Is it possible to set something up in Graylog to detect this “…” nonsense and join those messages together? It would need to strip out everything before the leading “…” in the second message (or subsequent ones) and append the body to the preceding message.
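
To make the detection step concrete, here is a minimal Python sketch of it, written outside Graylog. It is not a Graylog feature or pipeline rule. The header pattern is an assumption based only on the example above (`<...>` followed by a `Mon DD HH:MM:SS` timestamp), the marker is assumed to be three ASCII dots, and `parse_fragment` is a hypothetical helper name.

```python
import re

# Assumed header layout, taken from the example above:
# "<...>Mon DD HH:MM:SS " followed by the message body.
HEADER = re.compile(r"^<[^>]*>(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (.*)$", re.DOTALL)

# Assumed continuation marker: three ASCII dots.
MARK = "..."


def parse_fragment(raw):
    """Split a raw line into timestamp and body, and flag split markers.

    Returns None when the line does not match the assumed header layout.
    """
    match = HEADER.match(raw)
    if match is None:
        return None
    timestamp, body = match.groups()

    # A leading marker means this fragment continues an earlier one;
    # a trailing marker means more fragments follow.
    continues_previous = body.startswith(MARK)
    continued_in_next = body.endswith(MARK)

    # Strip the markers so only the original payload text remains.
    if continues_previous:
        body = body[len(MARK):]
    if continued_in_next:
        body = body[:-len(MARK)]

    return {
        "timestamp": timestamp,
        "body": body,
        "continues_previous": continues_previous,
        "continued_in_next": continued_in_next,
    }
```

A payload that legitimately begins or ends with three dots would be misclassified by this check, so it is only a heuristic.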

Please note, we have no control over the strange logging behavior in this application.

Our specs:
Graylog 5.0.8+4c22532
Eclipse Adoptium 17.0.6
Linux 5.14.0-284.18.1.el9_2.x86_64

Thanks!

I haven’t come up with a solution, but here are some questions I had as I was thinking about it:

  1. Does any of your data ever contain the “…” sequence, or is it only in the case of a truncated message?
  2. Are the truncated messages always sequential, or could there be an unrelated message slipped between them?
  3. Can you change the max length of the message in the application, or failing that, can you change the delimiter to something more unique that would not appear in a message?
  4. Have you investigated SLookup, the Graylog Stream Lookup (SLookup) pipeline processor function?

I was thinking of some sort of intermediate application to handle these and ship them to Graylog, but maybe there’s a way that Graylog can coalesce two messages together. I’d be interested to hear what solution you come up with; I hope you’ll post it.

Hi faen,

Thanks for letting me know about SLookup. I’ll look into that some more. The answers to your other questions are below.

  1. The 3-dot sequence (“…”) could possibly appear in the body of a message for other reasons, but it’s not likely.

  2. The partial messages which make up one whole message do not always show up sequentially in Graylog, but they all have the exact same timestamp. And there can be more than two messages if the original was very large. When this happens, we will see one or more intermediary messages that both begin and end with “…”.

  3. We have absolutely no ability to change the max message length in the application, or otherwise alter this behavior. All we can do is turn this logging on or off, and set the remote host and port (our Graylog server in this case).
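
For reference, the reassembly implied by answer 2 could look roughly like the Python sketch below. It is not something Graylog does out of the box. It assumes the fragment bodies have already been stripped of their headers and grouped by identical timestamp, and that the marker is three ASCII dots. `reassemble` is a hypothetical name. Because arrival order is not guaranteed, it tries each ordering of the intermediary fragments and returns the first one that parses as JSON.

```python
import json
from itertools import permutations

# Assumed continuation marker: three ASCII dots.
MARK = "..."


def reassemble(bodies):
    """Join fragment bodies that share one timestamp, given in any order.

    Returns the parsed JSON object, or None if no ordering parses.
    """
    n = len(MARK)
    # Head: ends with the marker only. Tail: starts with the marker only.
    heads = [b for b in bodies if not b.startswith(MARK) and b.endswith(MARK)]
    tails = [b for b in bodies if b.startswith(MARK) and not b.endswith(MARK)]
    # Intermediary fragments carry the marker on both ends.
    middles = [
        b for b in bodies
        if len(b) >= 2 * n and b.startswith(MARK) and b.endswith(MARK)
    ]

    # Without exactly one head and one tail the group is incomplete
    # or contains fragments of more than one original message.
    if len(heads) != 1 or len(tails) != 1:
        return None

    head = heads[0][:-n]
    tail = tails[0][n:]

    # Try every ordering of the intermediary fragments; with no
    # intermediaries this loop runs once with an empty ordering.
    for order in permutations(middles):
        joined = head + "".join(m[n:-n] for m in order) + tail
        try:
            return json.loads(joined)
        except json.JSONDecodeError:
            continue
    return None
```

Two caveats: the number of orderings grows factorially with the number of intermediary fragments, and a wrong ordering can still happen to be valid JSON (for example when two fragments fall inside the same string value), so a successful parse does not prove the order is right.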

I thought about building an intermediate application to deal with this, but we set up Graylog specifically so we wouldn’t need anything like that. And it’s just one application out of the dozens we’re using with Graylog, so it’s probably not worthwhile. I’m not sure I’ll find a good solution if this isn’t something Graylog can already handle.

Thanks!

The application in question is almost certainly using the syslog appender functionality in log4j, and it looks like this is expected behavior:

Given that applications using this part of log4j will always split the message and prepend/append the “…” string, is there anything Graylog can do to reassemble the original un-split version?

Thanks!

It’s probably possible to make a pipeline do this, but it won’t be pleasant. Graylog treats every message independently, so you can’t pass data between messages as they pass through the pipelines. Slookup could work, but the message you are looking for may not be stored yet, and it will be very computationally expensive to run that many searches.

I normally solve this with dynamic lookup tables, but that is an enterprise feature (unless you qualify for the free enterprise license).

Can you provide a link to the dynamic lookup tables? I wasn’t successful in finding anything specific regarding them vs regular lookup tables.

Slookup appears to be exactly what I was looking for and I’m surprised it’s not built in to Graylog.

Do you know if it will work with v5? It seems like there’s one fork for v4.1 and that’s it.


In the docs dynamic lookup tables are just called MongoDB lookup tables.

I haven’t seen anyone running slookup on 5, so I don’t know.

In 5.1 and newer there is the search/simple API, which allows you to run a search with a single GET call. This means you can create a lookup table that points to this API and returns search results. There is no documentation on doing this, but it has been done, although I have no idea if it’s been done in production and how it behaves. Should be fairly similar to slookup.