Drop messages based on the log's size?

razumzhiro · June 25, 2018, 6:23pm

I'm trying to keep log messages over the 32766-byte limit from reaching Graylog.

Is there a way to do this with Graylog as it is?

Any suggestions on where I could start?

jan · June 26, 2018, 9:20am

If you want to drop the messages, you can check the size of each message in the processing pipelines and drop it if it is oversized.

The better solution (depending on your use case) would be to add a custom Elasticsearch template with the ignore_above option mentioned in this issue ( https://github.com/Graylog2/graylog2-server/issues/873 ). That makes Elasticsearch skip indexing the oversized value, and you do not lose the message.
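As a rough illustration of the template approach, the Python sketch below builds the kind of JSON body a custom index template could carry. It assumes Elasticsearch 5.x, the default graylog_* index prefix, and a keyword-mapped field; the field name my_long_field and the helper build_template are hypothetical and not taken from the linked issue. Because ignore_above counts characters while the Lucene term limit is in bytes, the sketch uses 32766 / 4 = 8191, the conservative value for UTF-8 text.

```python
import json

# Lucene refuses single terms larger than this many bytes.
LUCENE_MAX_TERM_BYTES = 32766
# A UTF-8 character occupies at most 4 bytes.
MAX_UTF8_BYTES_PER_CHAR = 4


def build_template(field_name, index_pattern="graylog_*"):
    """Return a template body (ES 5.x layout, assumed) for one keyword field.

    field_name is a placeholder; use the field that triggers indexer errors.
    """
    # ignore_above is a character count, so divide the byte limit by the
    # worst-case bytes per character.
    ignore_above = LUCENE_MAX_TERM_BYTES // MAX_UTF8_BYTES_PER_CHAR
    return {
        "template": index_pattern,
        "mappings": {
            "message": {
                "properties": {
                    field_name: {
                        "type": "keyword",
                        "ignore_above": ignore_above,
                    }
                }
            }
        },
    }


if __name__ == "__main__":
    # "my_long_field" is a hypothetical field name used only for illustration.
    print(json.dumps(build_template("my_long_field"), indent=2))
```

The printed JSON would be sent with a PUT to Elasticsearch's _template endpoint under a name of your choosing. Templates only apply to indices created afterwards, so the active write index has to be rotated before the mapping takes effect.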

razumzhiro · June 26, 2018, 9:12pm

Thank you for the suggestions!

If the pipelines route does not work as I hope, I’ll try the Elasticsearch route. Thank you for that alternative.

I looked through the functions and resources available for rules and pipelines in the documentation and could not identify how to check message size – I could find everything for filtering on the contents of the messages, but I could not clearly figure out how to discover size and then block based on that value. That is my current blocker.

I reviewed all the options available here : http://docs.graylog.org/en/2.4/pages/pipelines/functions.html

jan · June 27, 2018, 7:49am

Try abbreviate; that should help you:

abbreviate(field, maxSize)

http://docs.graylog.org/en/2.4/pages/pipelines/functions.html#abbreviate

rule "shorten messages over 32766 characters"
when
    has_field("message")
then
    // abbreviate counts characters, not bytes, and ends the shortened text with "..."
    set_field("message", abbreviate(to_string($message.message), 32766));
end

Or use regex to check the size of the string and then drop the message:

rule "drop messages over 16383 characters"
when
    // regex takes the pattern first, then the value; (?s) lets "." match newlines too
    has_field("message") AND
    regex("(?s)^.{16383,}$", to_string($message.message)).matches == true
then
    drop_message();
    debug(concat("dropped oversized message from ", to_string($message.source)));
end

http://docs.graylog.org/en/2.4/pages/pipelines/functions.html#regex

I did not test either of the rules above; I just wrote them down.
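Both rules count characters, while the 32766 limit is in bytes, so the two only line up for plain ASCII. For anyone who wants to sanity-check sample messages outside Graylog, here is a minimal Python sketch of the difference. The helper names utf8_len and truncate_to_bytes are made up for this example and are not Graylog functions.

```python
LIMIT_BYTES = 32766


def utf8_len(text):
    """Size of the text in bytes once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def truncate_to_bytes(text, limit=LIMIT_BYTES):
    """Cut text so its UTF-8 encoding fits within limit bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= limit:
        return text
    # errors="ignore" drops a multi-byte character split by the cut.
    return encoded[:limit].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    samples = {
        "ascii": "a" * 40000,        # 1 byte per character
        "two-byte": "\u00e9" * 20000,  # 2 bytes per character
        "three-byte": "\u20ac" * 20000,  # 3 bytes per character
    }
    for name, text in samples.items():
        cut = truncate_to_bytes(text)
        print(name, len(text), utf8_len(text), len(cut), utf8_len(cut))
```

A message of 16383 two-byte characters is exactly 32766 bytes, which is presumably where the 16383 in the second rule comes from. Text with three- or four-byte characters can still exceed the limit at that length.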

razumzhiro · June 27, 2018, 4:16pm

AH! I understand the logic you’re applying here, that makes much more sense now.

Thank you so much for presenting these solution approaches, I will go and experiment.

This gets me on the path and I’m sure I’ll better understand pipeline rules now.

razumzhiro · June 27, 2018, 6:27pm

Based on my experiments, I think your original proposal to use an Elasticsearch template with ignore_above will be better. Preventing indexer errors from massive log messages is the original use case, but the functionality of the pipelines will help us prevent other problems.

I love what we can do with the pipelines, thank you so much for helping with this.

system · July 11, 2018, 6:27pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
