Drop messages based on the log's size?


#1

I'm trying to keep log messages over the 32766-byte limit from reaching Graylog.

Is there a way to do this with Graylog as it is?

Any suggestions on where I could start?


(Jan Doberstein) #2

If you want to drop the messages, you can check the size of the message in the processing pipelines and drop it if it is oversized.

The better solution (depending on your use case) would be to add a custom Elasticsearch template with the ignore_above option mentioned in this issue ( https://github.com/Graylog2/graylog2-server/issues/873 ). That makes Elasticsearch skip indexing the oversized field value, and you do not lose the message.
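
For illustration, here is a rough sketch of what such a template body could look like, built in Python so it can be printed as JSON. It is untested against a live cluster and rests on several assumptions: Elasticsearch 5.x mapping syntax, Graylog's default graylog_* index pattern, and a hypothetical keyword field named payload. Because ignore_above counts characters while the limit is in bytes, the sketch uses 32766 // 4 so that even 4-byte UTF-8 characters stay under it.

```python
import json

# Assumptions (not from the thread): Elasticsearch 5.x mapping syntax,
# Graylog's default "graylog_*" index pattern, and a hypothetical
# keyword field called "payload".
FIELD_NAME = "payload"

# ignore_above counts characters, while the 32766 limit is in bytes.
# 32766 // 4 = 8191 stays under the limit even if every character
# needs 4 bytes in UTF-8.
IGNORE_ABOVE_CHARS = 32766 // 4


def build_template(field: str = FIELD_NAME,
                   ignore_above: int = IGNORE_ABOVE_CHARS) -> dict:
    """Build a custom index template body that sets ignore_above on one field."""
    return {
        "template": "graylog_*",
        "mappings": {
            "message": {
                "properties": {
                    field: {"type": "keyword", "ignore_above": ignore_above}
                }
            }
        },
    }


# Serialize the template so it can be saved to a file or sent to Elasticsearch.
template_json = json.dumps(build_template(), indent=2)
print(template_json)
```

The printed JSON would then be uploaded to the Elasticsearch _template endpoint under a name of your choosing. A template only applies to newly created indices, so the active write index has to be rotated before it takes effect.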


#3

Thank you for the suggestions!

If the pipelines route does not work as I hope, I’ll try the Elasticsearch route. Thank you for that alternative.

I looked through the functions and resources available for rules and pipelines in the documentation and could not identify how to check message size – I could find everything for filtering on the contents of the messages, but I could not clearly figure out how to discover size and then block based on that value. That is my current blocker.

I reviewed all the options available here : http://docs.graylog.org/en/2.4/pages/pipelines/functions.html


(Jan Doberstein) #4

Try abbreviate, that should help you:

abbreviate(field, maxSize)

http://docs.graylog.org/en/2.4/pages/pipelines/functions.html#abbreviate

rule "shorten messages over 32766 characters"
when
    has_field("message")
then
    // abbreviate() limits the length in characters, not bytes
    set_field("message", abbreviate(to_string($message.message), 32766));
end


Or use regex to check the length of the string and then drop the message:

rule "drop messages over 16383 characters"
when
    has_field("message") AND
    // regex() takes the pattern first, then the value;
    // (?s) lets "." match line breaks in multi-line messages
    regex("(?s)^.{16383,}$", to_string($message.message)).matches == true
then
    drop_message();
    debug( concat("dropped oversized message from ", to_string($message.source)));
end

http://docs.graylog.org/en/2.4/pages/pipelines/functions.html#regex

I did not test either of the rules above, I just wrote them down.
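
The two rules count characters, while the 32766 limit is in bytes, which is presumably why the second rule uses 16383 (half of 32766, i.e. 2 bytes per character). A small Python sketch, separate from the pipeline rules and with hypothetical helper names, shows how the two measures differ for UTF-8 text:

```python
# The byte limit discussed in this thread.
MAX_TERM_BYTES = 32766


def utf8_size(text: str) -> int:
    """Return the size of text in bytes once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def is_oversized(text: str, limit: int = MAX_TERM_BYTES) -> bool:
    """True if the UTF-8 encoding of text exceeds the byte limit."""
    return utf8_size(text) > limit


def safe_char_limit(limit: int = MAX_TERM_BYTES, bytes_per_char: int = 4) -> int:
    """Character count that always fits into limit bytes.

    UTF-8 needs at most 4 bytes per character, hence the default.
    """
    return limit // bytes_per_char


ascii_msg = "a" * 32766    # 1 byte per character
umlaut_msg = "ä" * 16384   # 2 bytes per character

print(len(ascii_msg), utf8_size(ascii_msg), is_oversized(ascii_msg))
print(len(umlaut_msg), utf8_size(umlaut_msg), is_oversized(umlaut_msg))
print(safe_char_limit(), safe_char_limit(bytes_per_char=2))
```

So a character-based cutoff is only safe if it assumes the worst case for the text you expect: 16383 characters covers 2-byte characters, while 8191 covers any UTF-8 input.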


#5

AH! I understand the logic you’re applying here, that makes much more sense now.

Thank you so much for presenting these solution approaches, I will go and experiment.

This gets me on the path and I’m sure I’ll better understand pipeline rules now.


#6

Based on my experiments, I think your original suggestion to use an Elasticsearch template with ignore_above will be the better fit. Preventing indexer errors from massive log messages was the original use case, but the pipelines will help us prevent other problems.

I love what we can do with the pipelines, thank you so much for helping with this.

