Warning: immense term in field in Graylog server log

I am seeing the warning below in the Graylog server log. Any idea how to avoid it? I thought this issue was fixed in 2.2.

[1]: index [graylog_9], type [message], id [829d2eb0-05b8-11e7-8dc2-5254007b267d], message [java.lang.IllegalArgumentException: Document contains at least one immense term in field="workflow_tag" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[50, 48, 49, 54, 45, 48, 57, 45, 50, 56, 32, 50, 48, 58, 49, 53, 58, 49, 50, 32, 108, 101, 118, 101, 108, 61, 73, 78, 70, 79]...', original message: bytes can be at most 32766 in length; got 479097]
2017-03-10T12:39:46.949-05:00 ERROR [Messages] Failed to index [1] messages. Please check the index error log in your web interface for the reason. Error: failure in bulk execution:
[0]: index [graylog_9], type [message], id [8d0a18e0-05b8-11e7-8dc2-5254007b267d], message [java.lang.IllegalArgumentException: Document contains at least one immense term in field="msg" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[108, 105, 115, 116, 32, 111, 102, 32, 102, 105, 108, 101, 115, 32, 102, 111, 114, 32, 116, 104, 105, 115, 32, 114, 117, 110, 32, 104, 100, 102]...', original message: bytes can be at most 32766 in length; got 1487078]
2017-03-10T12:40:00.689-05:00 ERROR [Messages] Failed to index [1] messages. Please check the index error log in your web interface for the reason. Error: failure in bulk execution:
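The error only prints the start of the offending term as a list of byte values, so it can help to decode it and see which log line is too long. A minimal Python sketch, assuming the bytes are UTF-8 as the message says; `decode_prefix` is a made-up helper name, not part of Graylog or Elasticsearch:

```python
import ast


def decode_prefix(prefix: str) -> str:
    """Turn a byte-list prefix copied from the error message into readable text."""
    byte_values = ast.literal_eval(prefix)  # safely parse "[50, 48, ...]" into a list of ints
    return bytes(byte_values).decode("utf-8", errors="replace")


# Prefix copied from the workflow_tag error above (without the trailing "...")
workflow_tag_prefix = (
    "[50, 48, 49, 54, 45, 48, 57, 45, 50, 56, 32, 50, 48, 58, 49, 53, 58, "
    "49, 50, 32, 108, 101, 118, 101, 108, 61, 73, 78, 70, 79]"
)

print(decode_prefix(workflow_tag_prefix))  # prints: 2016-09-28 20:15:12 level=INFO
```

Here the workflow_tag field appears to contain a whole log line (or many of them) rather than a short tag, which is why it ends up at 479097 bytes.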

See these topics/issues:

Is there any way to stop this on the Filebeat side, i.e. skip messages larger than 32 KB on the client itself?

I tried setting max_bytes to 32760, but it's not working.

According to this file, there is an option to limit the size:

https://github.com/elastic/beats/blob/master/filebeat/filebeat.full.yml#L199

You can use a pipeline rule that truncates the field, like I did:

rule "truncate winlogbeat_event_data_Binary"
when
has_field("winlogbeat_event_data_Binary")
then
// substring(value, start, end): read the field's value and keep its first 32760 characters
let winlogbeat_event_data_Binary = substring(to_string($message.winlogbeat_event_data_Binary), 0, 32760);
set_field("winlogbeat_event_data_Binary", winlogbeat_event_data_Binary);
end

Hello,
I have the same problem (Graylog 3.0.3). We use a pipeline rule, but we still see the error message.
Error:
{"type":"illegal_argument_exception","reason":"Document contains at least one immense term in field=\"logmessage\" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[69, 110, 116, 101, 114, 101, 100, 32, 65, 114, 103, 117, 109, 101, 110, 116, 115, 91, 91, 123, 114, 111, 108, 101, 95, 110, 97, 109, 101, 61]...', original message: bytes can be at most 32766 in length; got 33151","caused_by":{"type":"max_bytes_length_exceeded_exception","reason":"max_bytes_length_exceeded_exception: bytes can be at most 32766 in length; got 33151"}}

Our pipeline rule:
rule "logmessage limit size"
when
has_field("logmessage")
then
set_field("logmessage", abbreviate(to_string($message.logmessage), 32760));
end

Any ideas? Could it be a bug, or a problem with Unicode characters?

Edit:
The rule is connected to 2 streams where this field can occur.

You need to lower the limit. Elasticsearch's maximum of 32766 applies to the bytes of the UTF-8 encoding, not to the number of characters, and a single character can take more than one byte.
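To illustrate the difference between characters and bytes, here is a rough Python sketch (not Graylog pipeline code). It cuts a string so that its UTF-8 encoding fits the 32766-byte limit from the error message; `truncate_utf8` and `ES_MAX_TERM_BYTES` are names invented for this example:

```python
ES_MAX_TERM_BYTES = 32766  # limit quoted in the Elasticsearch error message


def truncate_utf8(text: str, max_bytes: int = ES_MAX_TERM_BYTES) -> str:
    """Cut text so its UTF-8 encoding fits in max_bytes without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Slicing bytes may cut a multi-byte character in half;
    # errors="ignore" drops that incomplete trailing sequence.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


sample = "ä" * 20000  # 20000 characters, but 2 bytes each in UTF-8
print(len(sample), len(sample.encode("utf-8")))  # characters vs. bytes

short = truncate_utf8(sample)
print(len(short), len(short.encode("utf-8")))
```

A character-based limit of 32760 only stays under the byte limit if every character is plain ASCII. With 2-byte characters the safe count is roughly half of that, and with 3-byte characters roughly a third.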

@jan: Not sure if I understand you.
We tried to limit it at 16K, no luck.
We limit 2 other fields in size too. No problems with those.

The magic length seems to be 16382 (16K - 2). We see no issues when we apply rules shortening to this length.