Correct mapping for gl2_message_id

nisow95612 · February 18, 2022, 1:57pm

Hello Graylog community,

I have noticed that Graylog’s index templates have specific mappings only for full_message, gl2_accounted_message_size, gl2_processing_timestamp, message, source, streams and timestamp.

However, there is no specific mapping for gl2_message_id, so it gets indexed by the default rule: as a keyword with both inverted index and doc values enabled.

I found that the doc values might be used for tie-breaking, but is there any use for the inverted index?

I ask because, not counting _source, the disk-usage analysis API (Add a disk-usage analysis API · Issue #68508 · elastic/elasticsearch · GitHub) flags this gl2_message_id field as the second most disk-hungry, right after “message”.
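For readers who want to reproduce this kind of ranking, here is a minimal sketch in Python. It assumes the response of Elasticsearch's analyze index disk usage API (`POST /<index>/_disk_usage?run_expensive_tasks=true`) has already been fetched and parsed into a dict. The index name `graylog_0` and all byte counts in `sample` are made up for illustration and are not taken from this thread.

```python
def rank_fields_by_disk_usage(response, index_name, skip=("_source",)):
    """Return (field, total, inverted_index, doc_values) tuples, largest first."""
    fields = response[index_name]["fields"]
    rows = []
    for name, stats in fields.items():
        if name in skip:
            continue  # the original post ranks fields "not counting _source"
        rows.append((
            name,
            stats["total_in_bytes"],
            stats.get("inverted_index", {}).get("total_in_bytes", 0),
            stats.get("doc_values_in_bytes", 0),
        ))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Hypothetical, heavily trimmed response; the numbers are invented.
sample = {
    "graylog_0": {
        "fields": {
            "_source": {"total_in_bytes": 9000},
            "message": {
                "total_in_bytes": 4000,
                "inverted_index": {"total_in_bytes": 4000},
                "doc_values_in_bytes": 0,
            },
            "gl2_message_id": {
                "total_in_bytes": 3000,
                "inverted_index": {"total_in_bytes": 1500},
                "doc_values_in_bytes": 1500,
            },
            "timestamp": {
                "total_in_bytes": 500,
                "inverted_index": {"total_in_bytes": 0},
                "doc_values_in_bytes": 200,
            },
        }
    }
}

ranking = rank_fields_by_disk_usage(sample, "graylog_0")
for name, total, inverted, doc_values in ranking:
    print(f"{name}: total={total} inverted_index={inverted} doc_values={doc_values}")
```

Splitting the total into the inverted-index and doc_values parts is what makes it possible to see how much disabling only one of them could save.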

gsmith · February 18, 2022, 11:00pm

Hello,

This is a hard question to answer.
Are you trying to shorten the _id or to remove it? If either one is correct, I’m not sure how to go about it.
If this is incorrect, could you explain what you’re trying to achieve?

nisow95612 · February 21, 2022, 9:42am

Hi gsmith. I see you’ve been promoted to Leader in the meantime.

_id is a unique message id created by ES that allows us to fetch individual logs.
It is not a normal field, so I don’t think its mapping can be changed. Anyway, I don’t want to mess with it.

gl2_message_id is a “random string” that Graylog puts in all messages.
This is a normal message field, so its mapping can be easily changed.

With the default mapping, this field eats disk space in three ways:

Actual data, stored in a field named _source,
Inverted index of the field, which allows using this field in searches (eating 1.5% of my whole disk!),
doc_values, apparently used for tie-breaking in Graylog (eating another 1.5% of my disk).

Is this mapping correct? Does Graylog use both of these disk-hungry features,
or can I disable the inverted index and free 1.5% of my disk?
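As an illustration of what is being asked, the sketch below builds a custom index template body that keeps gl2_message_id as a keyword with doc_values but turns off the inverted index via `"index": false`. The index pattern `graylog_*` and the idea of uploading the body as a separate custom template are assumptions; how it should be ordered relative to Graylog's own template would need to be checked. Whether Graylog relies on the inverted index for this field is exactly the open question in this thread, so treat this as something to try on a test system only.

```python
import json


def build_custom_template(index_pattern="graylog_*"):
    """Build a template body that disables the inverted index for gl2_message_id.

    index_pattern is an assumption; adjust it to the index prefix in use.
    """
    return {
        "index_patterns": [index_pattern],
        "mappings": {
            "properties": {
                "gl2_message_id": {
                    "type": "keyword",
                    "index": False,      # no inverted index for this field
                    "doc_values": True,  # keep doc values (sorting, tie-breaking)
                }
            }
        },
    }


template = build_custom_template()
print(json.dumps(template, indent=2))
```

A new mapping only applies to indices created after the template is in place, so any saving would show up after the next index rotation. With the inverted index disabled, searching on gl2_message_id is expected to fail or become much slower, depending on the Elasticsearch version.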

gsmith · February 21, 2022, 10:43pm

Hello,

Yes, gl2_message_id is the identifier for each unique message. It will be set to a ULID during processing. As far as I know, it acts like _id. Using ULIDs results in shorter IDs (26 characters for a ULID vs. 36 for a UUID) and thus reduced storage usage.
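The 26 vs. 36 character difference can be checked with a small sketch. The `new_ulid` helper below is a hypothetical, minimal ULID-style generator written for illustration (48-bit millisecond timestamp plus 80 random bits, encoded in Crockford base32); it is not Graylog's implementation.

```python
import os
import time
import uuid

CROCKFORD_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_ulid(timestamp_ms=None):
    """Return a 26-character ULID-style string (illustrative only)."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    # 48 bits of timestamp followed by 80 bits of randomness = 128 bits.
    value = (timestamp_ms << 80) | int.from_bytes(os.urandom(10), "big")
    chars = []
    for _ in range(26):  # 26 base32 digits hold 130 bits, enough for 128
        chars.append(CROCKFORD_ALPHABET[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


ulid_text = new_ulid()
uuid_text = str(uuid.uuid4())
print(len(ulid_text), ulid_text)
print(len(uuid_text), uuid_text)
```

Both formats carry 128 bits; the ULID text form is shorter because base32 packs 5 bits per character, while the hyphenated hex UUID packs 4 bits per character and adds four separators.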

Alert Server Changes - Implement message ID field · Issue #5994 · Graylog2/graylog2-server · GitHub

I haven’t seen someone do that, but if you get it to work without breaking it, I would be curious to see how you went about doing it.

So on that note, in this post below I showed the _id and the gl2_message_id.

As you can see, I can search with both of them. I also use ${message.id}. So again, I’m not sure. To be honest, I would use a dev VM, try to adjust it to your needs, and see what happens. By chance, have you posted on GitHub about this? I would think that one of the staff members would be able to answer this question in more detail.

EDIT: I forgot to mention: if gl2_message_id is a concern, have you thought about creating a custom index?

nisow95612 · February 25, 2022, 10:43am

Thank you for the replies.

I would expect ULIDs to reduce storage size only in _source, and only if they compress better.
But they are unique values, so I don’t expect big savings on the inverted index.

Sadly, I can’t ask on GitHub anymore. I don’t want to register for a Microsoft account.

What do you mean by “custom index”? Just custom mappings on Graylog indices?
Sure. A lot of my non-builtin fields use custom mappings. I even changed gl2_remote_ip to type ip so I can search by source subnet, and message to match_only_text to save disk space.
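The two custom mappings mentioned here could look roughly like the fragment below, together with a subnet search that the `ip` type makes possible. This is a sketch: the `subnet_query` helper and the example CIDR range are made up, and the fragment shows only the `properties` part of a mapping.

```python
import json

# Mapping fragment for the two fields named in the post.
custom_properties = {
    "gl2_remote_ip": {"type": "ip"},
    "message": {"type": "match_only_text"},
}


def subnet_query(cidr):
    """Build a term query; ip fields accept CIDR notation as the term value."""
    return {"query": {"term": {"gl2_remote_ip": cidr}}}


print(json.dumps({"properties": custom_properties}, indent=2))
print(json.dumps(subnet_query("10.20.0.0/16")))
```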

gsmith · February 26, 2022, 1:32am

Hello,

Yes, I was referring to creating a new index template. Since Elasticsearch mapping is dynamic by default, you can turn that option off or create a static index template/mapping. Just an idea for saving disk space.
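A static mapping of the kind described here might be sketched as follows. The index pattern `graylog_*` and the field `app_name` are hypothetical, and a real template would need to list every field that should stay searchable.

```python
import json

static_template = {
    "index_patterns": ["graylog_*"],  # assumed index prefix
    "mappings": {
        # With "dynamic": false, fields not listed under "properties" are
        # still kept in _source but are not indexed, so they are not searchable.
        "dynamic": False,
        "properties": {
            "gl2_message_id": {"type": "keyword"},
            "app_name": {"type": "keyword"},  # hypothetical custom field
        },
    },
}

print(json.dumps(static_template, indent=2))
```

The disk saving comes from unlisted fields no longer getting an inverted index or doc values; the trade-off is that they can no longer be used in searches or aggregations.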

system · March 12, 2022, 1:33am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
