GROK extractor field data types and how they relate to ElasticSearch field data types

TJgrayD · June 11, 2018, 6:13pm

I submitted this question a few weeks ago, got no response, and the thread was auto-closed (I don’t really understand why that’s useful, but it’s not my board :)
Hi, I’m seeing a discrepancy between the field data type I set in my grok extractor and the field type Elasticsearch creates in the index. I do not have any custom mappings for this index.

When I set “int” as the field data type in my grok extractor, it appears to be ignored, and the index field is created as type “long”. In another case (not the extractor below) I set it to int, and the Elasticsearch index field was created as type “keyword” (yes, I rotated the index after making the changes).

My question is: if I use one of the field types listed in this doc, how does that relate to how the index field types are configured in Elasticsearch? There seems to be a “loose” correlation, but not a strict one. The “float” data type does seem to be passed through from the grok extractor to the Elasticsearch index.

I know I can configure a custom mapping, but I’d prefer to use those as little as possible and instead put the data type in the grok pattern, so my users can create grok extractors without needing me to make changes directly in Elasticsearch.

Here is the grok extractor in question. The fields with the inconsistency in this case are “mysql_rows_sent” and “mysql_rows_examined”.

    ^# User@Host: %{NOTSPACE:mysql_user} @ %{HOSTNAME:mysql_hostname}%{GREEDYDATA}\n# Query_time: %{BASE16FLOAT:mysql_query_time;float}\s+Lock_time: %{BASE16FLOAT:mysql_lock_time;float}\s+Rows_sent: %{BASE10NUM:mysql_rows_sent;int}\s+Rows_examined:\s+%{BASE10NUM:mysql_rows_examined;int}\nSET timestamp=%{BASE10NUM:timestamp};\n%{GREEDYDATA:mysql_query;string}

    05/16 17:45[root@admin3]# curl -XGET 'localhost:9200/graylog_19/_mapping/field/mysql_*?pretty'
    {
      "graylog_19" : {
        "mappings" : {
          "message" : {
            "mysql_lock_time" : {
              "full_name" : "mysql_lock_time",
              "mapping" : {
                "mysql_lock_time" : {
                  "type" : "float"
                }
              }
            },
            "mysql_query_time" : {
              "full_name" : "mysql_query_time",
              "mapping" : {
                "mysql_query_time" : {
                  "type" : "float"
                }
              }
            },
            "mysql_rows_sent" : {
              "full_name" : "mysql_rows_sent",
              "mapping" : {
                "mysql_rows_sent" : {
                  "type" : "long"
                }
              }
            },
            "mysql_rows_examined" : {
              "full_name" : "mysql_rows_examined",
              "mapping" : {
                "mysql_rows_examined" : {
                  "type" : "long"
                }
              }
            },
            "mysql_user" : {
              "full_name" : "mysql_user",
              "mapping" : {
                "mysql_user" : {
                  "type" : "keyword"
                }
              }
            },
            "mysql_query" : {
              "full_name" : "mysql_query",
              "mapping" : {
                "mysql_query" : {
                  "type" : "keyword"
                }
              }
            },
            "mysql_hostname" : {
              "full_name" : "mysql_hostname",
              "mapping" : {
                "mysql_hostname" : {
                  "type" : "keyword"
                }
              }
            }
          }
        }
      }
    }
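To check a grok pattern’s type hints against what actually landed in the index, here is a minimal Python sketch (not Graylog code). It assumes hints use the `%{PATTERN:field;type}` form shown above, and that the expected Elasticsearch type for each hint is whatever dynamic mapping would pick for the converted value: `int`/`long` to `long`, `float`/`double` to `float`, and `string` to `keyword` under Graylog’s default index template. `type_hints`, `mapped_types`, `compare` and `EXPECTED` are hypothetical names, and the response is a trimmed rebuild of the output above.

```python
import re

# %{PATTERN:field} with an optional ";type" hint, as in the extractor above.
GROK_FIELD = re.compile(r"%\{(\w+):(\w+)(?:;(\w+))?\}")

# Assumption: the type ES dynamic mapping picks for each hint's JSON output.
EXPECTED = {"int": "long", "long": "long", "float": "float",
            "double": "float", "string": "keyword"}


def type_hints(pattern):
    """Return {field: hint} for every field that carries a ;type suffix."""
    return {m.group(2): m.group(3)
            for m in GROK_FIELD.finditer(pattern) if m.group(3)}


def mapped_types(response, index):
    """Flatten a GET <index>/_mapping/field/... response into {field: type}."""
    types = {}
    for fields in response[index]["mappings"].values():  # one dict per doc type
        for name, info in fields.items():
            leaf = next(iter(info["mapping"].values()))   # e.g. {"type": "long"}
            types[name] = leaf["type"]
    return types


def compare(pattern, response, index):
    """List (field, hint, expected ES type, actual ES type) for hinted fields."""
    actual = mapped_types(response, index)
    return [(field, hint, EXPECTED.get(hint), actual.get(field))
            for field, hint in sorted(type_hints(pattern).items())]


PATTERN = (
    r"^# User@Host: %{NOTSPACE:mysql_user} @ %{HOSTNAME:mysql_hostname}%{GREEDYDATA}\n"
    r"# Query_time: %{BASE16FLOAT:mysql_query_time;float}\s+Lock_time: %{BASE16FLOAT:mysql_lock_time;float}\s+"
    r"Rows_sent: %{BASE10NUM:mysql_rows_sent;int}\s+Rows_examined:\s+%{BASE10NUM:mysql_rows_examined;int}\n"
    r"SET timestamp=%{BASE10NUM:timestamp};\n%{GREEDYDATA:mysql_query;string}"
)

# Trimmed rebuild of the graylog_19 field-mapping response shown above.
OBSERVED = {"mysql_lock_time": "float", "mysql_query_time": "float",
            "mysql_rows_sent": "long", "mysql_rows_examined": "long",
            "mysql_user": "keyword", "mysql_query": "keyword",
            "mysql_hostname": "keyword"}
RESPONSE = {"graylog_19": {"mappings": {"message": {
    name: {"full_name": name, "mapping": {name: {"type": es_type}}}
    for name, es_type in OBSERVED.items()
}}}}

for row in compare(PATTERN, RESPONSE, "graylog_19"):
    print(row)
```

Run against the pattern and mapping above, every hinted field matches its expected type under these assumptions, which suggests the `long` mappings are Elasticsearch’s normal choice for integer values rather than the hint being dropped.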

jochen · June 12, 2018, 7:41am

It depends. If you don’t have any custom index mappings, Elasticsearch will try to guess the type of each field when that field is first created in an index.
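To illustrate that guessing, here is a minimal Python sketch (not Graylog or Elasticsearch code) of how a typed grok value could end up mapped. It assumes the `;type` suffix simply converts the captured string before the message is serialized to JSON, and that Elasticsearch’s default dynamic mapping then applies. JSON has a single number type, so whole numbers become `long` and decimals become `float`, whether the hint said `int` or `long`. Strings are reported as `keyword` on the assumption that Graylog’s default index template maps them that way, as the `mysql_user` mapping in the question suggests. `CONVERTERS`, `to_json` and `dynamic_type` are hypothetical names.

```python
import json

# Hypothetical stand-ins for the extractor's ";type" conversions.
CONVERTERS = {"int": int, "long": int, "float": float, "double": float, "string": str}


def to_json(captured, hint=None):
    """Convert a captured string per its type hint and serialize it as JSON."""
    value = CONVERTERS[hint](captured) if hint else captured
    return json.dumps(value)


def dynamic_type(json_text):
    """Rough model of the type ES dynamic mapping picks for a new field.

    Strings are reported as "keyword", assuming Graylog's default index
    template; plain Elasticsearch would map them to "text" with a keyword
    sub-field (or "date" if date detection matches).
    """
    value = json.loads(json_text)
    if isinstance(value, bool):   # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):    # no int/long distinction survives in JSON
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "keyword"
    raise ValueError("objects, arrays and null are out of scope for this sketch")


for captured, hint in [("42", "int"), ("0.000123", "float"), ("42", None)]:
    text = to_json(captured, hint)
    print(f"{captured!r} ;{hint} -> {text} -> {dynamic_type(text)}")
```

Under these assumptions, the `long` mappings in the question are what dynamic mapping would produce for `;int` values, while a `keyword` mapping points to the field having first arrived as a string.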

If your grok patterns are the only source of input, then the data types you’ve provided in your grok patterns will be used.
If other messages with different data types for certain fields are indexed into Elasticsearch first, then their data types for the respective message fields will be used.
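The “first one wins” behaviour can be sketched as a toy model in Python. It is a deliberate simplification: each index keeps one type per field, chosen by the first message that contains it; later messages never change that type; and rotating creates a fresh index with an empty mapping. Real Elasticsearch may additionally coerce or reject later values that don’t fit the existing type, which this sketch ignores. `ToyIndex`, `es_type` and the index name `graylog_20` are hypothetical.

```python
def es_type(value):
    """Type a new field would get via dynamic mapping (strings -> keyword assumed)."""
    if isinstance(value, bool):    # bool is a subclass of int, so test it first
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    return "keyword"               # assumption: Graylog's default template for strings


class ToyIndex:
    """One index whose field types are fixed by the first message to use them."""

    def __init__(self, name):
        self.name = name
        self.mapping = {}

    def index(self, message):
        for field, value in message.items():
            # setdefault keeps an existing type; only a brand-new field gets one
            self.mapping.setdefault(field, es_type(value))


# Another input sends the field as a string before the grok extractor's ;int value.
graylog_19 = ToyIndex("graylog_19")
graylog_19.index({"mysql_rows_sent": "12"})
graylog_19.index({"mysql_rows_sent": 12})

# After rotation, the new (hypothetical) graylog_20 index starts with no mapping.
graylog_20 = ToyIndex("graylog_20")
graylog_20.index({"mysql_rows_sent": 12})

print(graylog_19.mapping)
print(graylog_20.mapping)
```

In this model, rotating only helps if the first message to reach the new index carries the typed value, which is why a custom index mapping is the reliable fix.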

tl;dr: Create custom index mappings if you want to make sure that certain message fields always have a well-defined data type.
http://docs.graylog.org/en/2.4/pages/configuration/elasticsearch.html#custom-index-mappings
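For reference, a minimal Python sketch that builds the body of such a custom index template. The layout follows what I understand the Graylog 2.4 custom-mapping docs to use (an index template matching `graylog_*`, with properties under the `message` document type); the field list and the choice of `integer` are assumptions for this thread’s fields. Note that the Elasticsearch type is spelled `integer`, not `int`. `custom_mapping_template` is a hypothetical helper.

```python
import json


def custom_mapping_template(field_types, index_pattern="graylog_*"):
    """Return an index template body pinning the given fields to fixed ES types."""
    return {
        # ES 5.x-style key; ES 6+ prefers "index_patterns": [index_pattern]
        "template": index_pattern,
        "mappings": {
            # Graylog 2.x indexes messages under the "message" document type
            "message": {
                "properties": {
                    name: {"type": es_type} for name, es_type in field_types.items()
                }
            }
        },
    }


body = custom_mapping_template({
    "mysql_rows_sent": "integer",      # ES spells it "integer", not "int"
    "mysql_rows_examined": "integer",
    "mysql_query_time": "float",
    "mysql_lock_time": "float",
})
print(json.dumps(body, indent=2))
```

The printed JSON could then be saved (e.g. as `graylog-custom-mapping.json`, an example name) and uploaded with `curl -X PUT -H 'Content-Type: application/json' -d @graylog-custom-mapping.json 'localhost:9200/_template/graylog-custom-mapping?pretty'`. Templates only apply to indices created afterwards, so rotate the active write index once it is in place. Newer Elasticsearch versions expect `index_patterns` instead of `template`.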

TJgrayD · June 12, 2018, 5:18pm

Thanks Jochen,

So if I’m specifying a field type in the GROK pattern, and that field name hasn’t previously existed with another field type (or I’ve rotated my index since then), but the field type is NOT getting set in Elasticsearch, that would be a Graylog bug, correct? If so, I’ll file a bug report.

jochen · June 13, 2018, 7:56am

Please create a bug report in the Graylog2/graylog2-server issue tracker on GitHub and provide all the information necessary to reproduce the issue (such as the complete Grok pattern and the patterns it depends on, some example messages, the configuration of the Grok extractor or the pipeline rule, and the Elasticsearch index mappings and templates).

system · June 27, 2018, 7:58am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
