I've recently set up extractors for HAProxy and nginx logs using the k=v extractor. This works fine, but many of the fields are not really string data (IPv4/IPv6 addresses, HTTP status codes, durations, timestamps, etc.), and the extractor creates all of them as strings.
How can I keep the simplicity of the k=v extractor while also giving it type hints, so that we can later run queries such as duration:>100 or http_status_code:>420, or do GeoIP lookups?
It does not matter how Graylog extracts the data, because the field type is set when the data is first written to Elasticsearch.
Within Graylog the data is typeless, which is why you need to declare the type when you work with the data in the processing pipeline.
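A minimal Python sketch of what such a typing step does, assuming the hypothetical field names duration, http_status_code and client_ip. In a real Graylog pipeline rule you would use functions such as to_long(), to_ip() and set_field(); the Python below only mimics the idea and is not Graylog code. It also shows why string fields break range queries: strings are compared character by character, not by numeric value.

```python
import ipaddress
from typing import Any, Dict, Optional

# Hypothetical field names as a k=v extractor might produce them; every value arrives as a string.
LONG_FIELDS = ("duration", "http_status_code")
IP_FIELDS = ("client_ip",)


def to_long(value: Any) -> Optional[int]:
    """Convert a value to an integer, or return None if it is not a whole number."""
    try:
        return int(str(value).strip())
    except ValueError:
        return None


def to_ip(value: Any) -> Optional[str]:
    """Validate an IPv4/IPv6 address and return its canonical form, or None if invalid."""
    try:
        return str(ipaddress.ip_address(str(value).strip()))
    except ValueError:
        return None


def coerce_types(message: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the message with known fields converted; unparseable values are left as-is."""
    typed = dict(message)
    for name in LONG_FIELDS:
        if name in typed:
            converted = to_long(typed[name])
            if converted is not None:
                typed[name] = converted
    for name in IP_FIELDS:
        if name in typed:
            address = to_ip(typed[name])
            if address is not None:
                # Elasticsearch's "ip" field type accepts addresses in string form.
                typed[name] = address
    return typed


if __name__ == "__main__":
    raw = {"duration": "1000", "http_status_code": "503", "client_ip": "2001:DB8::1"}
    typed = coerce_types(raw)
    print(raw["duration"] > "420")  # strings: '1' < '4', so this is False
    print(typed["duration"] > 420)  # numbers: 1000 > 420, so this is True
```

The conversion only helps if the field is not already mapped as a string in the current index; as the rest of this answer notes, the type is fixed when the field is first written.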
Elasticsearch guesses the type of a field when the field is created, that is, on first ingest. To always force a specific type for a field, you can either create a custom Elasticsearch mapping ( https://docs.graylog.org/en/3.3/pages/configuration/elasticsearch.html#custom-index-mappings ) or create processing pipeline rules that force the content of the fields that matter to you into the right type.
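For the custom mapping route, here is a minimal sketch that builds the index template body in Python and prints it as JSON. It assumes Elasticsearch 7.x (legacy templates with no mapping type; on Elasticsearch 6 the "properties" block sits under a "message" type instead), the default graylog_* index prefix, and the hypothetical field names used above. The template name graylog-custom-mapping follows the linked Graylog page; please check that page for the exact layout your version expects. A template only applies to newly created indices, so the active write index has to be rotated before the new types take effect.

```python
import json
from typing import Dict

# Hypothetical field names; replace them with the names your k=v extractor actually produces.
FIELD_TYPES: Dict[str, str] = {
    "http_status_code": "long",
    "duration": "long",
    "client_ip": "ip",
}


def build_custom_template(index_pattern: str, field_types: Dict[str, str]) -> dict:
    """Build a legacy index template body (Elasticsearch 7.x layout, no mapping type)."""
    return {
        "index_patterns": [index_pattern],
        "mappings": {
            "properties": {name: {"type": es_type} for name, es_type in field_types.items()}
        },
    }


if __name__ == "__main__":
    body = build_custom_template("graylog_*", FIELD_TYPES)
    # Send this body to Elasticsearch, e.g. with PUT /_template/graylog-custom-mapping
    print(json.dumps(body, indent=2))
```

Keeping the field-to-type table separate from the template builder makes it easy to add further typed fields later without touching the JSON structure.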