Extractor key=value field type hints

I’ve recently set up an extractor for haproxy and nginx logs using the k=v extractor. This works fine, but many of the fields are not really string data (IPv4/IPv6 addresses, HTTP status codes, durations, timestamps, etc.), and the extractor only creates them as string types.

How can I keep the simplicity of the k=v extractor and also provide type hints to it, so that we can later query on, for example, duration:>100 or http_status_code:>420, or do GeoIP lookups?

Hi @skunkwerks

It does not matter how Graylog extracts the data, because the field type is set when the data is first written to Elasticsearch.

Inside Graylog the data is typeless, which is why you need to define the type when you work with the data in a processing pipeline.

Elasticsearch guesses the type of a field when the field is first created, that is, on first ingest. To always force a specific type for a field, you can create a custom Elasticsearch mapping ( https://docs.graylog.org/en/3.3/pages/configuration/elasticsearch.html#custom-index-mappings ), or create processing pipeline rules that force specific content into the fields that are important to you.
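To illustrate why the first ingest matters, here is a minimal Python sketch (not Graylog or Elasticsearch code) of what the indexed JSON document looks like before and after a type conversion. The `parse_kv` and `coerce_long` helpers are hypothetical stand-ins for the k=v extractor and a pipeline rule. With default dynamic mapping, Elasticsearch treats a quoted JSON value as text and a bare integer as long.

```python
import json

def parse_kv(line):
    """Hypothetical stand-in for a k=v extractor: every value stays a string."""
    return dict(pair.split("=", 1) for pair in line.split())

def coerce_long(fields, names):
    """Convert the named fields to int, roughly what a pipeline rule would do."""
    out = dict(fields)
    for name in names:
        if name in out:
            out[name] = int(out[name])
    return out

raw = parse_kv("http_status_code=503 duration=120")
typed = coerce_long(raw, ["http_status_code", "duration"])

print(json.dumps(raw))    # values are quoted, i.e. JSON strings
print(json.dumps(typed))  # values are bare, i.e. JSON numbers
```

Whichever form reaches a new index first decides the guessed type, which is why an explicit mapping is the more predictable option.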

Thanks for the clarification! Can you elaborate on that a bit, please:

  1. After deleting and updating this index mapping in ES, does old data get re-indexed, or does it only apply to new data?

  2. If I address this in pipeline rules and convert the datatype, presumably this also needs to match what ES is expecting?

Hi @skunkwerks

You need something like this for custom-index-mappings:

{
  "template": "graylog_*",
  "mappings" : {
    "message" : {
      "properties" : {
        "level" : {
          "type" : "keyword"
        },
        "duration" : {
          "type" : "long"
        },
        "http_status_code" : {
          "type" : "long"
        }
      }
    }
  }
}
  1. This only applies to new indices, so old data is not re-indexed. You will need to rotate the active index in Graylog for the mapping to take effect.
  2. Yes, the pipeline rules need to match what ES expects.
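If many fields need type hints, a template body of the shape shown above can be generated from a simple field-to-type map rather than written by hand. The following is a rough sketch under that assumption. The function name `build_template` and the `hints` dictionary are hypothetical, and the script only produces JSON; uploading it to Elasticsearch is done as described in the linked Graylog documentation.

```python
import json

def build_template(type_hints, index_pattern="graylog_*"):
    """Build a custom index template body from a field -> Elasticsearch type map."""
    properties = {field: {"type": es_type} for field, es_type in type_hints.items()}
    return {
        "template": index_pattern,
        "mappings": {"message": {"properties": properties}},
    }

# Same three fields as in the template above.
hints = {"level": "keyword", "duration": "long", "http_status_code": "long"}
body = build_template(hints)
print(json.dumps(body, indent=2))
```

Elasticsearch also has an `ip` field type, so address fields could be hinted the same way.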