I need to process Nginx access logs and am trying to understand how to handle fields that can appear either as a number or as a string.
Example: upstream_response_length can be “123”, or “-” if no upstream response was used.
Goal (as I currently understand it):
If the field is a number, I must be able to use math operations (max, min, mean, etc.) on it.
If the field is a string, the record must still be searchable by its other fields.
After playing with grok patterns I came up with an extractor like this:
“(%{BASE10NUM:upstream_response_length;int}|-)”
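To make the behaviour of that alternation concrete, here is a rough Python sketch. It is only an approximation under stated assumptions: the regex `\d+` is a simplified stand-in for BASE10NUM (which also accepts signs and decimals), and `extract` is a hypothetical helper, not part of any log tool. It illustrates that when the value is “-”, the named capture does not take part in the match, so no field is produced at all.

```python
import re

# Simplified stand-in for the grok alternation above.
# Assumption: \d+ only approximates BASE10NUM (no signs, no decimals).
FIELD_RE = re.compile(r"^(?:(?P<upstream_response_length>\d+)|-)$")


def extract(raw):
    """Return the fields an extractor of this shape would produce (hypothetical helper)."""
    match = FIELD_RE.match(raw)
    if match is None:
        # Neither a number nor "-": nothing is extracted.
        return {}
    value = match.group("upstream_response_length")
    # For "-" the named group did not participate, so no field is set.
    if value is None:
        return {}
    return {"upstream_response_length": int(value)}
```

With this shape the record for “-” simply lacks the field, so it stays searchable by its other fields, and the numeric field only ever holds integers.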
A field mapping in Elasticsearch can only be one type or the other, so I would solve this with a processing pipeline. Keep one field that holds the number (the length) so you can do math on it. In the pipeline, test whether that field contains a number; if it does not, write a zero into it and write a second field that holds the string information (if needed), or a second field that serves as an indicator that the original field did not have a numeric value.
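The logic of such a pipeline step could look roughly like the Python sketch below. This is not pipeline rule syntax, only an illustration of the idea; the function name `normalize_length` and the `_raw` and `_missing` field suffixes are hypothetical names chosen for this example.

```python
def normalize_length(message, field="upstream_response_length"):
    """Return a copy of the message with a numeric field plus an indicator.

    Hypothetical helper: the "_raw" and "_missing" suffixes are assumptions.
    """
    raw = message.get(field)
    result = dict(message)  # leave the incoming message untouched
    try:
        # Numeric case: keep the value as an integer so math works on it.
        result[field] = int(raw)
        result[field + "_missing"] = False
    except (TypeError, ValueError):
        # Non-numeric case ("-", absent, etc.): write a zero,
        # keep the original string, and flag the record.
        result[field] = 0
        result[field + "_raw"] = "" if raw is None else str(raw)
        result[field + "_missing"] = True
    return result
```

One thing to keep in mind with this design: the zeros will pull min and mean down, so aggregations probably want to filter on the indicator field first.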