Hey guys,
I’m mostly rubber-ducking here, so apologies if this thread ends up being me answering my own questions…
Situation
Last week our Graylog cluster was puttering along just fine. This morning I came in to find no fewer than 38k indexer failures on the System > Overview page.
Troubleshooting
The indexer failures details tell me that all of the errors revolve around one issue:
type: mapper_parsing_exception, reason: failed to parse [winlogbeat_event_data_param2], caused_by: {type:illegal_argument_exception, reason: invalid format}
Looking at the Graylog statistics dashboard I made for my team, I see that Graylog was receiving ~48k messages per hour until Sunday 00:00. After that it jumped to ~110k per hour.
To me this suggests that my indexes rolled over on Sunday morning. A freshly created index starts without the field types the old one had built up, so Elasticsearch begins dynamic mapping anew and can settle on a different type for the same field. @jochen’s reply in the thread linked below has been very helpful in coming to that conclusion.
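One way to test the rollover theory is to ask Elasticsearch which type each index assigned to the failing field. The sketch below uses the get-field-mapping API and reuses the credentials and CA variables from the template check further down. The `graylog_*` index pattern is an assumption (Graylog's default index prefix), and the helper names are made up for illustration.

```shell
# Sketch only: 'graylog_*' assumes Graylog's default index prefix.
ES_URL="${ES_URL:-https://elastichost:9200}"
INDEX_PATTERN="${INDEX_PATTERN:-graylog_*}"

# Print the get-field-mapping URL for one field across all matching indices.
field_mapping_url() {
  printf '%s/%s/_mapping/field/%s?pretty\n' "$ES_URL" "$INDEX_PATTERN" "$1"
}

# Ask Elasticsearch which type each index assigned to the field.
# Expects GRAYLOGUSER and CACERT to be set in the environment.
show_field_mapping() {
  curl -u "$GRAYLOGUSER" --cacert "$CACERT" "$(field_mapping_url "$1")"
}

# Example: show_field_mapping winlogbeat_event_data_param2
```

If the older indexes report a string type for the field and the newest one reports something else (a date type would fit the "invalid format" error, though that is a guess), that would support the idea that the rollover changed the mapping.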
Hypothesis
Elasticsearch does not appear to have the Winlogbeat templates pre-loaded. This idea is further solidified by checking the following, which only shows the default Graylog templates:
curl -u $GRAYLOGUSER --cacert $CACERT https://elastichost:9200/_template/?pretty
I don’t know why the Winlogbeat template isn’t in there. I had assumed that Graylog would load it, because Graylog comes with Winlogbeat receivers by default. I had also assumed that setting up Graylog would make sure that Elastic knew how to handle the data from Graylog’s default supported inputs.
But you know what they say about assumptions.
I’ll be back later… First I’ll try to figure out where to get a recent Winlogbeat template and how to fudge it into Elastic.
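For reference, here is a minimal sketch of what loading a template could look like. This is not the stock Winlogbeat template. It is a small custom template that pins only the one failing field to a string type, which is a different and narrower fix. Assumptions: Elasticsearch 5.x or later (the `keyword` type; older versions use a different string type), the `graylog_*` index prefix and the `message` mapping type as Graylog defaults, and a hypothetical template name `graylog-custom-mapping`.

```shell
# Emit a minimal index template that pins the failing field to a string type.
# Assumes Elasticsearch 5.x+ ("keyword") and Graylog's default 'graylog_*' prefix.
custom_mapping_template() {
  cat <<'EOF'
{
  "template": "graylog_*",
  "mappings": {
    "message": {
      "properties": {
        "winlogbeat_event_data_param2": { "type": "keyword" }
      }
    }
  }
}
EOF
}

# Upload the template; 'graylog-custom-mapping' is a hypothetical name.
# Expects GRAYLOGUSER and CACERT to be set in the environment.
upload_custom_mapping() {
  custom_mapping_template | curl -u "$GRAYLOGUSER" --cacert "$CACERT" \
    -X PUT -H 'Content-Type: application/json' -d @- \
    'https://elastichost:9200/_template/graylog-custom-mapping?pretty'
}

# Example: upload_custom_mapping
```

Index templates are only applied when an index is created, so the existing index would keep its current mapping. The change would take effect on the next rotation of the active write index.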