I have a few extractors on a feed that pull out bytes sent, bytes received and the message level. All of them are Grok patterns, and all convert to a number (int). The fields get added to the stream, but when I view the field statistics, sent and received show NaN, while level (a number that doesn’t change; it’s almost always 6) works just fine. I limited the query to messages where the field sent_bytes exists.
Each of the 17 results in the attachment has a value for sent and received bytes, but when I create a quick values chart, it only shows 7 messages with the field rcvd_bytes.
That’s a bit of a problem then, no? The Grok pattern is set to numeric:int, and the incoming message is level="6" bytes="4444". The extractor stores the 6 as a number, but the 4444 is stored as something else?
If other messages in the index contain the field “rcvd_bytes” or “sent_bytes” with a different data type, then Elasticsearch guesses the data type from the first message in the index that contains these fields, and that type then applies to the field for the whole index.
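A toy model may make the "first message wins" behaviour easier to see. This is not Elasticsearch code, only a Python sketch assuming a simplified dynamic mapping: the first value seen for a field fixes its type, and strings map to keyword (which, as far as I know, is what Graylog's default index template does for string fields). The function names are hypothetical.

```python
from typing import Any, Dict, Iterable


def guess_type(value: Any) -> str:
    """Simplified stand-in for Elasticsearch dynamic mapping of a JSON value."""
    if isinstance(value, bool):  # check bool first: in Python, bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    # Assumption: strings become keyword, as in Graylog's default template
    return "keyword"


def build_mapping(docs: Iterable[Dict[str, Any]]) -> Dict[str, str]:
    """The first document containing a field decides that field's type for the index."""
    mapping: Dict[str, str] = {}
    for doc in docs:
        for field, value in doc.items():
            # setdefault is a no-op once the field is mapped: later docs cannot change it
            mapping.setdefault(field, guess_type(value))
    return mapping


# Hypothetical index contents: an earlier message carried sent_bytes as a string
docs = [
    {"sent_bytes": "4444"},            # e.g. written by another input or extractor
    {"level": 6, "sent_bytes": 4444},  # the correctly converted message arrives later
]

if __name__ == "__main__":
    print(build_mapping(docs))  # {'sent_bytes': 'keyword', 'level': 'long'}
```

In this model, the `:int` conversion in the extractor still produces a number, but the index has already typed sent_bytes as keyword, which would explain why numeric statistics on it come back as NaN.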
As @jan already said, create a custom Elasticsearch index mapping for the fields you want to analyze.
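A custom mapping is usually installed as an index template through the Elasticsearch `_template` endpoint. The sketch below only builds the request; it assumes Elasticsearch at `http://localhost:9200`, Graylog indices matching `graylog_*`, and the hypothetical template name `graylog-custom-mapping`. It uses the ES 7-style legacy template (`index_patterns`, no mapping type); older versions, such as those paired with Graylog 2.x, expect `"template": "graylog_*"` and an extra `"message"` type level under `mappings` instead.

```python
import json
import urllib.request

# Assumptions: adjust host, template name and index prefix to your setup.
ES_URL = "http://localhost:9200"
TEMPLATE_NAME = "graylog-custom-mapping"


def custom_mapping_body(index_pattern: str = "graylog_*") -> dict:
    """Index template that pins the byte counters to a numeric type (ES 7-style)."""
    return {
        "index_patterns": [index_pattern],
        "mappings": {
            "properties": {
                "sent_bytes": {"type": "long"},
                "rcvd_bytes": {"type": "long"},
            }
        },
    }


def build_put_request(body: dict) -> urllib.request.Request:
    """Build (but do not send) the PUT request for the legacy _template endpoint."""
    return urllib.request.Request(
        f"{ES_URL}/_template/{TEMPLATE_NAME}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = build_put_request(custom_mapping_body())
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req)  # uncomment to actually install the template
```

A template only applies to indices created after it is installed, so the active write index has to be rotated in Graylog before the new mapping takes effect; existing indices keep whatever type they already guessed.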
The syslog feed will only ever contain bytes for rcvd/sent, so what the hell is the point of me telling it %{NUMBER:int} if Elasticsearch is going to guess the data type anyway? Why not just %{Guess}? It’s pointless to include a feature where you can specify the data type if the backend is just going to guess the stored type based on what it sees.
Is there a way to view the schema, i.e. which data type Elasticsearch thinks is being stored?
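Elasticsearch's get field mapping API (`GET /<index>/_mapping/field/<fields>`) answers exactly this, per index. The sketch below builds that URL and flattens a response into `{index: {field: type}}`; the host, the `graylog_*` pattern and the sample response are assumptions for illustration, and the parser expects the typeless ES 7-style response (ES 6 and earlier nest one more level, the mapping type, under `mappings`).

```python
from typing import Dict

ES_URL = "http://localhost:9200"  # assumption: local Elasticsearch node


def field_mapping_url(index_pattern: str, *fields: str) -> str:
    """URL for the get field mapping API: GET /<index>/_mapping/field/<fields>."""
    return f"{ES_URL}/{index_pattern}/_mapping/field/{','.join(fields)}"


def types_per_index(response: dict) -> Dict[str, Dict[str, str]]:
    """Flatten an ES 7-style get-field-mapping response to {index: {field: type}}."""
    result: Dict[str, Dict[str, str]] = {}
    for index, body in response.items():
        result[index] = {}
        for field, info in body["mappings"].items():
            # The inner "mapping" object is keyed by the last segment of the field name
            leaf = info["full_name"].split(".")[-1]
            result[index][field] = info["mapping"][leaf]["type"]
    return result


# Hypothetical response shape, NOT real data: two indices disagreeing on sent_bytes
SAMPLE_RESPONSE = {
    "graylog_0": {"mappings": {"sent_bytes": {
        "full_name": "sent_bytes", "mapping": {"sent_bytes": {"type": "keyword"}}}}},
    "graylog_1": {"mappings": {"sent_bytes": {
        "full_name": "sent_bytes", "mapping": {"sent_bytes": {"type": "long"}}}}},
}

if __name__ == "__main__":
    print(field_mapping_url("graylog_*", "sent_bytes", "rcvd_bytes"))
    print(types_per_index(SAMPLE_RESPONSE))
```

An index that reports `keyword` (or `text`) instead of `long` for sent_bytes or rcvd_bytes is one where an earlier message fixed the type before the numeric values arrived.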
It would indeed work if the index contained only messages with that one data type for the field in question.
That doesn’t seem to be the case in your Elasticsearch cluster.