ES fielddata Java exception has broken Graylog

Hi there

After many months of problem-free operation, I’ve now hit a problem that has blocked basically all use of Graylog for searching.

Now, any time I try to do a “Quick Values” sort on some data, I get the dreaded red popup at the bottom, and the ES cluster starts reporting:

```
Caused by: java.lang.IllegalArgumentException: Fielddata is disabled on text fields by default. Set fielddata=true on [application_name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.
```

(that is an example of me trying to sort some data by the “application_name” field)

I’ve now altered the “/_template/graylog-custom-mapping” template to add “fielddata”: true to that field (and to a couple of others causing the same error), confirmed with curl that the change took effect, and rotated the index. But even though it’s now been 4 hours since doing this, and the system has itself rotated to a new index, it still can’t sort on “application_name”, even over the last five-minute period. That doesn’t seem right?
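For reference, the template change looked roughly like this. Treat the body as a sketch: the host, the “graylog_*” index pattern, and the ES 5.x single “message” mapping type are assumptions based on a default setup.

```
# Sketch of the fielddata change (ES 5.x template syntax; host, index
# pattern and the "message" mapping type are assumptions).
curl -X PUT 'http://localhost:9200/_template/graylog-custom-mapping' \
  -H 'Content-Type: application/json' -d '
{
  "template": "graylog_*",
  "mappings": {
    "message": {
      "properties": {
        "application_name": {
          "type": "text",
          "fielddata": true
        }
      }
    }
  }
}'
```

Since a template only applies to indices created after it, that’s why I rotated the index manually rather than waiting.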

If I look at “/system/index_sets”, I also notice that the old indices no longer appear correct time-wise. I’m pushing syslog and GELF data into Graylog, and I’d expect the “newest” older index to show something like “Contains messages from 2 hours ago up to in 5 hours”, but instead the first 20+ indices all say “Contains messages from 2 months ago up to in 6 months”, i.e. the timestamps seem completely wrong.

Actually, I just ran a standard search over the past 5 minutes and noticed the comment “Search result Found 975,472 messages in 761 ms, searched in 655 indices.” 655 indices? Surely that should be 1, or maybe 2, indices?
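In case it’s useful to anyone, the time ranges Graylog has stored for each index can also be pulled straight from its REST API. This is a sketch: the host, port and credentials are assumptions for a default setup.

```
# List the time range Graylog has recorded for every index
# (host, port and credentials are assumptions for a default setup).
curl -u admin:password 'http://localhost:9000/api/system/indices/ranges'
```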

Any ideas what’s gone wrong? These are CentOS 7 systems running graylog-server-3.0.1-2.noarch and elasticsearch-5.6.16-1.noarch (a 4-node cluster) from the official repos.

Thanks

Jason

There must be something wrong with these indices: check out this one, created yesterday (shown on the Graylog “/system/index_sets/xxxxx” page):

```
graylog_10154    Contains messages from 49 years ago up to in 6 months
```

The inputs are syslog and GELF messages, so I’d expect it to cover a few hours from yesterday. If we had some corrupt GELF data coming in with bad timestamps, I can imagine those showing up as “0”, since that’s 49 years ago (i.e. January 1970, the Unix epoch). But the “up to in 6 months” part is plain IMPOSSIBLE. There would definitely be legitimate data with yesterday’s timestamp in there, even if only one record, so how can that index claim to not have anything newer than 6 months old?
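To double-check what ES itself thinks is in that index, I can ask it directly for the oldest and newest timestamps. A sketch, assuming ES on localhost and Graylog’s standard “timestamp” field:

```
# Ask ES for the oldest and newest timestamps actually stored in the index
# (localhost and the standard Graylog "timestamp" field are assumptions).
curl -s 'http://localhost:9200/graylog_10154/_search?pretty' \
  -H 'Content-Type: application/json' -d '
{
  "size": 0,
  "aggs": {
    "oldest": { "min": { "field": "timestamp" } },
    "newest": { "max": { "field": "timestamp" } }
  }
}'
```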

Any ideas? Thanks

Jason

My guess is that your ES resources (namely RAM) are no longer enough to hold your system’s metadata.

Maybe this post will help you: https://www.elastic.co/de/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster

OK… Well, the systems are already at 64GB RAM each with ES_JAVA_OPTS="-Xms31g -Xmx31g", so am I correct in saying my options are either to add more cluster nodes (so that the average number of shards per node is reduced) or to reduce the maximum number of indices (i.e. data) I’m willing to keep?
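For the record, this is roughly how I’m checking the shard count on the cluster (localhost is an assumption; any node will answer):

```
# Quick checks of the total shard count across the cluster
# (localhost is an assumption; any node will answer).
curl -s 'http://localhost:9200/_cluster/health?pretty'   # active_shards, active_primary_shards
curl -s 'http://localhost:9200/_cat/shards?v' | wc -l    # one line per shard, plus a header
```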

What I don’t understand is: if this were a RAM-starvation issue, why would it have anything to do with this weird index behaviour? That is, why is Graylog searching through 600 40GB indices (in my case) for records that were added in the past 5 minutes? All of that data should be in the current index, or maybe the one before it. By searching through basically every index for every query, it’s no wonder it’s struggling for resources. What I don’t get is that it didn’t behave like this before…

Thanks

Jason

Graylog will search all indices that can hold data for the time period you are searching in.

It performs a “min/max” query on each index and records the time range that each one holds. When a 5-minute search hits hundreds of indices, that indicates you have messages with wrong timestamps in them, which makes Graylog think those indices might contain data it needs to search.
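If you think the stored ranges are stale, you can have Graylog recalculate them, either via “System → Indices → Maintenance → Recalculate index ranges” or through the REST API. A sketch, with the host, port and credentials being assumptions:

```
# Trigger a recalculation of all index ranges
# (host, port and credentials are assumptions; Graylog 3.x requires
# an X-Requested-By header on state-changing API calls).
curl -u admin:password -X POST \
  -H 'X-Requested-By: cli' \
  'http://localhost:9000/api/system/indices/ranges/rebuild'
```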

Yeah, I believe that to be true, but what I don’t get is that within the Graylog “System → Indices” area, which shows each index, the details dropdown for the newest non-active index shows “Contains messages from 2 months ago up to in 6 months” instead of “Contains messages from 1 hour ago up to in 6 months”. I know for a fact that data with a current timestamp is arriving successfully (a simple search shows as much), so how can the newest indices claim they only contain old data? Even a single correct record should stop that being the case.

Not a single index claims to contain data from the past few weeks, and yet Graylog search still works; it just reads a tonne more indices than it should.

> Contains messages from 2 months ago up to in 6 months

Please read that sentence carefully again:

> Contains messages from 2 months ago up to in 6 months

It holds data from 2 months in the past and 6 months in the future…
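You can also find the offending future-dated messages directly in ES. A sketch, assuming ES on localhost and the standard Graylog “timestamp” field:

```
# Show a few of the messages whose timestamp lies in the future
# (localhost and the standard Graylog "timestamp" field are assumptions).
curl -s 'http://localhost:9200/graylog_*/_search?pretty' \
  -H 'Content-Type: application/json' -d '
{
  "size": 5,
  "sort": [ { "timestamp": "desc" } ],
  "query": { "range": { "timestamp": { "gt": "now" } } }
}'
```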

D’uh! Yeah… Thanks for that. I read that totally differently, but you are correct :slight_smile:
