Very slow dashboards

Whew! On its face, the information you’ve presented leads me to think this is a case of oversharding. I mean, 20K shards is A LOT of shards, and there’s overhead with each one of them. Our recommendations mirror those of Elasticsearch: keep individual shards between 10 and 50GB, and keep the cluster at or below roughly 20 shards per GB of cluster heap.

I’d hazard a guess that the oversharding is consuming all of your heap, and that if I looked at your indices stats, I’d see shard operations taking days or weeks to complete rather than finishing in minutes. So basically, Elasticsearch is starved for resources.
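
If you want to sanity-check those numbers yourself, here’s a minimal sketch (not a definitive tool) that pulls total heap from `_cluster/stats` and the per-shard sizes from `_cat/shards` and compares them against the guidelines above. It assumes the Python `requests` library and an unauthenticated cluster reachable at `localhost:9200`; adjust the URL and auth for your environment.

```python
import requests

# Assumed endpoint -- change host/port/auth to match your cluster.
ES = "http://localhost:9200"

# Total heap across the cluster (bytes -> GB).
stats = requests.get(f"{ES}/_cluster/stats").json()
heap_gb = stats["nodes"]["jvm"]["mem"]["heap_max_in_bytes"] / 1024**3

# One row per shard, with its store size reported in GB.
shards = requests.get(
    f"{ES}/_cat/shards", params={"format": "json", "bytes": "gb"}
).json()

total_shards = len(shards)
# Shards above the ~50GB guideline ("store" can be null for unassigned shards).
oversized = [s for s in shards if float(s["store"] or 0) > 50]

print(f"heap: {heap_gb:.0f} GB -> supports ~{int(heap_gb) * 20} shards at 20 shards per GB")
print(f"actual shards: {total_shards}")
print(f"shards over 50 GB: {len(oversized)}")
```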

Most of what I’m going to recommend comes down to tuning your index sets. In general, it’s better to have fewer, larger indices (keep in mind that their shards should be below 50GB) than it is to have many smaller indices. So in the example index set you provided, you have your shards configured as:

Shards: 5
Replicas: 2

That results in 2 replica shards per primary shard, giving you 15 shards per index (5 primaries + 10 replicas). You’re keeping 400 indices, so 15*400=6000 shards. At 20 shards per GB of heap, that means you’d need 300GB of heap in your cluster to reliably support that number of shards. Based on the specs you provided for the cluster, you only have 150GB of heap, so you’re definitely oversubscribed in terms of shards.

Now, there’s a bit of play you have in terms of heap. Elasticsearch has long recommended not going above 32GB of heap per node due to compressed object pointers in Java. HOWEVER, there is a threshold (>48GB) where the performance hit you take by going over 32GB can be overcome. Keep in mind, though, that going that route isn’t generally recommended.
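
To make the arithmetic explicit, here’s the back-of-the-envelope math above as a tiny sketch; the inputs are just the numbers from your index set and cluster specs, so treat it as a worked example rather than a sizing tool.

```python
# Shard math for the example index set above.
primaries_per_index = 5
replicas_per_primary = 2
indices_retained = 400

shards_per_index = primaries_per_index * (1 + replicas_per_primary)  # 5 primaries + 10 replicas = 15
total_shards = shards_per_index * indices_retained                   # 15 * 400 = 6000

heap_needed_gb = total_shards / 20   # 20 shards per GB of heap -> 300 GB
heap_available_gb = 150              # from the cluster specs you provided

print(f"total shards: {total_shards}")
print(f"heap needed: {heap_needed_gb:.0f} GB vs. heap available: {heap_available_gb} GB")
```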

With all of that said, I think the messages you’re seeing in the logs are indicative of the lack of resources in Elasticsearch, so once you tune your index sets and get your shards under control, the dashboards should be more responsive.
