Terrible search performance


Currently I am using a single-node setup (everything is on that one node) and have no significant performance issues. The old production server has 8 vCPUs, 64GB of vRAM and 8 SAS disks, with one primary shard per index, running Graylog 2.4 and Elasticsearch 2.x.

Now I have prepared a cluster of Graylog 3 (two nodes) and Elasticsearch 6.7 (three nodes, two of them data/master and one master-only).
I've also reindexed from remote from the old production to this new production.
Everything is running on CentOS 7.6, and everything except one Elasticsearch data node is on VMware virtualization.
The Graylog node I'm testing with has 16 vCPUs and ~120GB of vRAM (the other node, which is not actively used, has 8 vCPUs and 64GB of vRAM).
Elasticsearch nodes:
es01 (VMware)
32 vCPUs
64GB vRAM
volume on a SAN with more than 100 SAS disks
es02 (physical)
64 threads
64GB RAM
8 SAS disks
es03 (VMware, no data)
8 vCPUs
64GB vRAM

Both openjdk-11 and openjdk-1.8.0 are installed, but JAVA_HOME is set to openjdk-11 for Elasticsearch (JAVA_HOME=/usr/lib/jvm/java-11-openjdk-
lsof -p 48818 | grep open
java 48818 elasticsearch txt REG 253,0 11480 117456399 /usr/lib/jvm/java-11-openjdk-
java 48818 elasticsearch mem REG 253,0 18100224 8536631 /usr/lib/jvm/java-11-openjdk-

It looks like a problem with Elasticsearch, but maybe someone can give some ideas.
The new Elasticsearch environment has ~600 indices with 3 primary + 1 replica shards each, so in total ~600*6=3600 shards and ~4.6TB (primary+replica) overall, with ~4M-10M messages in each index (~4GB-10GB per index).
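As a sanity check on those numbers, a quick calculation of the total shard count and the resulting average shard size (figures taken from the post; the rough tens-of-GB-per-shard guideline for logging workloads comes from Elastic's sizing blog linked later in this thread):

```python
# Shard math from the post: ~600 indices, 3 primaries + 1 replica each,
# ~4.6 TB total (primary + replica).
indices = 600
shards_per_index = 3 * (1 + 1)          # primaries * (1 + replicas)
total_shards = indices * shards_per_index
avg_shard_gb = 4.6 * 1024 / total_shards

print(total_shards)            # 3600
print(round(avg_shard_gb, 2))  # ~1.31 GB per shard -- far below the usual
                               # tens-of-GB target, i.e. many small shards
```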
The node on the physical server (es02) is more or less performant, though still worse than the old production server. es01 performs badly on searches, with very high CPU usage, load and very high IO reads. Also, when all nodes are running and I'm performing searches, es01's IO reads are tens of times larger (near 1GB/s) than es02's, and its load is significantly heavier. The first thing that came to my mind was that Elasticsearch is simply using one node much more than the other for some reason, but the numbers of primary and replica shards are not much different on the two nodes, so I could say it is quite balanced.
I switched off es01 for a test and observed that search was then more or less performant (though still worse than on old production). What's interesting is that IO reads and load did not change significantly on the remaining node while the other data node was offline, when logically it should perform all those IOs on that single node and they should also jump to ~1GB/s, so that's probably not a fault of the cluster as a whole. When switching vice versa (only es01 online), searches still performed very badly, just as with two nodes online. So it looks like one node (es01) is responsible for most of the problems.
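To double-check the balance claim, the per-node shard distribution can be tallied from `GET _cat/shards?h=node` output (and `GET _nodes/es01/hot_threads` during a slow search shows what the busy node is actually doing). A small sketch; the sample text below is hypothetical, not real cluster output:

```python
# Tally shard copies per node from the plain-text output of
# `curl -s 'localhost:9200/_cat/shards?h=node'` (one node name per shard line).
from collections import Counter

def shards_per_node(cat_shards_output: str) -> Counter:
    """Count how many shard copies each node holds."""
    return Counter(line.strip()
                   for line in cat_shards_output.splitlines()
                   if line.strip())

sample = "es01\nes02\nes01\nes02\nes01\n"   # hypothetical _cat output
print(shards_per_node(sample))              # Counter({'es01': 3, 'es02': 2})
```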
Maybe I'm also not doing those searches in the correct way, because I've imported all dashboards from old production and maybe there are some incompatibilities. Some dashboards are very slow only on that one node, while others retrieve results at acceptable speed.

EDIT: I've also tried describing only es02 in Graylog, but that did not give any result.
Also, just a few test clients are sending data to the Graylog inputs, with a few hundred documents per day, so the Graylog node and the Elasticsearch cluster are not loaded with ingest traffic. For test purposes I've also added "use_adaptive_replica_selection" : "true".
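For reference, a hedged sketch of how that setting can be applied through the cluster settings API in ES 6.x (the host and the choice of a transient rather than persistent setting are assumptions); the snippet only builds the request and leaves the actual call commented out:

```python
# Build a PUT _cluster/settings request enabling adaptive replica selection.
import json
from urllib import request

def build_settings_request(host="http://localhost:9200"):
    payload = {"transient": {
        "cluster.routing.use_adaptive_replica_selection": True}}
    return request.Request(
        f"{host}/_cluster/settings",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

req = build_settings_request()
print(req.get_method(), req.full_url)
# request.urlopen(req)  # uncomment to apply against a live cluster
```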
Elasticsearch is running with the following parameters:
elastic+ 9309 31.4 62.7 2297341424 41300208 ? SLsl May14 319:04 /usr/lib/jvm/java-11-openjdk- -Xms30g -Xmx30g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Djava.io.tmpdir=/tmp/elasticsearch-13924678276891718870 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/elasticsearch -XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log -Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m -Djava.locale.providers=COMPAT -XX:UseAVX=2 -Des.path.home=/usr/share/elasticsearch -Des.path.conf=/etc/elasticsearch -Des.distribution.flavor=oss -Des.distribution.type=rpm -cp /usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch -p /var/run/elasticsearch/elasticsearch.pid --quiet

Almost all indices have the following settings:
"graylog_555" : {
  "settings" : {
    "index" : {
      "refresh_interval" : "30s",
      "number_of_shards" : "3",
      "provided_name" : "graylog_555",
      "creation_date" : "1556006937036",
      "analysis" : {
        "analyzer" : {
          "analyzer_keyword" : {
            "filter" : "lowercase",
            "tokenizer" : "keyword"
          }
        }
      },
      "number_of_replicas" : "1",
      "uuid" : "Kgc58t7CSAaPqPtCOOPfzw",
      "version" : {
        "created" : "6070199"
      }
    }
  }
}

Can anyone suggest any troubleshooting possibilities and some well-known tuning for dummies?

Thank you, and sorry for my English!

es01 https://pastebin.com/CYUrNYNW
es01 https://pastebin.com/znDpPF9b
es03 _https://pastebin.com/vwa6HYLy because of newbie limit
gl01 _https://pastebin.com/e3G4jSRH because of newbie limit

You want to read: https://www.elastic.co/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster


Thanks for the information!
I had read it before, but together with other sources and suggestions, so I just placed my bet on an average shard setup, taking into account the provided default value for a single node, various suggestions of number of nodes x 3 (or 2), and suggestions regarding index size and number of documents. An additional reason to use such a setup was to have 1 shard per data node plus one for a future data node, so each query would load all servers.
I did not see clear indications in this article that my selected number of shards is incorrect; each index is 2-4GB in size. The only possibly inappropriate sizing consideration was the suggested maximum number of shards per GB of heap memory, which works out to 600 per node with the heap size I have on my nodes, so a total of 1200 shards for two nodes (I have ~3500), though I don't know whether this count should include replicas.
I'm possibly missing some basic principles :frowning:
I've installed an additional data node and am waiting for it to rebalance. I am also considering increasing the number of vCPUs to match es02, though that is overkill and probably won't help, because the old production environment had just 8 vCPUs for all components.
Maybe someone can suggest some troubleshooting steps to find the reason for that one node being many times slower than the other?

I rebalanced onto that new node and the same weird things are going on. Load and IO reads on both virtualized data nodes are tens of times higher than on the physical one. Either I have missed some parameter and the physical server has some magical setting the virtualized ones lack, or it is some kind of bug with virtualization.

Maybe there is something wrong with my guest OS filesystem setup? On the VMs I created the LVM volume directly on the disk block device, but on the physical server I created the LVM volume on a partition instead, and the mount parameters also differ. Could that somehow burden the system with read overhead, or is that not possible?
On the physical machine it is like this:
/dev/mapper/centos-srv /srv xfs rw,relatime,attr2,inode64,logbsize=256k,sunit=512,swidth=3072,noquota 0 0

└─sda3 8:3 0 4.9T 0 part
├─centos-root 253:0 0 50G 0 lvm /
├─centos-swap 253:1 0 31.4G 0 lvm [SWAP]
└─centos-srv 253:2 0 4.9T 0 lvm /srv

But on the VMs it is like this:
/dev/mapper/vg_es-lv_es /srv/data xfs rw,relatime,attr2,inode64,noquota 0 0
sdb 8:16 0 4.7T 0 disk
└─vg_es-lv_es 253:2 0 4.5T 0 lvm /srv/data
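One quick way to rule the mount options in or out is to diff them directly; this uses the two option strings from the listings above. Only the log buffer size and the stripe hints (sunit/swidth) differ, and those mainly influence write/allocation alignment, which makes them an unlikely cause of a tens-of-times read amplification:

```python
# Diff the xfs mount options of the physical machine vs. the VMs.
phys = "rw,relatime,attr2,inode64,logbsize=256k,sunit=512,swidth=3072,noquota"
vm   = "rw,relatime,attr2,inode64,noquota"

phys_opts, vm_opts = set(phys.split(",")), set(vm.split(","))
print(sorted(phys_opts - vm_opts))  # only on physical
print(sorted(vm_opts - phys_opts))  # only on the VMs (empty)
```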

If you have two nodes with 64GB RAM, you can assign 32GB to the JVM heap for Elasticsearch; that can hold ~600 shards (including replicas) per node, which makes a total of ~1200 shards your system can hold without doing crazy things.

You have ~3500 shards, which is more than your two nodes can work with by a factor of ~2.9, so you should add roughly three times the resources you currently have to handle that number of shards in your cluster. In other words: add 4 more nodes, or reduce the shards.

We are not talking about disk space or IO here; this is just the RAM and the ability to work with the metadata.
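The sizing rule above, roughly 20 shards per GB of heap, as arithmetic (heap size taken from the -Xms30g/-Xmx30g flags in the process listing earlier in the thread):

```python
# Shard capacity estimate: data nodes * heap GB * ~20 shards per GB of heap.
data_nodes = 2
heap_gb_per_node = 30     # -Xms30g / -Xmx30g from the ps output
shards_per_gb = 20        # Elastic's published rule of thumb
capacity = data_nodes * heap_gb_per_node * shards_per_gb

actual = 3500             # shards currently in the cluster
print(capacity)                      # 1200
print(round(actual / capacity, 1))   # 2.9 -- the overload factor quoted above
```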

OK, I will stop fighting for a while. I have deleted all indices and will reindex everything again with 1 primary + 1 replica. That will take 9 days and hopefully will help. I will get back then and post results.
I still don't know why the physical server was loaded tens of times less than the VMs, even while it was the only data node left online.

Now everything is reindexed onto the one physical node (es02) with 550 shards (all primary), and it still does not look performant. It still performs worse than that old all-in-one environment with just 8 vCPUs and 64GB RAM for everything (Graylog 2.4 and Elasticsearch 2.x). As before, the ElasticHQ metrics page does not respond at all when the number of indices is in the hundreds. Could it be that Elasticsearch 6.7 performs worse than 2.x?

EDIT: I restarted the cluster and tried to estimate the approximate number of indices at which ElasticHQ metrics become unresponsive (after a node restart Elasticsearch runs checks index by index, which takes a few minutes in total, and only makes each index available after its check). It looks like it becomes unresponsive at around 450 indices.

It looks like search behavior has changed a lot from ES 2.x to 6.7, rather than it being the configuration or virtual/physical differences I suspected all along (not that I had excluded the possibility of a version-difference issue, but I did not look much in that direction). Still, I have not tested a 2.x cluster with multiple nodes to be 100% sure that this was the case.
I've resolved an issue I was aware of before but had not paid attention to, since it caused no problems on old production, and it has helped a lot. Some clients with incorrect times in the future/past were hitting Graylog, so the indices created per day were garbaged with documents from different dates, and as a result searches were hitting 15 times more indices than needed. That was especially bad for dashboards with many widgets, and consequently for search queries that need frequent refreshes. Again, this was not a problem with the 2.x ES node. I've deleted all documents from those clients with incorrect times that were doing the most harm.
I can't be sure that I did everything correctly and that there are no other problems which might make everything work multiple times faster without cleaning up those garbaged indices, but at least it now performs close to that old setup.
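The effect described above can be illustrated with a toy overlap check: Graylog only searches indices whose recorded time range overlaps the query window, so a single future-dated document stretches an index's range and drags it into every search. Index names and dates below are made up for illustration:

```python
# Toy model of Graylog's index-range pruning for time-range queries.
from datetime import date

def indices_hit(ranges, q_start, q_end):
    """Return the indices whose [begin, end] range overlaps the query window."""
    return [name for name, begin, end in ranges
            if begin <= q_end and end >= q_start]

clean = [("graylog_1", date(2019, 5, 1), date(2019, 5, 1)),
         ("graylog_2", date(2019, 5, 2), date(2019, 5, 2))]
# graylog_1 polluted by one document dated far in the future:
dirty = [("graylog_1", date(2019, 5, 1), date(2025, 1, 1)),
         ("graylog_2", date(2019, 5, 2), date(2019, 5, 2))]

q = (date(2019, 5, 2), date(2019, 5, 2))
print(indices_hit(clean, *q))  # ['graylog_2']
print(indices_hit(dirty, *q))  # ['graylog_1', 'graylog_2']
```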

Does anyone know whether it is possible to recalculate the index range for the active write index? Because if I'm reindexing from remote and then fire up Graylog, this index does not have an index range.

Yes, you can recalculate for every index via the API or the UI.
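A hedged sketch of the API route: this assumes a `system/indices/ranges/<index>/rebuild` endpoint and the `X-Requested-By` header Graylog expects on POSTs; the host, port and exact path should be verified in your node's API browser before relying on them. The snippet only builds the request:

```python
# Build a POST request to trigger an index-range recalculation via the
# Graylog REST API (endpoint path and host are assumptions -- verify them).
from urllib import request

def rebuild_range_request(index, host="http://gl01:9000"):
    return request.Request(
        f"{host}/api/system/indices/ranges/{index}/rebuild",
        data=b"",
        headers={"X-Requested-By": "cli"},
        method="POST",
    )

req = rebuild_range_request("graylog_555")
print(req.get_method(), req.full_url)
# request.urlopen(req)  # uncomment to fire against a live Graylog node
```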

There is no recalculate button for the active write index, at least I don't see it, but I will try via the API then. Thanks!

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.