Terrible search performance



Currently I am using a single-node setup (everything is on that one node) and have no significant performance issues. The old production server has 8 vCPUs, 64GB of vRAM, and 8 SAS disks. One primary shard per index. Graylog 2.4 and Elasticsearch 2.x.

Now I have prepared a cluster of Graylog 3 (two nodes) and Elasticsearch 6.7 (three nodes, two of them data/master and one master-only).
I have also reindexed from remote, from the old production to this new production.
Everything is running on CentOS 7.6. Everything except one Elasticsearch data node is on VMware virtualization.
The Graylog node I'm testing with has 16 vCPUs and ~120GB of vRAM (the other node, which is not actively used, has 8 vCPUs and 64GB of vRAM).
Elasticsearch nodes:
- es01 (VMware): 32 vCPUs, 64GB vRAM, volume on a SAN with more than 100 SAS disks
- es02 (physical): 64 threads, 64GB RAM, 8 SAS disks
- es03 (VMware, no data): 8 vCPUs, 64GB vRAM

Both openjdk-11 and openjdk-1.8.0 are installed, but JAVA_HOME is set to openjdk-11 for Elasticsearch (JAVA_HOME=/usr/lib/jvm/java-11-openjdk-
lsof -p 48818 | grep open
java 48818 elasticsearch txt REG 253,0 11480 117456399 /usr/lib/jvm/java-11-openjdk-
java 48818 elasticsearch mem REG 253,0 18100224 8536631 /usr/lib/jvm/java-11-openjdk-

It looks like a problem with Elasticsearch, but maybe someone can give some ideas.
The new Elasticsearch environment has ~600 indices with 3 primary + 1 replica shards each, so in total ~600*6=3600 shards and ~4.6TB (primary+replica). ~4M-10M messages in each index (~4GB-10GB per index).
The node on the physical server (es02) is more or less performant, though still worse than the old production server. es01 performs badly on searches, with very high CPU usage, load, and very high IO reads. Also, when all nodes are running and I'm performing searches, es01's IO reads are tens of times larger (near 1GB/s) than those of es02, and its load is significantly heavier. The first thing that came to my mind was that Elasticsearch is simply using one node much more than the other for some reason, but the number of primary and replica shards is not much different between the two nodes, so I would say it is quite balanced.
I switched off es01 for a test and observed that search was then more or less performant (though still worse than on old production). Interestingly, IO reads and load on the remaining node did not change significantly while the other data node was offline, when logically it should have performed all those IOs on that single node and they should have jumped to ~1GB/s. So that is probably not a fault of the cluster as a whole. When switching vice versa (only es01 online), searches still performed very badly, just as with two nodes online. So it looks like one node is causing most of the problems (es01).
Maybe I'm also not doing those searches in the correct way, because I imported all dashboards from old production and maybe there are some incompatibilities. Some dashboards are very slow only on that one node, while others retrieve results at acceptable speed.
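To pin down whether the imbalance comes from shard placement or from raw IO, the standard cat/stats APIs can help. The following commands are a sketch, assuming the cluster is reachable on the default HTTP port 9200 (hostnames taken from this post):

```shell
# Shard and disk allocation per node - check whether es01 holds
# disproportionately many shards or bytes
curl -s 'http://es01:9200/_cat/allocation?v'

# Detailed shard placement (index, shard, prirep, size, node)
curl -s 'http://es01:9200/_cat/shards?v'

# Filesystem read/write stats per node - capture before and after a
# slow search and compare the deltas between es01 and es02
curl -s 'http://es01:9200/_nodes/stats/fs?pretty'

# What the busy node is actually doing while a slow search runs
curl -s 'http://es01:9200/_nodes/hot_threads?threads=5'
```

If es01's fs stats show far more bytes read per search than es02's for a similar shard count, the problem is below Elasticsearch (storage/virtualization layer) rather than in the cluster's balancing.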

EDIT: I've also tried describing only es02 in Graylog, but that did not give any result.
Also, only a few test clients are sending data to the Graylog inputs, with a few hundred documents per day, so the Graylog node and the Elasticsearch cluster are not loaded with ingest traffic. For test purposes I've also set "use_adaptive_replica_selection" : "true".
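For reference, that setting can be applied cluster-wide like this (a sketch; in 6.x the full key is `cluster.routing.use_adaptive_replica_selection`, and note it only influences which replica serves a search, not raw disk throughput):

```shell
curl -s -X PUT -H 'Content-Type: application/json' \
  'http://es01:9200/_cluster/settings' -d '
{
  "persistent": {
    "cluster.routing.use_adaptive_replica_selection": true
  }
}'
```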
Elasticsearch is running with the following parameters:
elastic+ 9309 31.4 62.7 2297341424 41300208 ? SLsl May14 319:04 /usr/lib/jvm/java-11-openjdk- -Xms30g -Xmx30g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Djava.io.tmpdir=/tmp/elasticsearch-13924678276891718870 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/elasticsearch -XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log -Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m -Djava.locale.providers=COMPAT -XX:UseAVX=2 -Des.path.home=/usr/share/elasticsearch -Des.path.conf=/etc/elasticsearch -Des.distribution.flavor=oss -Des.distribution.type=rpm -cp /usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch -p /var/run/elasticsearch/elasticsearch.pid --quiet

Almost all indices have the following settings:
"graylog_555" : {
  "settings" : {
    "index" : {
      "refresh_interval" : "30s",
      "number_of_shards" : "3",
      "provided_name" : "graylog_555",
      "creation_date" : "1556006937036",
      "analysis" : {
        "analyzer" : {
          "analyzer_keyword" : {
            "filter" : "lowercase",
            "tokenizer" : "keyword"
          }
        }
      },
      "number_of_replicas" : "1",
      "uuid" : "Kgc58t7CSAaPqPtCOOPfzw",
      "version" : {
        "created" : "6070199"
      }
    }
  }
}

Can anyone suggest any troubleshooting possibilities and well-known "tuning for dummies"?

Thank you and sorry for my English!

es01 https://pastebin.com/CYUrNYNW
es02 https://pastebin.com/znDpPF9b
es03 _https://pastebin.com/vwa6HYLy because of newbie limit
gl01 _https://pastebin.com/e3G4jSRH because of newbie limit

(Jan Doberstein) #2

you want to read: https://www.elastic.co/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster


Thanks for the information!
I have read it before, but together with other sources and suggestions, so I placed my bet on an average shard setup, taking into account the provided default value for a single node, various suggestions of number of nodes × 3 (or 2), and suggestions regarding index size and number of documents. An additional reason for this setup was to have 1 shard per data node plus one for a future data node, so each query would load all servers.
I did not see clear indications in this article that my selected number of shards is incorrect; each index is 2-4GB in size. The only possibly inappropriate sizing consideration was the suggested maximum number of shards per GB of heap memory, which works out to 600 per node with the heap size I have on my nodes, so a total of 1200 shards on two nodes (I have ~3500), though I don't know whether this count should include replicas.
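A quick back-of-the-envelope check of that rule of thumb (~20 shards per GB of JVM heap, per the Elastic article), using the heap and node count from this setup:

```shell
heap_gb=30      # -Xms30g/-Xmx30g on each data node
data_nodes=2    # es01 and es02
shards_per_node=$((heap_gb * 20))               # ~20 shards per GB of heap
cluster_limit=$((shards_per_node * data_nodes))
echo "recommended max shards per node: $shards_per_node"   # 600
echo "recommended cluster maximum:     $cluster_limit"     # 1200
```

So ~3500 shards (the rule counts replicas too, since each shard copy carries its own overhead) is roughly 3x what this heap budget comfortably supports.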
I'm possibly missing some basic principles :frowning:
I've installed an additional data node and am waiting for it to rebalance. I am also considering increasing the number of vCPUs to match es02, though that is overkill and probably won't help, because the old production environment had just 8 vCPUs for all components.
Maybe someone can suggest some troubleshooting steps to find the reason for that one node being many times slower than the other?


I rebalanced onto that new node and the same weird things are going on. Load and IO reads on both virtualized data nodes are tens of times higher than on the physical one. Either I have missed some parameter, and the physical server has some magical setting different from the virtualized ones that I'm not aware of, or it is some kind of bug with virtualization.


Maybe there is something wrong with my guest OS filesystem setup? On the VMs I created the LVM volume directly on the disk block device, but on the physical machine I created the LVM volume on a partition instead, and the mount parameters also differ. Could that somehow burden the system with read overhead, or is that not possible?
On the physical machine it is like this:
/dev/mapper/centos-srv /srv xfs rw,relatime,attr2,inode64,logbsize=256k,sunit=512,swidth=3072,noquota 0 0

└─sda3 8:3 0 4.9T 0 part
├─centos-root 253:0 0 50G 0 lvm /
├─centos-swap 253:1 0 31.4G 0 lvm [SWAP]
└─centos-srv 253:2 0 4.9T 0 lvm /srv

But on the VMs it is like this:
/dev/mapper/vg_es-lv_es /srv/data xfs rw,relatime,attr2,inode64,noquota 0 0
sdb 8:16 0 4.7T 0 disk
└─vg_es-lv_es 253:2 0 4.5T 0 lvm /srv/data
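One way to rule the storage layer in or out independently of Elasticsearch would be to run the same read benchmark on each node's data volume. A sketch using fio (the directory differs per node: /srv on the physical machine, /srv/data on the VMs; sizes and runtimes are arbitrary assumptions):

```shell
# Sequential read test against the ES data volume; fio creates a
# temporary test file, so ~4G of free space is needed on the volume
fio --name=seqread --directory=/srv/data --rw=read --bs=256k \
    --size=4G --direct=1 --ioengine=libaio --runtime=60 \
    --time_based --group_reporting

# Random read test, closer to what search-time segment reads look like
fio --name=randread --directory=/srv/data --rw=randread --bs=8k \
    --size=4G --direct=1 --ioengine=libaio --runtime=60 \
    --time_based --group_reporting
```

If the VMs show similar throughput to the physical node here but still read ~1GB/s during searches, the extra reads are coming from Elasticsearch's access pattern (e.g. page cache misses) rather than from a slow disk.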

(Jan Doberstein) #6

If you have two nodes with 64GB RAM, you can assign 32GB to the JVM heap for Elasticsearch - that can hold ~600 shards (including replicas) per node, which makes a total of ~1200 shards your system can hold without doing crazy things.

You have ~3500 shards, which is more than what your two nodes can work with by a factor of ~2.9 - so you would need roughly 3 times the resources you currently have to handle the number of shards in your cluster. That is: add 4 more nodes, or reduce the shards.

We are not speaking about disk space or IO here - this is just the RAM and the ability to work with the metadata.


OK, I will stop fighting for a while. I have deleted all indices and will reindex everything again with 1 primary + 1 replica. That will take 9 days and hopefully will help. I will get back then and post the results.
I still don't know why the physical server was loaded tens of times less than the VMs... even while it was the only data node left online.
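If the 1-primary + 1-replica layout is applied via an index template rather than through Graylog itself, a sketch could look like this (template name and pattern are assumptions; Graylog 3 also exposes shards/replicas per index set in its UI, which is the usual place to change this so that newly rotated indices pick it up):

```shell
curl -s -X PUT -H 'Content-Type: application/json' \
  'http://es01:9200/_template/graylog-custom' -d '
{
  "index_patterns": ["graylog_*"],
  "order": 10,
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1
  }
}'
```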