Graylog web interface is slow after upgrade

rfinney · September 28, 2017, 5:43pm

Hi,

We have a 3 node graylog cluster that I upgraded from 2.2.3 to 2.3.1. After the upgrade the web interface was noticeably slower, especially on the Search page and the Sources page.

Last time I tried loading the Search page it took 12 seconds, but the search result says "Found 39,207 messages in 342 ms, searched in 231 indices" for the last 5 minutes. The Sources page takes roughly 6 seconds to load, and the two graphs show the spinning icon until it loads. Both of these pages loaded almost instantaneously before the upgrade.

I thought it might be due to the load balancer haproxy that points to nginx. I took both out of the equation and the speeds remained the same.

I then upgraded elasticsearch from 2 to 5. Still the same result.

Our setup is as follows:

3 graylog VMs running Graylog and Mongodb
20 CPUs
14GB of ram
CentOS 7
Graylog 2.3.1+9f2c6ef
Linux 3.10.0-693.2.2.el7.x86_64
Oracle Corporation 1.8.0_144
openjdk version "1.8.0_144"
OpenJDK Runtime Environment (build 1.8.0_144-b01)
OpenJDK 64-Bit Server VM (build 25.144-b01, mixed mode)

3 elasticsearch VM nodes
20 CPUs
25 GB of ram
12 GB java heap
CentOS 7
Linux 3.10.0-693.2.2.el7.x86_64
elasticsearch
"number" : "5.6.2",
"build_hash" : "57e20f3",
"build_date" : "2017-09-23T13:16:45.703Z",
"build_snapshot" : false,
"lucene_version" : "6.6.1"

The Indices/Index are set to
Shards: 4
Replicas: 2
Each elasticsearch node has roughly 250GB of data.

The hardware behind this setup is brand new and not being taxed at all. The SAN iops are hardly being touched. This setup is only receiving roughly 100 messages/sec.

Here is a link to our config for the master graylog node. The other 2 are identical except where node1 needs to be node2, etc… And only node1 is master.

https://pastebin.com/wGpbbkjU

Any ideas on what to do?

Thanks,
Ryan

rfinney · September 28, 2017, 7:23pm

One thing that I’ve noticed is that when I load one of those pages and watch top on the command line I see mongod jump to 50% plus until the page loads. Not sure if that’s normal.

Checked the elasticsearch heap and it shows the following, which seems fine to me.

curl -sS -XGET "localhost:9200/_cat/nodes?h=heap*&v"
heap.current heap.percent heap.max
3.2gb 27 11.8gb
5.2gb 44 11.8gb
3.7gb 31 11.8gb
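
For repeated checks, the same output can be scanned automatically. This is only a minimal sketch: the function names and the 75% threshold are assumptions made for illustration, not values recommended anywhere in this thread.

```python
# Output of `_cat/nodes?h=heap*&v` as pasted above.
CAT_NODES_OUTPUT = """\
heap.current heap.percent heap.max
3.2gb 27 11.8gb
5.2gb 44 11.8gb
3.7gb 31 11.8gb
"""


def parse_cat_nodes(text):
    """Turn the whitespace-separated table into one dict per node."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def nodes_over(text, percent_threshold=75):
    """Return the rows whose heap.percent is at or above the (assumed) threshold."""
    return [
        row for row in parse_cat_nodes(text)
        if int(row["heap.percent"]) >= percent_threshold
    ]


print(nodes_over(CAT_NODES_OUTPUT))
```

With the numbers above no node reaches the threshold, which matches the impression that heap usage is not the problem here.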

Noticed oom-killer logs seen here.

I noticed these in my logs with 1.8.0_144 and had never seen them before. I rolled back to 141 and will check if it helped tomorrow.

Edit: It did help with the oom-messages.

Didn’t help page load speed.

rfinney · September 30, 2017, 12:10am

Completely cloned our environment to Dev and isolated the environment to its own closed network.

The first thing I tried was setting all the indexes to keep only one and rotating them, to clear out almost all data except what the closed environment is still feeding in. Didn’t see much difference in performance.

Rolled back the changes and tried downgrading mongodb. Didn’t notice any substantial difference. I didn’t notice mongod spiking on page load but that might be because the dev system is relatively idle.

Disabled tls, nginx, haproxy and loaded right from the graylog http page. No difference.

Tried reinstalling the Graylog rpm, no change.

I’m not seeing anything in the logs, the buffers are always empty, not sure what else to look for.

It seems like the graph processing on the search and sources page may be the culprit, but I could be off base.

Hoping @Jan, @jochen, or one of the other awesome people here have some insight on what to do next.

Edit: I believe it’s something to do with the calls being made on those pages, possibly something to do with the 4096 regression in the prior 2.3.0 release and how it was fixed. I tried rolling back to that release but it had the 4096 errors. I noticed on the current release that when I search in a stream everything loads quickly, including the graphs.

github.com/Graylog2/graylog2-server

Search query fails with large number of indices

opened 08:34AM - 03 Aug 17 UTC

closed 02:01PM - 11 Aug 17 UTC

joschi

elasticsearch bug blocker

Graylog (or rather Jest) is sending HTTP requests with a large initial line (URI… path and query string) to the Elasticsearch HTTP API if a large number of indices is included in the search query (e. g. when searching in "All messages").

Related topic: https://community.graylog.org/t/unable-to-search-in-all-messages/1922

## Expected Behavior

Search queries covering a lot of indices should work.

## Current Behavior

Search queries covering a lot of indices fail with an internal server error (HTTP 500) and produce an error message in the Elasticsearch logs:

```
[WARN ][http.netty ] [ElasticsearchNodeName] Caught exception while handling client http traffic, closing connection [id: 0xecc07e39, /10.1.2.3:54321 => /10.1.2.3:9200]
org.jboss.netty.handler.codec.frame.TooLongFrameException: An HTTP line is larger than 4096 bytes.
```

## Possible Solution

Patch Jest to send index names in the POST body.

## Steps to Reproduce (for bugs)

1. Create lots of indices (so that the list of index names is longer than 4 KB)
2. Run search query covering all indices
3. ???
4. Profit!

## Your Environment

* Graylog Version: 2.3.0
* Elasticsearch Version: 2.x, 5.x
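
To get a feel for how quickly the 4096 byte limit from that log message is reached, here is a rough sketch. It assumes the index names are comma-joined into the request path (the usual Elasticsearch multi-index syntax) and that the request is a GET to `_search`; the exact request Graylog builds is not shown in this thread, and the index names below are made up.

```python
from urllib.parse import quote

# Limit quoted in the Elasticsearch log message.
MAX_INITIAL_LINE_BYTES = 4096


def request_line_length(index_names, endpoint="_search", method="GET"):
    """Byte length of an HTTP request line that lists every index in the path."""
    path = "/" + quote(",".join(index_names), safe=",") + "/" + endpoint
    return len(f"{method} {path} HTTP/1.1".encode("ascii"))


# Hypothetical index names: one index set with 23 indices,
# and 50 index sets with 5 indices each.
few = [f"2n-devices_{i}" for i in range(23)]
many = [f"stream-{s:02d}-devices_{i}" for s in range(50) for i in range(5)]

for label, names in (("few", few), ("many", many)):
    length = request_line_length(names)
    print(label, len(names), length, length > MAX_INITIAL_LINE_BYTES)
```

Under these assumptions a couple of dozen indices stay far below the limit, while a few hundred names of ordinary length exceed it. That would fit searches inside a single stream behaving differently from searches across everything.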

jan · October 2, 2017, 6:50am

Hej Folks,

I checked my lab, where the following OpenJDK versions are installed, and I did not see any errors. But I did notice that the interface feels slower from time to time. I will check whether this depends on which host in the cluster the LB connects me to, and whether that feeling differs between the 3 servers.

Thanks for bringing this up; we will investigate.

Ubuntu 16.04 LTS

java -version
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)

Debian 8

openjdk version "1.8.0_111"
OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-2~bpo8+1-b14)
OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)

CentOS 7

openjdk version "1.8.0_144"
OpenJDK Runtime Environment (build 1.8.0_144-b01)
OpenJDK 64-Bit Server VM (build 25.144-b01, mixed mode)

rfinney · October 2, 2017, 3:51pm

Thanks Jan! If it is a code issue and you need someone to test, I have a whole dev environment now.

I tried rolling back the Java version for the Graylog nodes from 141 to 131 but that didn’t help.

I may stand up a Debian node and connect it to the elastic cluster and see if I get a performance difference.

rfinney · October 12, 2017, 8:14pm

Haven’t had a chance to set up a Debian node to test. Still experiencing slowness.

rfinney · October 19, 2017, 9:46pm

@jan Upgraded to the latest version today, still no change. I did notice that the Search page and the Sources page take 15 seconds to load, but in a stream I can run a search for the past 30 days and get the following with a 3 second page load.

Found 107,573,655 messages in 591 ms, searched in 23 indices.
Results retrieved at 2017-10-19 17:30:11.

I’m wondering if it has something to do with my streams. We have a little under 50 streams each with its own index.

Edit: Going to try changing my indices that had been set to 30 days to 1 day and see if that makes any difference.

rfinney · October 20, 2017, 3:05pm

Changing the indices may have helped take off a second at most.

Installed the Graylog MongoDB plugin and checked the query times. Most returned 0ms. Slowest returned .036ms.

rfinney · October 20, 2017, 5:21pm

I am seeing

2017-10-20T12:55:16.462-04:00 ERROR [UsageStatsClusterPeriodical] Uncaught exception in periodical
org.graylog2.indexer.ElasticsearchException: Fetching message count failed for indices [2n-devices_1, 2n-devices_8, 2n-devices_1
…
An HTTP line is larger than 4096 bytes.

I tried setting http.max_initial_line_length: 64k in /etc/sysconfig/elasticsearch, but that doesn’t appear to have worked; it might not be the right place to set it. I then tried setting it in /etc/elasticsearch/elasticsearch.yml, but then elasticsearch doesn’t want to start.

Does anyone know how to set this on a CentOS/RHEL system? Running elasticsearch 5.6.

jochen · October 20, 2017, 9:15pm

This will be fixed in Graylog 2.4.0:

rfinney · October 20, 2017, 11:14pm

Thanks Jochen. Do you know if this might explain the weirdness I’m seeing with slowness mentioned above?

It’s weird because searching in a stream is as quick as expected, but the search tab takes 15 seconds to load. I’ve been wondering if this is related to the thread you mentioned, which I’ve been following.

jochen · October 21, 2017, 7:54am

Maybe you could check the Network tab of the Developer Console of your web browser to find out which request takes longest.
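
If the Network tab is exported as a HAR file, the slowest requests can also be ranked offline. This is a small sketch that relies only on the standard HAR fields `log.entries[].time` (total time in ms) and `request.url`; the helper names and the sample URLs and timings are invented for illustration.

```python
import json


def load_har(path):
    """Read a HAR export from disk (utf-8-sig tolerates an optional BOM)."""
    with open(path, encoding="utf-8-sig") as handle:
        return json.load(handle)


def slowest_requests(har, top=5):
    """Return (total_ms, url) pairs for the slowest entries, slowest first."""
    timed = [
        (entry["time"], entry["request"]["url"])
        for entry in har["log"]["entries"]
    ]
    return sorted(timed, key=lambda pair: pair[0], reverse=True)[:top]


# Made-up HAR fragment; the URLs and timings are illustrative only.
sample_har = {
    "log": {
        "entries": [
            {"time": 4012.5, "request": {"url": "http://graylog.example/hypothetical/search"}},
            {"time": 11980.0, "request": {"url": "http://graylog.example/hypothetical/histogram"}},
            {"time": 85.3, "request": {"url": "http://graylog.example/hypothetical/session"}},
        ]
    }
}

for total_ms, url in slowest_requests(sample_har, top=2):
    print(f"{total_ms:8.1f} ms  {url}")
```

For a real capture, pass the result of `load_har("search-page.har")` (a hypothetical file name) instead of `sample_har`.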

rfinney · October 25, 2017, 1:36pm

I had used the dev tools from Chrome and Firefox a while back but didn’t see anything standing out. The one thing that appeared to be the slowest was the bar chart. If I loaded the page I would see the search results show up in 4 seconds and the bar chart might take 12 seconds to finish.

This seems to be resolved now. I ended up basing our index rotation on size and greatly reduced the total number of indices we had. This was something I had been meaning to do regardless of this issue, to avoid something like a DoS filling up our disks.

Reducing the total number of indices seems to have made the biggest impact and things seem to be working at regular speeds now. It is strange that we didn’t have this issue before the upgrade which makes me believe it has to do with how the calls are being made over http to elasticsearch.
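
To see which index sets contribute most to the total, the index names can be grouped by prefix. This is a minimal sketch that assumes names of the form `<prefix>_<number>`, like the `2n-devices_1` entries in the error above; the sample names and the function name are hypothetical.

```python
from collections import Counter


def indices_per_prefix(index_names):
    """Count indices per index set, assuming names look like '<prefix>_<number>'."""
    return Counter(name.rsplit("_", 1)[0] for name in index_names)


# Hypothetical sample; a real list could come from `_cat/indices?h=index`.
sample_names = [
    "2n-devices_1",
    "2n-devices_8",
    "2n-devices_12",
    "graylog_0",
    "graylog_1",
    "firewall_logs_3",
]

counts = indices_per_prefix(sample_names)
for prefix, count in counts.most_common():
    print(f"{prefix}: {count}")
print("total:", sum(counts.values()))
```

Splitting on the last underscore keeps prefixes that themselves contain underscores, such as `firewall_logs`, intact.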

system · November 8, 2017, 1:36pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
