Hi all, Graylog was working about six months ago but now the web UI is unresponsive for long periods. Every now and then it will let me log in.
Ubuntu 18.04
Graylog 3.3.16
Initially when I ssh’ed onto it, the boot volume was full, so I freed up some space. That didn’t fix the problem so I tried rebooting the server, then updating from 3.3.6 to 3.3.16 (had to for Log4j vulnerability, which was the reason I checked the server initially).
What diagnostic tests can I do to see what is stopping the UI from working? Maybe I need more resources assigned to the process?
ubuntu@SystemsLoggingGraylog-Live:~$ tail -f /var/log/graylog-server/server.log
at org.graylog2.periodical.IndexRotationThread.doRun(IndexRotationThread.java:73) [graylog.jar:?]
at org.graylog2.plugin.periodical.Periodical.run(Periodical.java:77) [graylog.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) [?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) [?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:829) [?:?]
2021-12-17T12:34:39.699Z ERROR [Messages] Bulk indexing failed: no write index is defined for alias [graylog_deflector]. The write index may be explicitly disabled using is_write_index=false or the alias points to multiple indices without one being designated as a write index, retrying (attempt #88)
2021-12-17T12:34:44.097Z ERROR [Messages] Bulk indexing failed: no write index is defined for alias [graylog_deflector]. The write index may be explicitly disabled using is_write_index=false or the alias points to multiple indices without one being designated as a write index, retrying (attempt #88)
2021-12-17T12:34:47.364Z ERROR [IndexRotationThread] Couldn't point deflector to a new index
org.graylog2.indexer.ElasticsearchException: Couldn't remove alias graylog_deflector from indices [graylog_5]
blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];
at org.graylog2.indexer.cluster.jest.JestUtils.specificException(JestUtils.java:110) ~[graylog.jar:?]
at org.graylog2.indexer.cluster.jest.JestUtils.execute(JestUtils.java:60) ~[graylog.jar:?]
at org.graylog2.indexer.cluster.jest.JestUtils.execute(JestUtils.java:65) ~[graylog.jar:?]
at org.graylog2.indexer.indices.Indices.removeAliases(Indices.java:661) ~[graylog.jar:?]
at org.graylog2.indexer.MongoIndexSet.cleanupAliases(MongoIndexSet.java:352) ~[graylog.jar:?]
at org.graylog2.periodical.IndexRotationThread.checkAndRepair(IndexRotationThread.java:149) ~[graylog.jar:?]
at org.graylog2.periodical.IndexRotationThread.lambda$doRun$0(IndexRotationThread.java:76) ~[graylog.jar:?]
at java.lang.Iterable.forEach(Iterable.java:75) [?:?]
at org.graylog2.periodical.IndexRotationThread.doRun(IndexRotationThread.java:73) [graylog.jar:?]
at org.graylog2.plugin.periodical.Periodical.run(Periodical.java:77) [graylog.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) [?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) [?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:829) [?:?]
2021-12-17T12:34:47.366Z WARN [IndexRotationThread] Deflector is pointing to [gl-events_11], not the newest one: [gl-events_12]. Re-pointing.
2021-12-17T12:34:47.367Z ERROR [IndexRotationThread] Couldn't point deflector to a new index
org.graylog2.indexer.ElasticsearchException: Couldn't switch alias gl-events_deflector from index gl-events_11 to index gl-events_1
This error indicates a problem in Elasticsearch: it usually means you have hit the disk-space high watermark, at which point Elasticsearch marks indices read-only. Here is a link I found that talks about correcting it. Make sure you have checked and resolved the disk space issues too!
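To confirm whether the indices are actually blocked, you can ask Elasticsearch directly. This assumes Elasticsearch is listening on its default port 9200; adjust if your elasticsearch.yml says otherwise.

```shell
# List any index-level blocks; a blocked index will show
# "read_only_allow_delete": "true" under settings.index.blocks.
curl -s "localhost:9200/_all/_settings?filter_path=*.settings.index.blocks&pretty"

# Check disk usage per node against the flood-stage watermark (95% by default).
curl -s "localhost:9200/_cat/allocation?v"
```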
I used the </> forum tool to format your logs to be a bit more readable.
According to this post, having freed up disk space I have to run a curl command to set the indices back to read-write, as they won’t do it by themselves.
When I run that command, though, it times out, because the web interface is not responding. Catch-22!
We upgraded from 2.5.2 to 3.3.16 yesterday and are also experiencing poor GUI performance, although in our case I’ve at least isolated it to data that resides in an index set with > 1500 total daily fields versus another index set with < 120. One noticeable change is the number of fields listed for a basic search: previously it showed only the fields included in the current stream/search parameters, but now it shows ALL fields in that index set. We’ve been working to massage our data modelling to reduce field count and usage, but that’s a long-term project. Since the upgrade yesterday, attempting to search or filter against any stream in the default index set (> 1500 fields) is slow, with frequent periods of unresponsiveness. I just checked again: the field selector’s default is to list only fields from the currently selected streams, but what it actually does is list all fields for all streams in that index set.
Additional info: while our four nodes may be processing 5,000–10,000 messages/second at any given time, overall system load isn’t a concern, and the Elasticsearch nodes are all healthy and under reasonable load. All are bare metal, not VMs or containers. No disk space or hardware issues. There is no change in GUI behaviour when connected to a node handling only 500 messages/second versus one doing 5k/second.
Interesting to hear your experience, John. In my case this performance issue has been around for several months, and it persists despite updating to the latest version.
My best guess at the moment is that the indices are locked as read-only for some reason. I have found a command that people have used to unlock them, but there’s a syntax error in it and my Linux skills are pretty basic, I’m afraid.
The command is something like this:
```
curl -X PUT “localhost:9000/_all/_settings?pretty” -H ‘Content-Type: application/json’ -d ’{“index.blocks.read_only_allow_delete”: null}’
```
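For what it’s worth, the quotes in that pasted command are typographic “smart” quotes, which the shell won’t parse. A cleaned-up version would look like the following; note it assumes Elasticsearch is on its default port 9200 (9000 is usually Graylog’s own web/API port), so check your elasticsearch.yml first.

```shell
# Remove the read_only_allow_delete block from all indices.
# Setting the value to null clears the block rather than toggling it.
curl -X PUT "localhost:9200/_all/_settings?pretty" \
  -H 'Content-Type: application/json' \
  -d '{"index.blocks.read_only_allow_delete": null}'
```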
ubuntu@SystemsLoggingGraylog-Live:~$ sudo systemctl status elasticsearch
● elasticsearch.service - Elasticsearch
Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2021-12-17 11:56:38 UTC; 3 days ago
Docs: http://www.elastic.co
Main PID: 974 (java)
Tasks: 42 (limit: 4915)
CGroup: /system.slice/elasticsearch.service
└─974 /usr/bin/java -Xms1g -Xmx1g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Des.networkaddress.cache.ttl=60 -Des.networkaddre
Dec 17 11:56:38 SystemsLoggingGraylog-Live systemd[1]: Started Elasticsearch.
Dec 17 11:56:38 SystemsLoggingGraylog-Live elasticsearch[974]: warning: Falling back to java on path. This behavior is deprecated. Specify JAVA_HOME
Dec 17 11:56:40 SystemsLoggingGraylog-Live elasticsearch[974]: OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future
[2]+ Stopped sudo systemctl status elasticsearch
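Side note: the systemctl output above shows Elasticsearch running with -Xms1g -Xmx1g, i.e. a 1 GB heap. If the box has RAM to spare, raising the heap (conventionally to no more than half of physical RAM) may help; on the Ubuntu package this is set in /etc/elasticsearch/jvm.options. Example values only, adjust to your machine:

```
# /etc/elasticsearch/jvm.options
-Xms4g
-Xmx4g
```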
I see. Correct me if I’m wrong: you’ve changed the Elasticsearch port from 9200 to 9000, and you also have the Web UI on port 9000? I believe you should be using the port shown in your elasticsearch.yml file.
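A quick way to check both sides of that (paths assume the stock Ubuntu packages):

```shell
# Elasticsearch's HTTP port; if http.port is commented out, the default is 9200.
grep -E '^[[:space:]]*(http\.port|network\.host)' /etc/elasticsearch/elasticsearch.yml 2>/dev/null \
  || echo "http.port not set; default is 9200"

# Graylog's Elasticsearch connection string, for comparison.
grep -E '^[[:space:]]*elasticsearch_hosts' /etc/graylog/server/server.conf 2>/dev/null \
  || echo "elasticsearch_hosts not set in server.conf"
```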
Hi tmacgbay, I’m not looking at the logs, but I am still seeing an unresponsive web interface most of the time. When I can get in, I have had a look around and taken a few screen grabs. Perhaps I can PM these to you? I’m not sure how safe it would be to post them here.
I’ve just changed the instance type from t2.large to t2.xlarge, which according to my googling should increase the core count from 2 to 4. Is there anything else I need to do for the instance to take advantage of this change? My knowledge of AWS is scant at best.
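One thing worth checking after the resize: the OS should pick the extra vCPUs up automatically, but Graylog sizes its processing threads from static settings in server.conf rather than from the CPU count, so those may be worth revisiting. A quick sketch (config path assumes the stock Ubuntu package; the defaults quoted are Graylog 3.x shipped values):

```shell
# Confirm the OS sees the extra vCPUs (t2.xlarge should report 4).
nproc

# Graylog's buffer processor thread counts; these do not change automatically
# when you add cores.
grep -E '^(process|output)buffer_processors' /etc/graylog/server/server.conf 2>/dev/null \
  || echo "using defaults (processbuffer_processors = 5, outputbuffer_processors = 3)"
```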