Web interface stops responding intermittently

Hi all, Graylog was working about six months ago but now the web UI is unresponsive for long periods. Every now and then it will let me log in.

Ubuntu 18.04
Graylog 3.3.16
Initially, when I SSH'ed onto it, the boot volume was full, so I freed up some space. That didn't fix the problem, so I tried rebooting the server, then updating from 3.3.6 to 3.3.16 (required for the Log4j vulnerability, which was the reason I checked the server in the first place).

What diagnostic tests can I do to see what is stopping the UI from working? Maybe I need more resources assigned to the process?

Thanks

Are you seeing anything in the graylog logs?

tail -f /var/log/graylog-server/server.log

curious that your CPU is running high - hopefully the logs will show some more detail…

Hi tmacgbay, here’s the output. Looks a bit odd?

ubuntu@SystemsLoggingGraylog-Live:~$ tail -f /var/log/graylog-server/server.log
        at org.graylog2.periodical.IndexRotationThread.doRun(IndexRotationThread.java:73) [graylog.jar:?]
        at org.graylog2.plugin.periodical.Periodical.run(Periodical.java:77) [graylog.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) [?:?]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:829) [?:?]
2021-12-17T12:34:39.699Z ERROR [Messages] Bulk indexing failed: no write index is defined for alias [graylog_deflector]. The write index may be explicitly disabled using is_write_index=false or the alias points to multiple indices without one being designated as a write index, retrying (attempt #88)
2021-12-17T12:34:44.097Z ERROR [Messages] Bulk indexing failed: no write index is defined for alias [graylog_deflector]. The write index may be explicitly disabled using is_write_index=false or the alias points to multiple indices without one being designated as a write index, retrying (attempt #88)
2021-12-17T12:34:47.364Z ERROR [IndexRotationThread] Couldn't point deflector to a new index
org.graylog2.indexer.ElasticsearchException: Couldn't remove alias graylog_deflector from indices [graylog_5]
blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];
        at org.graylog2.indexer.cluster.jest.JestUtils.specificException(JestUtils.java:110) ~[graylog.jar:?]
        at org.graylog2.indexer.cluster.jest.JestUtils.execute(JestUtils.java:60) ~[graylog.jar:?]
        at org.graylog2.indexer.cluster.jest.JestUtils.execute(JestUtils.java:65) ~[graylog.jar:?]
        at org.graylog2.indexer.indices.Indices.removeAliases(Indices.java:661) ~[graylog.jar:?]
        at org.graylog2.indexer.MongoIndexSet.cleanupAliases(MongoIndexSet.java:352) ~[graylog.jar:?]
        at org.graylog2.periodical.IndexRotationThread.checkAndRepair(IndexRotationThread.java:149) ~[graylog.jar:?]
        at org.graylog2.periodical.IndexRotationThread.lambda$doRun$0(IndexRotationThread.java:76) ~[graylog.jar:?]
        at java.lang.Iterable.forEach(Iterable.java:75) [?:?]
        at org.graylog2.periodical.IndexRotationThread.doRun(IndexRotationThread.java:73) [graylog.jar:?]
        at org.graylog2.plugin.periodical.Periodical.run(Periodical.java:77) [graylog.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) [?:?]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:829) [?:?]
2021-12-17T12:34:47.366Z WARN [IndexRotationThread] Deflector is pointing to [gl-events_11], not the newest one: [gl-events_12]. Re-pointing.
2021-12-17T12:34:47.367Z ERROR [IndexRotationThread] Couldn't point deflector to a new index
org.graylog2.indexer.ElasticsearchException: Couldn't switch alias gl-events_deflector from index gl-events_11 to index gl-events_1

This error indicates that you have a problem in Elasticsearch - it usually means you have hit the disk-usage high watermark, which makes Elasticsearch mark its indices read-only. Here is a link I found that talks about correcting it. Make sure you have checked and resolved any disk space issues too!
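If that's what happened here, the usual remedy once disk space has been freed is to clear the read-only block via the Elasticsearch REST API. A minimal sketch, assuming Elasticsearch is listening on the default localhost:9200:

```shell
# Sketch, assuming Elasticsearch on the default localhost:9200.
# Clears the read-only block that Elasticsearch sets on all indices
# when the flood-stage disk watermark is reached:
curl -X PUT -H 'Content-Type: application/json' \
  'http://localhost:9200/_all/_settings' \
  -d '{"index.blocks.read_only_allow_delete": null}'
```

Note that on older Elasticsearch versions the block is not removed automatically once disk space frees up, so this step is needed even after cleanup.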

I used the </> forum tool to format your logs to be a bit more readable. :slight_smile:


I checked the disk space and I can’t see any problems there.

ubuntu@SystemsLoggingGraylog-Live:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            3.9G     0  3.9G   0% /dev
tmpfs           796M  832K  796M   1% /run
/dev/xvda1       16G  9.9G  5.7G  64% /
tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/loop0       34M   34M     0 100% /snap/amazon-ssm-agent/3552
/dev/loop2      100M  100M     0 100% /snap/core/11606
/dev/loop1      100M  100M     0 100% /snap/core/11743
/dev/loop3       25M   25M     0 100% /snap/amazon-ssm-agent/4046
/dev/loop4       56M   56M     0 100% /snap/core18/2074
/dev/loop5       56M   56M     0 100% /snap/core18/2128
/dev/xvdf       492G  278G  189G  60% /data
tmpfs           796M     0  796M   0% /run/user/1000

I will read through that post again and see if I can figure out what’s going on.

Thanks

According to this post, I have to run a curl command to set the indices back to read-write after freeing up disk space, as they won't revert by themselves.

When I run that command, though, it times out, because the web interface is not responding. Catch-22!

curl -XPUT -H "Content-Type: application/json" https://grayloghost:9000/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'

curl: (7) Failed to connect to grayloghost port 9000: Connection timed out

We upgraded from 2.5.2 to 3.3.16 yesterday and are also experiencing poor GUI performance, although in our case I've at least isolated it to data that resides in an index set with > 1500 total daily fields, versus another index set with < 120. A noticeable change was in the number of fields listed for a basic search: previously it showed only those included in the current stream/search parameters; now it shows ALL fields in that index set. We've been working to massage our data modelling so as to reduce field count/usage, but that's a long-term project. Since the upgrade yesterday, attempting to search/filter against any stream in the default index set (> 1500 fields) is slow, with frequent periods of being unresponsive.

I just checked again: the field selector's default is to list only fields from the currently selected streams, but what it's actually doing is listing all fields for all streams in that index set.

Additional info: while our (4) nodes may be processing 5,000 - 10,000 messages/second at any given time, overall system load isn't a concern, and the Elasticsearch nodes are all healthy and under reasonable load. All are bare metal, not VMs or containers. No disk space or hardware issues. No change in GUI behaviour while connected to a node handling only 500 messages/second versus one doing 5k/second.

Interesting to hear your experience John. In my case this performance issue has been around for several months, and despite me updating to the latest version, persists.

My best guess at the moment is that the indices are locked read-only for some reason. I have found a command that is supposed to unlock them, but there's a syntax error in it, and my Linux skills are pretty basic, I'm afraid.

The command is something like this:

curl -X PUT "localhost:9000/_all/_settings?pretty" -H 'Content-Type: application/json' -d '{"index.blocks.read_only_allow_delete": null}'

Cheers

Greg

Do you have Elasticsearch installed on the machine called grayloghost? Is the service up and running?

$ sudo systemctl status elasticsearch

Hello

I believe you're using the wrong port; 9200 instead of 9000 might work.

example:

curl -XPUT -H "Content-Type: application/json" https://grayloghost:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'

Hope that helps
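To confirm afterwards that no index still carries the block, something like this should work (same assumption that Elasticsearch answers on port 9200):

```shell
# Sketch, assuming Elasticsearch's REST API on grayloghost:9200.
# Lists any settings still mentioning the read-only block;
# no matching output means the block has been cleared everywhere.
curl -s 'http://grayloghost:9200/_all/_settings?pretty' \
  | grep read_only_allow_delete
```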


Hi gsmith, I don’t think so. We have changed the default port for access to the admin web interface to 9000.

Thanks

@tmacgbay here’s the output of that command:

ubuntu@SystemsLoggingGraylog-Live:~$ sudo systemctl status elasticsearch
● elasticsearch.service - Elasticsearch
   Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2021-12-17 11:56:38 UTC; 3 days ago
     Docs: http://www.elastic.co
 Main PID: 974 (java)
    Tasks: 42 (limit: 4915)
   CGroup: /system.slice/elasticsearch.service
           └─974 /usr/bin/java -Xms1g -Xmx1g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Des.networkaddress.cache.ttl=60 -Des.networkaddre

Dec 17 11:56:38 SystemsLoggingGraylog-Live systemd[1]: Started Elasticsearch.
Dec 17 11:56:38 SystemsLoggingGraylog-Live elasticsearch[974]: warning: Falling back to java on path. This behavior is deprecated. Specify JAVA_HOME
Dec 17 11:56:40 SystemsLoggingGraylog-Live elasticsearch[974]: OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future 


Lets look at the Elastic health what are the results of the following:

curl -X GET "ELASTIC_SERV:9000/_cluster/health?pretty"

and

curl -X GET "ELASTIC_SERV:9000/_cat/indices/*?v&s=index&pretty"

The second one - don’t post all the results, we are only looking to see if all health is green…

(Note I did change the port to the non-standard 9000… :slight_smile: )

Hello,

I see. Correct me if I'm wrong: you've changed Elasticsearch from port 9200 to 9000, and you also have the Web UI on port 9000? I believe you should be using the port shown in your elasticsearch.yml file.
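A quick way to see what's actually configured on both sides (a sketch; the paths are the standard package-install defaults):

```shell
# Elasticsearch REST port (defaults to 9200 if the line is absent or commented):
grep -E '^\s*http\.port' /etc/elasticsearch/elasticsearch.yml
# Graylog 3.x web/API listener address and port:
grep -E '^\s*http_bind_address' /etc/graylog/server/server.conf
```

If the two greps show different ports, the curl commands above need to target the Elasticsearch one, not the Graylog web UI one.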


Hi tmacgbay,

Apologies for the delayed reply, we're in the middle of migrating our Microsoft 365 and Google Workspace tenants.

The result of the first command is:

{
  "cluster_name" : "graylog",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 116,
  "active_shards" : 116,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

The second command returned all status flags as green.

Thanks

Are you still seeing the same errors in Graylog logs?

Hi tmacgbay, I haven't looked at the logs again, but I am still seeing an unresponsive web interface most of the time. When I can get in, I have had a look around and taken a few screen grabs. Perhaps I can PM these to you? I'm not sure how safe it would be to post them here.

Thanks

By any chance, have you tried increasing your CPU cores from 2 to 4 to see if that helps?

Hi gsmith,

I’ve just changed the instance type from t2.large to t2.xlarge, which my Googling has indicated should increase the cores from 2 to 4. Is there anything else I need to do for the instance to take advantage of this change? My knowledge of AWS is scant at best.
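For anyone following along, the guest OS normally picks up the new vCPU count on its own after the instance restarts, so a quick check from the shell should confirm it (a sketch):

```shell
# Confirm the OS sees the new vCPUs after the instance resize.
# No extra AWS configuration is normally needed beyond the restart.
nproc                      # processors available to this process
lscpu | grep '^CPU(s):'    # total CPUs the kernel reports
```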

Thanks