Elasticsearch cluster unhealthy (RED) - Shards unassigned


(Jerry Slagger) #1

I am using the latest Graylog v2.2.3 OVA running in VMware Player v12.5.0 build-4352439. Everything seemed to work great until I rebooted the Graylog server. After that, the Elasticsearch cluster went RED and showed shards as unassigned.

I ran the following command and restarted the graylog server:
curl -XPUT ':9200/_all/_settings' -d '{"number_of_replicas": 0}'

This reset the unassigned shard count to 0, but the cluster is still RED.


Elasticsearch cluster unhealthy (RED) (triggered 2 days ago)
The Elasticsearch cluster state is RED which means shards are unassigned. This usually indicates a crashed and corrupt cluster and needs to be investigated. Graylog will write into the local disk journal. Read how to fix this in the Elasticsearch setup documentation.

Elasticsearch cluster
The possible Elasticsearch cluster states and more related information is available in the Graylog documentation.
Elasticsearch cluster is yellow. Shards: 4 active, 0 initializing, 0 relocating, 0 unassigned.


(Jerry Slagger) #2

The cluster shows GREEN now but there are no new messages being displayed on the Syslog stream I have set up. This is the only stream outside the defaults and it was working until the reboot that put the shards into unassigned. Messages from nginx requests for example work fine.

curl -XGET http://<SERVER>:9200/_cluster/health?pretty

{
  "cluster_name" : "graylog",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 4,
  "active_shards" : 4,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
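
For anyone reading the health output above, here is a minimal Python sketch of how the fields can be interpreted. The function name `summarize_health` is hypothetical, and the sample input is trimmed from the response shown in this post. The status meanings follow the usual Elasticsearch convention: green means all shards are assigned, yellow means all primaries are assigned but some replicas are not, and red means at least one primary is unassigned.

```python
import json


def summarize_health(health: dict) -> str:
    """Return a one-line summary of a _cluster/health response (hypothetical helper)."""
    status = health["status"]
    active = health["active_shards"]
    unassigned = health["unassigned_shards"]
    if status == "green":
        note = "all primary and replica shards are assigned"
    elif status == "yellow":
        note = "all primaries are assigned, but some replicas are not"
    else:
        note = "at least one primary shard is unassigned"
    return f"{status.upper()}: {active} active, {unassigned} unassigned ({note})"


# Sample trimmed from the health output posted in this thread.
sample = json.loads("""
{
  "cluster_name": "graylog",
  "status": "green",
  "active_primary_shards": 4,
  "active_shards": 4,
  "unassigned_shards": 0
}
""")

print(summarize_health(sample))
# GREEN: 4 active, 0 unassigned (all primary and replica shards are assigned)
```

A green status only says the indices are healthy. It says nothing about whether messages are reaching the inputs, which is a separate thing to check.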

(Jan Doberstein) #3

Did you check the available space inside of the VM?


(Jerry Slagger) #4

Yes, I checked and it appears fine.

df -ah
Filesystem      Size  Used Avail Use% Mounted on
sysfs              0     0     0    - /sys
proc               0     0     0    - /proc
udev            2.0G  4.0K  2.0G   1% /dev
devpts             0     0     0    - /dev/pts
tmpfs           396M  624K  395M   1% /run
/dev/dm-0        15G  4.0G   11G  29% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
none               0     0     0    - /sys/fs/fuse/connections
none               0     0     0    - /sys/kernel/debug
none               0     0     0    - /sys/kernel/security
none            5.0M     0  5.0M   0% /run/lock
none            2.0G     0  2.0G   0% /run/shm
none            100M     0  100M   0% /run/user
none               0     0     0    - /sys/fs/pstore
/dev/sda1       236M   74M  150M  34% /boot
systemd            0     0     0    - /sys/fs/cgroup/systemd

(Jochen) #5

Have you tried restarting the virtual machine?


(Jerry Slagger) #6

Yes, reboots did not help.

Ok, I am able to use a syslog test utility to send a message to Graylog, and it displays properly. So the issue appears to be with my devices communicating with Graylog. I am checking now whether anything changed on the network, such as our access rules.
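
For reference, a test like the one described can be approximated with a few lines of Python. This is only a sketch, not the utility used here: the function name `send_test_syslog`, the `testhost` and `testapp` labels, and the default message text are all made up for illustration. The message is loosely RFC 3164 style with the timestamp left out, on the assumption that the receiving input tolerates that.

```python
import socket


def send_test_syslog(host: str, port: int = 514,
                     text: str = "graylog test message") -> bytes:
    """Send one syslog-style UDP datagram and return the bytes that were sent."""
    # <14> is the priority value: facility user (1) * 8 + severity info (6).
    payload = f"<14>testhost testapp: {text}".encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload


# Example (replace the placeholder with the address of the Graylog input):
# send_test_syslog("<SERVER>", 514)
```

UDP gives no delivery confirmation, so a successful send only means the datagram left the sending machine. If it never shows up in the stream, something between the sender and the input is dropping it.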

Should I see UDP 514 and UDP6 514 in netstat on the graylog server? I only see the UDP6 and want to make sure that is not the issue.


(Jochen) #7

No. On a dual-stack system, you’ll see only one entry in the netstat output, and it covers both IPv4 and IPv6.
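
The dual-stack behaviour can be checked with a small Python sketch. It assumes the machine has IPv6 enabled and that the operating system allows `IPV6_V6ONLY` to be switched off. The function names are hypothetical. One IPv6 UDP socket bound to `::` receives a datagram sent over plain IPv4.

```python
import socket


def open_dual_stack_udp(port: int = 0) -> socket.socket:
    """Open a single UDP socket that accepts both IPv6 and IPv4 traffic."""
    sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    # Turning IPV6_V6ONLY off lets the IPv6 socket accept IPv4 datagrams too.
    sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
    sock.bind(("::", port))
    return sock


def ipv4_sender_seen_by_dual_stack() -> str:
    """Send over IPv4 to a dual-stack listener and return the sender address it saw."""
    with open_dual_stack_udp() as listener:
        listener.settimeout(2.0)
        port = listener.getsockname()[1]
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
            sender.sendto(b"ping", ("127.0.0.1", port))
        _data, addr = listener.recvfrom(1024)
        # On Linux the IPv4 sender usually appears in the IPv4-mapped
        # form, for example ::ffff:127.0.0.1.
        return addr[0]


if __name__ == "__main__":
    print(ipv4_sender_seen_by_dual_stack())
```

A listener opened this way is a single socket, which is why only one line appears for it in netstat.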


(Jerry Slagger) #8

Ok, I figured it out. There were two issues: first, the cluster going RED, and second, the Symantec firewall on the local PC that hosts the OVA. At some point, Symantec started blocking the incoming UDP traffic. Everything appears to be working properly now. Thanks for everyone’s time!

