Hello,
After our upgrade from Elasticsearch 5.x to 6.x and from Graylog 2.x to 2.5.1 (I can’t remember offhand which Graylog version we came from), we’ve run into a problem where an index becomes read-only during rotation, before the deflector has been moved to the newly created index.
What seems to happen:
Graylog has been using index eku_fw_1009 and needs to create a new index called eku_fw_1010.
The new index gets created successfully.
Graylog attempts to re-point the deflector from eku_fw_1009 to eku_fw_1010 and gets an Elasticsearch exception. At this point eku_fw_1009 is apparently in a read-only state; I’m not sure whether Graylog or Elasticsearch set it.
2019-01-18T11:14:47.150-05:00 WARN [IndexRotationThread] Deflector is pointing to [eku_fw_1009], not the newest one: [eku_fw_1010]. Re-pointing.
org.graylog2.indexer.ElasticsearchException: Couldn't switch alias eku_fw_deflector from index eku_fw_1009 to index eku_fw_1010
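As far as I understand it, re-pointing the deflector means moving the `eku_fw_deflector` alias from one index to the other. Below is a minimal sketch of doing that swap by hand through the Elasticsearch `_aliases` API. It assumes Elasticsearch is reachable at `localhost:9200` and is not necessarily how Graylog performs the switch internally. The remove and add actions go in one request, which Elasticsearch applies atomically. Removing the alias is a metadata change on eku_fw_1009, so if that index carries a read-only block the whole request should be rejected. That would be consistent with the exception above.

```python
import json
import urllib.request


def alias_swap_request(host, alias, old_index, new_index):
    """Build (but do not send) an atomic alias swap for the _aliases API.

    Assumption: plain HTTP on `host` with no authentication.
    """
    body = {
        "actions": [
            # Both actions are applied together or not at all.
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }
    return urllib.request.Request(
        url=f"http://{host}/_aliases",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = alias_swap_request(
    "localhost:9200", "eku_fw_deflector", "eku_fw_1009", "eku_fw_1010"
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would send it. A non-2xx response, such as a 403 from a block, surfaces as `urllib.error.HTTPError`.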
After this, we see errors showing that eku_fw_1009 is read-only and that Graylog can’t index messages into it. Thousands of these appear as Graylog keeps trying to index messages.
2019-01-18T11:14:55.120-05:00 WARN [Messages] Failed to index message: index=<eku_fw_1009> id=<313dc441-1b3c-11e9-bdea-d89ef3264d11> error=<{"type":"cluster_block_exception","reason":"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"}>
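The reason string identifies which block is involved. As far as I know, in Elasticsearch 6.x `FORBIDDEN/12/index read-only / allow delete (api)` is the `index.blocks.read_only_allow_delete` block. The plain `index.blocks.read_only` block reports as `FORBIDDEN/5/index read-only (api)`. Below is a minimal sketch for tallying which indices hit which block in the Graylog server log. It assumes the raw log uses straight double quotes in the error JSON. The `parse_failed_index` and `tally_blocked` helpers are hypothetical names used only for illustration.

```python
import json
import re
from collections import Counter

# Matches the Graylog "Failed to index message" warning format shown above.
FAILED_INDEX_RE = re.compile(
    r"Failed to index message: index=<([^>]+)> id=<([^>]+)> error=<(\{.*\})>"
)

# Elasticsearch 6.x block IDs as they appear in cluster_block_exception reasons.
BLOCK_SETTINGS = {
    "FORBIDDEN/5/": "index.blocks.read_only",
    "FORBIDDEN/12/": "index.blocks.read_only_allow_delete",
}


def parse_failed_index(line):
    """Return (index, block_setting) for a failed-index line, or None."""
    match = FAILED_INDEX_RE.search(line)
    if match is None:
        return None
    index, _msg_id, error_json = match.groups()
    reason = json.loads(error_json).get("reason", "")
    setting = next(
        (name for prefix, name in BLOCK_SETTINGS.items() if prefix in reason),
        "unknown",
    )
    return index, setting


def tally_blocked(lines):
    """Count failed-index lines per (index, block setting) pair."""
    return Counter(p for p in map(parse_failed_index, lines) if p is not None)


sample = (
    "2019-01-18T11:14:55.120-05:00 WARN [Messages] Failed to index message: "
    "index=<eku_fw_1009> id=<313dc441-1b3c-11e9-bdea-d89ef3264d11> "
    'error=<{"type":"cluster_block_exception",'
    '"reason":"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"}>'
)
print(tally_blocked([sample, "an unrelated log line"]))
```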
At this point the new index does exist, but the deflector can’t be repointed.
If we clear the read-only state on eku_fw_1009, Graylog successfully moves the deflector and starts writing log messages again.
Fixing the index:
curl -X PUT "localhost:9200/eku_fw_1009/_settings" -H 'Content-Type: application/json' -d '{ "index": { "blocks": { "read_only": "false" } } }'
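The command above sets `index.blocks.read_only`. As far as I know, the block named in the error (`FORBIDDEN/12`) corresponds to `index.blocks.read_only_allow_delete`, and the Elasticsearch 6.x disk-allocation documentation releases it by resetting that setting to `null`. Below is a minimal sketch that builds that request with the Python standard library. It assumes `localhost:9200` and the index name from this post, and it only builds the request without sending it.

```python
import json
import urllib.request


def clear_allow_delete_block_request(host, index):
    """Build (but do not send) a settings update that resets
    index.blocks.read_only_allow_delete. Sending null removes the setting.

    Assumption: plain HTTP on `host` with no authentication.
    """
    body = {"index.blocks.read_only_allow_delete": None}  # serialized as null
    return urllib.request.Request(
        url=f"http://{host}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = clear_allow_delete_block_request("localhost:9200", "eku_fw_1009")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

If the node is still above the flood-stage disk watermark, Elasticsearch may simply apply the block again, so this only helps once disk usage is back under the threshold.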
After applying the fix, I see another message in the log showing it is going to re-point.
2019-01-18T11:14:57.150-05:00 WARN [IndexRotationThread] Deflector is pointing to [eku_fw_1009], not the newest one: [eku_fw_1010]. Re-pointing.
The Elasticsearch cluster is not out of disk space, and I didn’t see any error messages related to either of these indexes (eku_fw_1009 or eku_fw_1010). This only happens on the index sets we created in addition to the “Default index set” Graylog uses. I couldn’t find posts from others with this kind of issue, so I suspect the problem is specific to our setup in some way.
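Even when the cluster as a whole has free space, Elasticsearch 6.x applies `index.blocks.read_only_allow_delete` to every index that has a shard on a node whose disk usage crosses the flood-stage watermark (`cluster.routing.allocation.disk.watermark.flood_stage`, 95% by default). In 6.x that block stays in place until it is removed manually. If the custom index sets happen to have shards on a fuller node, that might explain why only they are affected. Below is a minimal sketch that flags such nodes from the body of `GET _cat/allocation?format=json`. It assumes a percentage-based watermark at the 95% default, and the sample rows are hypothetical.

```python
import json

FLOOD_STAGE_PERCENT = 95  # assumed: Elasticsearch 6.x default flood-stage watermark


def nodes_over_flood_stage(cat_allocation_json, threshold=FLOOD_STAGE_PERCENT):
    """Return (node, disk_percent) pairs whose rounded usage is >= threshold.

    Expects the body of GET _cat/allocation?format=json. The UNASSIGNED row
    has a null disk.percent and is skipped. Using >= is slightly conservative
    because _cat reports a rounded integer percentage.
    """
    flagged = []
    for row in json.loads(cat_allocation_json):
        percent = row.get("disk.percent")
        if percent is None:
            continue
        if int(percent) >= threshold:
            flagged.append((row.get("node"), int(percent)))
    return flagged


# Hypothetical sample output; real rows also include shards, disk.used, etc.
sample = json.dumps([
    {"node": "es-node-1", "disk.percent": "71"},
    {"node": "es-node-2", "disk.percent": "96"},
    {"node": "UNASSIGNED", "disk.percent": None},
])
print(nodes_over_flood_stage(sample))
```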
While this is occurring, the web interface doesn’t indicate any issues with indexing even though logs are not being successfully written.
Anyone else experiencing this kind of thing?
This GitHub issue seems related, but it describes the reverse of my issue.
Thanks!!
Dustin @ EKU