curl -X GET "localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason" | grep UNASSIGNED
Returned:
saasdrfw_deflector 0 p STARTED 68554 11.3mb 127.0.0.1 IdentifiGL
saasdrfw_deflector 0 r UNASSIGNED
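If this is a single-node setup (the output only shows IdentifiGL on 127.0.0.1), the unassigned replica by itself is expected, since a replica can never be allocated on the same node as its primary. For the record, ES can also be asked directly why a shard is unassigned via the allocation explain API (sketch, assuming Elasticsearch 7.x or OpenSearch, both of which support it):

curl -X GET "localhost:9200/_cluster/allocation/explain?pretty" -H 'Content-Type: application/json' -d'
{
  "index": "saasdrfw_deflector",
  "shard": 0,
  "primary": false
}'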
Then I saw these messages in the Graylog UI:
Error starting index ranges recalculation for saasdrfw_deflector
Could not create a job to start index ranges recalculation for saasdrfw_deflector,
reason: FetchError: There was an error fetching a resource: Bad Request.
Additional information: saasdrfw_deflector is not a Graylog-managed Elasticsearch index.
Deflector exists as an index and is not an alias
The deflector is meant to be an alias but exists as an index.
Multiple failures of infrastructure can lead to this.
Your messages are still indexed but searches and all maintenance tasks will fail or produce incorrect results.
It is strongly recommend that you act as soon as possible.
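A quick way to see exactly what that notification is complaining about is to check whether the deflector name exists as an alias or as a concrete index (both are standard cat APIs):

curl -X GET "localhost:9200/_cat/aliases/saasdrfw_deflector?v"
curl -X GET "localhost:9200/_cat/indices/saasdrfw*?v"

If the name shows up under _cat/indices instead of _cat/aliases, it really was created as an index, which matches the warning.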
I deleted the index, and then deleted the deflector “index”:
curl -X DELETE "localhost:9200/saasdrfw_deflector?pretty"
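To confirm the name was completely gone before Graylog tried to write again, this should come back with index_not_found_exception once the delete has gone through:

curl -X GET "localhost:9200/saasdrfw_deflector?pretty"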
I created a new index with a different name and index prefix; same result.
Then I saw this under the indexing errors section:
ElasticsearchException[Elasticsearch exception [type=validation_exception, reason=Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [1000]/[1000] maximum shards open;]]
So I found a command that would increase the shard limit, ran it, and restarted ES:
curl -X PUT "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
  "persistent" : {
    "cluster.routing.allocation.total_shards_per_node" : 2000
  }
}'
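For what it's worth, the [1000]/[1000] limit in that validation error is normally governed by cluster.max_shards_per_node rather than the allocation setting above, so the variant that targets it directly would look something like this (same API, different key):

curl -X PUT "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
  "persistent" : {
    "cluster.max_shards_per_node" : 2000
  }
}'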
The shard error went away, but I still couldn’t get the index to behave properly.
So I went back to the source. It was very chatty, so I turned down some of the noise in case that was the problem; no luck. Then I tore it all down: deleted the stream, the index, and the source, and built it all back up. Same problem. I deleted them all again and created a new input using the same parameters, but this time configured a known-good device to send syslog to it. Same result, so it doesn’t appear to be a problem with the incoming data.
Then I came across this message in the ES log:
[2024-04-30T15:26:30,232][INFO ][o.e.c.m.MetadataCreateIndexService] [IdentifiGL] [saasdrfw_deflector] creating index, cause [auto(bulk api)], templates [saasdrfw-template], shards [1]/[1]
Searching on that didn’t turn up much. One post here indicated that this happens when messages are sent directly to ES rather than through GL, but there was no answer as to how that could happen or how to fix it. Another hit was a bug report that was allegedly resolved in a previous version of GL.
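The “cause [auto(bulk api)]” part of that log line means ES auto-created the index because something bulk-indexed into a name that didn’t exist yet, which lines up with the deflector-as-index warning. The template it applied can also be dumped for inspection (assuming the legacy template endpoint; on newer ES/OpenSearch versions it may live under _index_template instead):

curl -X GET "localhost:9200/_template/saasdrfw-template?pretty"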
There is plenty of space on the volume where the indexes are stored.
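For completeness, the checks behind that statement (the data path here is just an example and may differ on your install):

df -h /var/lib/elasticsearch
curl -X GET "localhost:9200/_cat/allocation?v"

The second command shows how much disk ES itself thinks it has used and has available on the node.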
The only other thing I can think of is that the change I made from this question: Unable to View Metrics jacked up the indexing process. Making that change allowed me to see metrics and also improved overall system performance tremendously.
Unless something else comes up that can fix this, I have two last ideas:
- Put those values back in the config and try the process.
- Upgrade from 5.1.7 to 5.2.5.