Messages Not Going Into Index

This is a similar problem to what I saw in my older question - FortiADC SYSLOG Struggles

The time zone settings all match up and are correct. Everything is in UTC.

The input is Syslog/UDP, and it shows messages coming in.

I created an Index set using mostly the default settings (just changed the rotation to match my other index sets).

Created a stream that pulls from the input (using a gl2_source_input rule, just like my other streams). The pattern match test passes. When the stream is enabled, I see metrics that there are messages coming into the stream.

If I view the stream, this is what I get.
Elasticsearch exception [type=index_not_found_exception, reason=no such index []].

Also, new messages STOP appearing in the Input when the Stream is active. If I stop it, the Input resumes seeing messages.

If I run curl -X GET "localhost:9200/_cat/indices/my-index-*?v=true&s=index&pretty", the new index name does not appear.
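Since Graylog writes through a <prefix>_deflector alias rather than to an index directly, it can also be worth listing the aliases to see whether one was ever created for the new index set. Nothing here is Graylog-specific, just the standard _cat API:

# Each healthy Graylog index set should show <prefix>_deflector pointing at its newest index
curl -X GET "localhost:9200/_cat/aliases?v=true"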

If I check the Elasticsearch log on the Graylog server, I see this message every 10 seconds.

[2024-04-23T13:27:49,409][INFO ][o.e.c.m.MetadataIndexTemplateService] [graylogserver] adding template [indexname-template] for index patterns [indexname_*]
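That log line is Graylog (re)installing its index template for the index set. If you want to confirm the template itself looks sane, it can be pulled with the legacy template API (template name taken from the log line above; adjust it to match yours):

curl -X GET "localhost:9200/_template/indexname-template?pretty"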

Elasticsearch is overall healthy as far as I can tell.

Searching on the error/info messages hasn’t turned up anything useful so far.

I’ve redone the input, index set, and stream several times just to make sure.

I’m lost at this point.

Hey @jfmeyer00

I saw that a while back here.

Thanks for that one. I did see that. My is_leader is set to true.

We're using Elasticsearch, and the paths all look correct.
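For reference, the settings I was checking live in the Graylog server config. The path below is the default for package installs and may differ on other setups:

# is_leader must be true on the node that handles index rotation;
# elasticsearch_hosts must point at the right cluster
grep -E '^(is_leader|elasticsearch_hosts)' /etc/graylog/server/server.conf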

Another update. This is in the System/Overview page, under the indexing errors section.

[indexname_deflector] ElasticsearchException[Elasticsearch exception [type=index_not_found_exception, reason=no such index [indexname_deflector]]]

Hey,

How did you make that index set? And can you show your index set configuration?

I created it in the Graylog Web UI. Just like my other index sets.

Example of a working index set

Edit: I just noticed the Yellow status. Everything appears to be okay. The “deflector” entry on the top (“broken”) index set is new. I’m presuming that’s the cause of the Yellow status.

Hey,

I saw that also, and you have an unassigned shard. At first I thought it was a replica shard, but you don't have replicas configured.

To diagnose the unassigned shards, you can execute this:

curl -X GET "http://opensearch.domain.com:9200/_cat/shards?v=true&h=index,shard,prirep,state,node,unassigned.reason&s=state"
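If anything comes back UNASSIGNED, the allocation explain API will usually tell you why. With no request body it reports on the first unassigned shard it finds (and returns an error if there are none):

curl -X GET "http://opensearch.domain.com:9200/_cluster/allocation/explain?pretty"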

Have you tried to recalculate the index set? There is a drop-down in the top right corner of the section where your index set is, with options to either rotate or recalculate the indices.

Neither recalculating nor rotating had any positive effect.

This is the relevant bit of the curl output:

saasdrfw_deflector 0 p STARTED 68554 11.3mb 127.0.0.1
saasdrfw_deflector 0 r UNASSIGNED

curl -X GET "localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason" | grep UNASSIGNED

Returned

saasdrfw_deflector 0 p STARTED 68554 11.3mb 127.0.0.1 IdentifiGL
saasdrfw_deflector 0 r UNASSIGNED
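Side note on the yellow status: that index was auto-created with one replica, and on a single-node cluster a replica can never be allocated, so it will sit UNASSIGNED forever. Dropping the replica clears the yellow status, but it does nothing for the deflector problem itself:

# Remove the replica from the auto-created index; only sensible on a single-node cluster
curl -X PUT "localhost:9200/saasdrfw_deflector/_settings" -H 'Content-Type: application/json' -d '{ "index": { "number_of_replicas": 0 } }'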

Then I saw these messages in the Graylog UI:

Error starting index ranges recalculation for saasdrfw_deflector
Could not create a job to start index ranges recalculation for saasdrfw_deflector,
reason: FetchError: There was an error fetching a resource: Bad Request.
Additional information: saasdrfw_deflector is not a Graylog-managed Elasticsearch index.

Deflector exists as an index and is not an alias

The deflector is meant to be an alias but exists as an index.
Multiple failures of infrastructure can lead to this.
Your messages are still indexed but searches and all maintenance tasks will fail or produce incorrect results.
It is strongly recommend that you act as soon as possible.

I deleted the Index, and then deleted the deflector “index”.

curl -X DELETE "localhost:9200/saasdrfw_deflector?pretty"
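Graylog normally manages <prefix>_deflector as an alias pointing at the newest index in the set, and rotating the index set from the UI should recreate it. If it ever has to be repaired by hand, the standard alias API looks roughly like this (the index name saasdrfw_0 is only a hypothetical example):

# Point the deflector alias at the newest real index in the set
curl -X POST "localhost:9200/_aliases" -H 'Content-Type: application/json' -d '{ "actions": [ { "add": { "index": "saasdrfw_0", "alias": "saasdrfw_deflector" } } ] }'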

I created a new index with a different name and Index prefix, same result.

Then I saw this under the indexing errors section:

ElasticsearchException[Elasticsearch exception [type=validation_exception, reason=Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [1000]/[1000] maximum shards open;]]
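For reference, how close the cluster actually is to that limit can be checked before changing anything. The default is 1,000 open shards per data node, which on a single-node cluster means 1,000 total:

# Total active shards (primaries + replicas) and cluster status
curl -X GET "localhost:9200/_cluster/health?filter_path=status,active_shards&pretty"

# Current limit, including the default if no override is set
curl -s "localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true&pretty" | grep max_shards_per_node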

So I found a command that would increase the shard limit, ran that, and restarted ES.

curl -X PUT "localhost:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
  "persistent" : {
    "cluster.routing.allocation.total_shards_per_node" : 2000
  }
}'

The shard error went away. But I still couldn’t get the Index to behave properly.

So I went back to the source device. It was very chatty, so I turned down some of the noise in case that was the problem - no luck. Then I tore it all down: deleted the Stream, Index Set, and Input, and built it all back up - same problem. I deleted them all again and created a new input using the same parameters, but this time I pointed a known good device at it to send syslog. Same result, so it doesn't appear to be a problem with the incoming data.

Then I came across this message in the ES log:

[2024-04-30T15:26:30,232][INFO ][o.e.c.m.MetadataCreateIndexService] [IdentifiGL] [saasdrfw_deflector] creating index, cause [auto(bulk api)], templates [saasdrfw-template], shards [1]/[1]

Searching on that didn't turn up much. One post here indicated that this happens when messages are sent directly to Elasticsearch rather than through Graylog, but there was no answer as to how that could happen or how to fix it. Another was a bug report that was allegedly resolved in an earlier version of Graylog.
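One thing that can be done purely as a guard, assuming nothing else on the cluster legitimately relies on auto-creating indices named *_deflector, is to exclude that pattern from index auto-creation so a stray bulk write fails loudly instead of silently re-creating the deflector as a real index. Graylog creates its rotation indices explicitly, so they should not be affected, but treat this as a hedge rather than a fix:

# Deny auto-creation of *_deflector indices, allow everything else (patterns are evaluated left to right)
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d '{ "persistent": { "action.auto_create_index": "-*_deflector,+*" } }'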

There is plenty of space on the volume where the indexes are stored.

The only other thing I can think of is that the change I made from this question - Unable to View Metrics - jacked up the indexing process. Making that change allowed me to see metrics and also improved overall system performance tremendously.

Unless something else comes up that can fix this, I have two last ideas:

  1. Put those values back in the config and try the process.
  2. Upgrade from 5.1.7 to 5.2.5.

Hey,

Yeah, remove any old or unused custom index sets, then apply the upgrade and start again. If this doesn't take care of your issue, can you post the configurations you used? I may be able to replicate it in my lab.

That was the issue. Too many shards. The new index would have put it over the top.

I've both increased the shard limit and started reconfiguring the indexes to use fewer shards. Once that's done, I'll put the default shard cap back in place.
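To see which index sets are contributing the most shards while reworking the counts, something like this gives a quick per-index view:

# One line per index: name, primary shard count, replica count, document count
curl -s "localhost:9200/_cat/indices?v=true&h=index,pri,rep,docs.count&s=index"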

The command I tried earlier to increase the shard limit was wrong (it changed cluster.routing.allocation.total_shards_per_node, which is a per-node allocation setting, not the cluster-wide shard limit). This is the correct command.

curl -X PUT localhost:9200/_cluster/settings -H "Content-Type: application/json" -d '{ "persistent": { "cluster.max_shards_per_node": "2000" } }'
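For completeness, once the total shard count is back under the default, the persistent override can be removed by setting it to null:

# Clear the override so the default of 1,000 shards per data node applies again
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d '{ "persistent": { "cluster.max_shards_per_node": null } }'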