Index problem after upgrade to 4

Hi,

After a long pause, I’m meddling with Graylog again. After upgrading from 3.3 to 4.0 (and then 4.0.1), I noticed that something is wrong with indexing. It looks like Graylog is not able to rotate and create a new index. It just keeps growing the latest one in the default index set. Otherwise it seems to work.

This is a Docker Compose environment with:

  • Mongodb 3.6.21
  • Graylog 4.0.1
  • Elasticsearch 7.10.0 (from 6.8.13)

Interesting entries from Graylog’s log:

ERROR: org.graylog2.periodical.IndexRotationThread - Couldn't point deflector to a new index
Could not create new target index <graylog_153>.
INFO : org.graylog2.indexer.rotation.strategies.AbstractRotationStrategy - Deflector index <Default index set> (index set <graylog_152>) should be rotated, Pointing deflector to new index now!

INFO : org.graylog2.indexer.MongoIndexSet - Creating target index <graylog_153>.
WARN : org.graylog2.indexer.indices.Indices - Couldn't create index graylog_153. Error: Unable to create index graylog_153

Suppressed: org.graylog.shaded.elasticsearch7.org.elasticsearch.client.ResponseException: method [PUT], host [http://es01:9200], URI [/graylog_153?master_timeout=30s&timeout=30s], status line [HTTP/1.1 400 Bad Request]

{"error":{"root_cause":[{"type":"validation_exception","reason":"Validation Failed: 1: this action would add [8] total shards, but this cluster currently has [1572]/[1000] maximum shards open;"}],"type":"validation_exception","reason":"Validation Failed: 1: this action would add [8] total shards, but this cluster currently has [1572]/[1000] maximum shards open;"},"status":400}

On the Elasticsearch end, I tried setting cluster.routing.allocation.enable (as described in the upgrade guide and in “Log Retention and Unassigned Shards”).
I also changed cluster.routing.allocation.total_shards_per_node to 1600, but Graylog still reports a limit of 1000?

Elasticsearch responses:

_cat/nodes:
ip         heap.percent ram.percent cpu load_1m load_5m load_15m node.role  master name
172.18.0.2           44          98   0    0.07    0.09     0.06 cdhilmrstw *      es01

_cat/health?v
epoch      timestamp cluster    status node.total node.data shards  pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1606821484 11:18:04  gl-cluster green           1         1   1572 1572    0    0        0             0                  -                100.0%

_cat/indices?h=health,status,index
all green, open

If I try to update the index templates, the request times out:

http post "gl:9000/api/system/indexer/indices/templates/update x-requested-by:httpie"
http: error: Request timed out (30s).

The Elasticsearch logs do not show any errors, only this warning:

"level": "WARN", "component": "o.e.g.DanglingIndicesState", "cluster.name": "gl-cluster", "node.name": "es01", "message": "gateway.auto_import_dangling_indices is disabled, dangling indices will not be automatically detected or imported and must be managed manually"

Otherwise, the Elasticsearch log is getting a lot of these:

{"type": "server", "timestamp": "2020-12-01T11:27:52,377Z", "level": "INFO", "component": "o.e.c.m.MetadataIndexTemplateService", "cluster.name": "gl-cluster", "node.name": "es01", "message": "adding template [graylog-internal] for index patterns [graylog_*]", "cluster.uuid": "EKsB0ynnS8GXKtrKkiEUbA", "node.id": "bE7FZWtOTx60sTtJJx-AIQ"

I’m probably missing something, but after several hours of searching I’m back where I started.
Please, any advice is welcome :slight_smile:

Br,
Jari

Hello @arkkis, welcome!

It looks like you’re on the right track adjusting the maximum shards per node. Did you make that change in elasticsearch.yml, and did you restart Elasticsearch afterwards?
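
For reference, the 1000 in the error message is the cluster-wide open-shard limit. Elasticsearch 7 derives it from `cluster.max_shards_per_node` (default 1000) multiplied by the number of data nodes; `cluster.routing.allocation.total_shards_per_node` is a separate allocation setting and does not raise it. Below is a minimal Python sketch of that check using the numbers from the error above, plus the body one could send to `PUT _cluster/settings`. The function names are made up for illustration, and 1600 is only the value mentioned in this thread, not a recommendation.

```python
import json


def would_exceed_shard_limit(open_shards, new_shards, data_nodes,
                             max_shards_per_node=1000):
    """Return True if creating an index would exceed the open-shard limit.

    The limit is max_shards_per_node multiplied by the number of data nodes.
    """
    limit = max_shards_per_node * data_nodes
    return open_shards + new_shards > limit


def max_shards_settings_body(max_shards_per_node):
    """Build a JSON body for PUT _cluster/settings (persistent setting)."""
    return json.dumps(
        {"persistent": {"cluster.max_shards_per_node": max_shards_per_node}}
    )


# Numbers from the error: 1572 open shards, 8 new shards, 1 data node.
print(would_exceed_shard_limit(1572, 8, 1))                            # default limit
print(would_exceed_shard_limit(1572, 8, 1, max_shards_per_node=1600))  # raised limit
print(max_shards_settings_body(1600))
```

Raising the limit only hides the symptom on a single node; reducing the number of shards is usually the more sustainable fix.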

As for the root problem, have you evaluated your index settings? Are you rotating hourly or daily, and if so, is that the best design for your needs? Do you have multiple indices that would be fine as one? Do you need to have 90 daily indices (as an example)?

If you’re just starting with Graylog again I would say that 1600 is a lot of shards unless you’re really hitting it pretty hard right away or you have unique needs for the solution.

I’m by no means an Elasticsearch expert, so I went to the Elastic forums and found a ton of stuff there that was really helpful for sizing.

Have you looked through this documentation yet regarding Elasticsearch and moving from Graylog 3.3 to Graylog 4.0?

I suggest reading through this documentation and seeing if it helps.

Hi!

That actually helps (and I was reading that same thing earlier).
One problem still exists: when I try to update the Graylog index templates, I get a timeout:

http post "gl:9000/api/system/indexer/indices/templates/update x-requested-by:httpie"

http: error: Request timed out (30s).

While that runs, the Elasticsearch load rises, but there are no errors in the logs.
Any ideas what I’m missing here? :slight_smile:

Br,
Jari

Hello, thank you!

My retention policy was set to keep 100 x 1 GiB indices. That many indices, each with too many shards, was causing the problem.
I deleted enough old ones to get the active shard count under 1000. After that, Graylog created the new index right away.
I changed the policy to 10 x 10 GiB indices, and I need to keep an eye on the active shard count from now on.

Actually, I need to create a few more index sets, but I just have to count the shards more carefully next time.
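
A rough way to do that counting, sketched in Python. The 8 shards per index is taken from the error message earlier in the thread ("would add [8] total shards"), zero replicas is an assumption for a single-node cluster, and the function name is made up for illustration.

```python
def index_set_shards(max_indices, shards_per_index, replicas=0):
    """Worst-case open shards for one index set at full retention."""
    return max_indices * shards_per_index * (1 + replicas)


# Old policy: keep 100 indices; new policy: keep 10 indices.
old_policy = index_set_shards(100, 8)
new_policy = index_set_shards(10, 8)
print(old_policy, new_policy)

# How many index sets of the new size fit under a 1000-shard limit.
print(1000 // new_policy)
```

Summing this figure over every index set and comparing it with the cluster limit before adding a new set avoids running into the same validation error again.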

So, thank you and the whole community for your support!

NP, glad we helped you on the path to fixing it :slight_smile:
