Data Node OpenSearch unassigned shards


1. Describe your incident:

I am running a rather large Graylog 6.3.3 cluster with 12 data nodes and 12 server nodes.

Over the weekend, the disks on several of the data nodes filled to 100%, and those nodes stopped working.

I have added more disk space to these nodes, and Graylog is back up and running.

However, the OpenSearch cluster datanode-cluster is red and reports 1152 unassigned shards:

“OpenSearch cluster datanode-cluster is red. Shards: 3095 active, 0 initializing, 0 relocating, 1152 unassigned”

Since the data nodes are configured to use certificates for node communication, how do I run curl commands against the OpenSearch data nodes to delete the unassigned shards?

For example, the following curl command to view the unassigned shards returns “curl: (52) Empty reply from server”:

curl -XGET "http://localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason"
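A likely reason for the empty reply is that the Data Node's OpenSearch expects TLS on port 9200, so a plain `http://` request is simply dropped. The sketch below repeats the same `_cat/shards` query over HTTPS with a client certificate, using curl's `--cert`, `--key` and `--cacert` options. The `OS_*` variables and the file names `client.crt`, `client.key` and `ca.crt` are hypothetical placeholders, and the host in `OS_URL` should match a name in the node's certificate.

```shell
#!/usr/bin/env bash
# Sketch only: OS_URL and the certificate file names are hypothetical placeholders.
OS_URL="${OS_URL:-https://localhost:9200}"  # use a host name present in the node certificate
OS_CERT="${OS_CERT:-client.crt}"            # client certificate (PEM)
OS_KEY="${OS_KEY:-client.key}"              # matching private key (PEM)
OS_CA="${OS_CA:-ca.crt}"                    # CA that signed the Data Node certificates

# os_get PATH: GET an OpenSearch endpoint over HTTPS, authenticating with the client certificate.
os_get() {
  curl --silent --show-error \
       --cert "$OS_CERT" --key "$OS_KEY" --cacert "$OS_CA" \
       "$OS_URL/$1"
}

# list_unassigned: print only the shards whose state column (4th field) is UNASSIGNED.
list_unassigned() {
  os_get "_cat/shards?h=index,shard,prirep,state,unassigned.reason" |
    awk '$4 == "UNASSIGNED"'
}
```

Usage: `list_unassigned | head` shows the first few unassigned shards together with their `unassigned.reason`.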

If I need to use the Graylog API to remove the unassigned shards, which one do I use, and what is the syntax?
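In practice, an unassigned shard is removed by deleting the index that owns it. Doing that through Graylog rather than directly against OpenSearch should let Graylog keep its own index metadata (such as index ranges) consistent. The following is a minimal sketch, assuming the Graylog API is reachable at the hypothetical `GL_URL` and that `GL_TOKEN` holds an API access token (Graylog accepts tokens as the user name with the literal password `token`). The `DELETE /api/system/indexer/indices/{index}` path reflects my recollection of the Graylog REST API, so confirm it in your cluster's API browser before use.

```shell
#!/usr/bin/env bash
# Sketch only: GL_URL and GL_TOKEN are hypothetical placeholders; verify the
# endpoint path in the Graylog API browser for your version.
GL_URL="${GL_URL:-http://graylog.example.com:9000}"
GL_TOKEN="${GL_TOKEN:-changeme}"

# gl_delete_index INDEX: ask Graylog to delete one index. This destroys its data.
gl_delete_index() {
  local index=$1
  curl --silent --show-error -X DELETE \
       -u "$GL_TOKEN:token" \
       -H "X-Requested-By: cli" \
       "$GL_URL/api/system/indexer/indices/$index"
}
```

Usage: `for idx in graylog_605 graylog_600; do gl_delete_index "$idx"; done`. Only run this for indexes whose data is expendable. Graylog requires the `X-Requested-By` header on non-GET requests.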

2. Describe your environment:

  • OS Information: Rocky Linux 9.6

  • Package Version: Graylog 6.3.3+700dd8f

  • Service logs, configurations, and environment variables: Logs are large; which log segments would be helpful?

3. What steps have you already taken to try and solve the problem?

I have attempted several different ways to delete these shards. What worked in Elasticsearch in prior Graylog major versions does not work with the OpenSearch data-nodes.

4. How can the community help?

I need help with the commands/API that will delete these unassigned shards.
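Before deleting anything, it may be worth asking OpenSearch why the shards are unassigned. After a full disk, many shards may simply have exhausted their allocation retries, and `POST _cluster/reroute?retry_failed=true` asks OpenSearch to try them again now that space is available. The `_cluster/allocation/explain` and `_cluster/reroute` APIs, including the `allocate_empty_primary` command with `accept_data_loss`, are standard OpenSearch. The connection settings and function names below are hypothetical placeholders. `allocate_empty_primary` is a last resort: it brings the shard back empty, and all documents that were in it are lost.

```shell
#!/usr/bin/env bash
# Sketch only: OS_URL and the certificate paths are hypothetical placeholders.
OS_URL="${OS_URL:-https://localhost:9200}"
OS_CERT="${OS_CERT:-client.crt}"
OS_KEY="${OS_KEY:-client.key}"
OS_CA="${OS_CA:-ca.crt}"

# os_api METHOD PATH [JSON_BODY]: call OpenSearch with the client certificate.
os_api() {
  local method=$1 path=$2 body=${3:-}
  local -a args
  args=(--silent --show-error -X "$method"
        --cert "$OS_CERT" --key "$OS_KEY" --cacert "$OS_CA")
  if [[ -n "$body" ]]; then
    args+=(-H "Content-Type: application/json" -d "$body")
  fi
  curl "${args[@]}" "$OS_URL/$path"
}

# explain_shard INDEX SHARD: ask why that primary shard is unassigned.
explain_shard() {
  os_api GET "_cluster/allocation/explain?pretty" \
    "{\"index\":\"$1\",\"shard\":$2,\"primary\":true}"
}

# retry_failed: retry allocations that hit the retry limit (e.g. during a full disk).
retry_failed() {
  os_api POST "_cluster/reroute?retry_failed=true"
}

# allocate_empty INDEX SHARD NODE: LAST RESORT. Recreates the primary as an
# EMPTY shard on NODE; every document previously in that shard is lost.
allocate_empty() {
  os_api POST "_cluster/reroute" \
    "{\"commands\":[{\"allocate_empty_primary\":{\"index\":\"$1\",\"shard\":$2,\"node\":\"$3\",\"accept_data_loss\":true}}]}"
}
```

A reasonable order is `retry_failed`, then re-check the shard counts, and use `explain_shard graylog_605 0` on whatever is still unassigned before deciding between deleting the index and `allocate_empty`.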


Hi @r3p3r,

The easiest way to communicate directly with the OpenSearch instance inside Data Node is with client certificates. Please follow the documentation here: Manage Certificates with Data Node
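As I understand that documentation, Graylog's built-in CA can issue such a client certificate through its REST API. The sketch below assumes an endpoint `POST /api/ca/clientcert` that takes `principal`, `role`, `password` and `certificate_lifetime` fields. Treat that path and those field names as assumptions to check against the page for your version. `GL_URL`, the admin credentials, and the choice of the OpenSearch `all_access` role are placeholders as well.

```shell
#!/usr/bin/env bash
# Sketch only: GL_URL, the credentials, the /api/ca/clientcert endpoint and its
# JSON field names are assumptions; confirm them in the Data Node certificate docs.
GL_URL="${GL_URL:-http://graylog.example.com:9000}"
GL_USER="${GL_USER:-admin}"
GL_PASS="${GL_PASS:-changeme}"

# request_client_cert PRINCIPAL ROLE KEY_PASSWORD OUTFILE:
# ask the Graylog CA for a client certificate and save the raw JSON reply to OUTFILE.
request_client_cert() {
  local principal=$1 role=$2 key_password=$3 outfile=$4
  curl --silent --show-error -X POST \
       -u "$GL_USER:$GL_PASS" \
       -H "X-Requested-By: cli" \
       -H "Content-Type: application/json" \
       -d "{\"principal\":\"$principal\",\"role\":\"$role\",\"password\":\"$key_password\",\"certificate_lifetime\":\"P6M\"}" \
       -o "$outfile" \
       "$GL_URL/api/ca/clientcert"
}
```

The reply is JSON. A tool such as `jq` can split the certificate, private key and CA certificate out into separate PEM files for curl's `--cert`, `--key` and `--cacert`. If the private key is password-protected, curl's `--pass` option supplies the password.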

Thank you Thomas, that helped!

I was able to see which indexes are causing most of the issues.

The main one is the default graylog index set (429 indexes with red status).

Since this is the index set attached to the Default Stream, Graylog will not let me move the stream to another index set so that I can delete the existing graylog stream and recreate it.
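Replacing the Default Stream's index set may not be necessary. One possible route, sketched here under assumptions, is to rotate the index set so that new messages land in a fresh, healthy write index, then delete only the red graylog_* indexes whose data is expendable. The web interface should offer this as a "Rotate active write index" action in the index set's Maintenance menu. The script does the same over the REST API. `GL_URL` and `GL_TOKEN` are placeholders, and the `system/deflector/{indexSetId}/cycle` path is an assumption to confirm in your API browser.

```shell
#!/usr/bin/env bash
# Sketch only: GL_URL/GL_TOKEN are placeholders and the deflector "cycle"
# endpoint path is an assumption to verify in the Graylog API browser.
GL_URL="${GL_URL:-http://graylog.example.com:9000}"
GL_TOKEN="${GL_TOKEN:-changeme}"

# gl_api METHOD PATH: call the Graylog REST API with an access token.
gl_api() {
  curl --silent --show-error -X "$1" \
       -u "$GL_TOKEN:token" \
       -H "X-Requested-By: cli" \
       "$GL_URL/api/$2"
}

# list_index_sets: print the index set definitions (JSON), including each set's id.
list_index_sets() { gl_api GET "system/indices/index_sets"; }

# rotate_index_set ID: start a new active write index for that index set.
rotate_index_set() { gl_api POST "system/deflector/$1/cycle"; }
```

The id of the default index set comes from `list_index_sets`. Pass that id to `rotate_index_set`.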

Below is a snippet of the “_cat/indices?v” output, filtered for the graylog indexes:

red open graylog_605 UOqB9nQnSpmW6p12HNaqFg 4 0
red open graylog_600 5GQj1SBwTVKCxnS_fOSZQA 4 0
red open graylog_707 RZn4E6PoTDKtapWrcBxW9g 3 1
red open graylog_708 m-wVuPdvTIagVG0MTahQog 3 1
red open graylog_703 SaZ01DEQT0q8C_jihuor5Q 3 0
red open graylog_704 A_kNgbn5TVCZDrCOVgmgdw 3 0
red open graylog_701 aSnA7B4qRw6VoqIGoQFooQ 3 0
red open graylog_702 SaS3iXWDQNuiKQw9qGUcSQ 3 0
red open graylog_700 pqJH5RgqS6avGMUBXJf4og 3 0
red open graylog_401 i56-t-HwQ76P3qjuechx-g 1 0
red open graylog_522 Ibw-eNflRmeUMnmR-gSEJg 3 0
red open graylog_761 UZOonh04TcOy8yuVv64_ag 1 0
red open graylog_640 hmDWpuLZThSSanK4_OfxEw 2 0
red open graylog_520 cYnbJDD2S2-VRz5lgp_Ehw 3 0
red open graylog_760 vdLfsMQsScGL-EoNd9fZkQ 1 0
red open graylog_408 av5Sa9RQQCqwbyriaoXAcg 1 0
red open graylog_529 NIOcvYIESlGZwB7ILrq4bQ 3 0
red open graylog_409 KAR_6SHUS8Gidgh_0w01Pg 1 0
red open graylog_527 2Uful5tuQvK52QUG1UI7gg 3 0
red open graylog_648 goKcFCUqSxGAaZ6Z5FkcVA 2 0
red open graylog_406 LzfmzNGTSZGrJrhGR05ZUw 1 0
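In this listing, the last two columns are `pri` and `rep`. The default `_cat/indices` column order is health, status, index, uuid, pri, rep, and so on. Most of these red indexes show `rep` 0, so there is no replica copy to promote. If a primary's on-disk data cannot be recovered, that shard can only come back from a snapshot or be recreated empty. The small awk sketch below summarizes such a listing. The function name `red_index_report` is hypothetical.

```shell
#!/usr/bin/env bash
# Reads `_cat/indices` lines in the default column order
# (health status index uuid pri rep ...) on stdin. For red indexes whose name
# starts with PREFIX it prints the primary/replica counts, then a summary line.
red_index_report() {
  local prefix=${1:-graylog_}
  awk -v p="$prefix" '
    $1 == "red" && index($3, p) == 1 {
      n++
      if ($6 == 0) noreplica++
      printf "%s pri=%s rep=%s\n", $3, $5, $6
    }
    END { printf "red=%d without_replicas=%d\n", n, noreplica }'
}
```

For example, save the `_cat/indices?v` output to a file and run `red_index_report graylog_ < indices.txt` (the file name is hypothetical). The header line is skipped automatically because its first field is not `red`.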

For the indexes whose stored data I did not mind losing, I created a new index set, moved the corresponding inputs/streams to it, and deleted the offending index.

The other red indexes are gl-system-events, .ds-gl-datanode-metrics, and a few more.
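The `.ds-` prefix is how data streams name their backing indexes, so the `.ds-gl-datanode-metrics` indexes most likely belong to a data stream called `gl-datanode-metrics`. That stream name is inferred, not confirmed. OpenSearch does not allow deleting a data stream's current write index directly. Rolling the stream over first creates a new write index, after which an old red backing index can be deleted. The `_data_stream`, `_rollover` and index `DELETE` APIs are standard OpenSearch. The connection settings below are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch only: OS_URL, the certificate paths and the data stream name are assumptions.
OS_URL="${OS_URL:-https://localhost:9200}"
OS_CERT="${OS_CERT:-client.crt}"
OS_KEY="${OS_KEY:-client.key}"
OS_CA="${OS_CA:-ca.crt}"

# os_api METHOD PATH: call OpenSearch with the client certificate.
os_api() {
  curl --silent --show-error -X "$1" \
       --cert "$OS_CERT" --key "$OS_KEY" --cacert "$OS_CA" \
       "$OS_URL/$2"
}

# show_stream NAME: list the data stream's backing indexes and its status.
show_stream() { os_api GET "_data_stream/$1?pretty"; }

# rollover_stream NAME: create a fresh write index for the data stream.
rollover_stream() { os_api POST "$1/_rollover"; }

# delete_backing_index INDEX: delete one non-write backing index and its data.
delete_backing_index() { os_api DELETE "$1"; }
```

A possible sequence is `show_stream gl-datanode-metrics` to find the red backing index names, then `rollover_stream gl-datanode-metrics`, then `delete_backing_index` for each red backing index that `show_stream` reported.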

Any suggestions?