Clean 6.1.1 install with data node = Elasticsearch nodes disk usage error


1. Describe your incident:
I completed a fresh install of Graylog 6.1.1 Enterprise on Ubuntu 22.04 LTS (VM) with MongoDB 7. I completed the pre-flight data node setup and logged into Graylog with my admin password. I am greeted with an “Elasticsearch nodes disk usage above low watermark” error in the overview.

2. Describe your environment:

  • OS Information:
    Ubuntu 22.04 LTS (VM)

  • Package Version:
    Enterprise 6.1.1

  • Service logs, configurations, and environment variables:

3. What steps have you already taken to try and solve the problem?

The OpenSearch cluster status is: “datanode-cluster is green. Shards: 13 active, 0 initializing, 0 relocating, 0 unassigned”. Clearing the error only results in it coming back a few minutes later. There are no inputs configured yet. The output of a df command shows ~70% free disk space.

I saw the same error when I migrated to data nodes on a copy of my production server.

curl -XGET http://localhost:9200/_cluster/health?pretty=true returns “Empty reply from server”.

4. How can the community help?

Is this a bug? How can I fix it?


Hey @julsssark,

The data store for the DataNode is /var/lib/graylog-datanode; are you certain there is adequate space there?

Is there a more detailed error in the DataNode logs?
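A quick way to check is to ask df for the filesystem backing that directory (a sketch, assuming the default package-install path):

```shell
# Show free space on the filesystem that holds the DataNode data directory
# (default path on package installs; adjust if you relocated it)
df -h /var/lib/graylog-datanode
```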

Hi @Wine_Merchant. The data node is installed in the default location on the same disk that is showing 70% free:

(screenshot: 2024-10-29_07-12-16)

This is a clean install following the Graylog install documentation, and I have not made any customizations to it.

I am new to DataNodes. Where are the logs stored?

They should be found under /var/log/

As for the curl command, see here.

Thanks for the suggestions and for pointing me to the right way to query the data node API.

In case anyone reads this thread later, the correct URL for the data node health API call is (replace hostname:hostport with the correct values):
http://hostname:hostport/api/datanodes/any/opensearch/_cluster/health?pretty=true
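A full curl invocation against that endpoint might look like the following (a sketch; the -u credentials are placeholders, since the Graylog API normally requires authentication):

```shell
# Query OpenSearch cluster health through the Graylog data node proxy API
# (replace hostname:hostport and the credentials with your own values)
curl -u admin:password -XGET \
  "http://hostname:hostport/api/datanodes/any/opensearch/_cluster/health?pretty=true"
```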

My results are:
{
  "cluster_name" : "datanode-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "discovered_master" : true,
  "discovered_cluster_manager" : true,
  "active_primary_shards" : 15,
  "active_shards" : 15,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

I monitored both the data node logs and the graylog-server logs. I was not able to find anything anomalous-looking in the data node logs. I can see the following error in the graylog-server logs:

2024-10-29T18:38:26.621Z WARN [IndexerClusterCheckerThread] Elasticsearch node [127.0.1.1] triggered [ES_NODE_DISK_WATERMARK_LOW] due to low free disk space

I don’t know much about the data node architecture, but it is odd that 127.0.1.1 (vs. 127.0.0.1) would be the address generating the error.
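For what it's worth, 127.0.1.1 is probably not a second host: Debian and Ubuntu installers map the machine's own hostname to 127.0.1.1 in /etc/hosts, so the node is most likely just advertising itself by hostname. A quick check:

```shell
# Debian/Ubuntu map the local hostname to 127.0.1.1 in /etc/hosts
grep '^127\.' /etc/hosts
# typical Ubuntu output:
# 127.0.0.1 localhost
# 127.0.1.1 <your-hostname>
```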

Given that the data node status, health check, and logs all look clean, this feels like a bug in graylog-server (i.e., there is nothing wrong with the data node and the error is being displayed incorrectly).

@julsssark What does the below return?

curl -XGET -H "Content-Type: application/json" http://hostname:hostport/api/datanodes/any/opensearch/_cluster/settings?pretty

I could not get that curl command to work via SSH, but the URL worked from a browser. The response is below.

{
  "persistent" : {
    "opendistro" : {
      "index_state_management" : {
        "history" : {
          "number_of_replicas" : "0"
        }
      }
    },
    "plugins" : {
      "index_state_management" : {
        "metadata_migration" : {
          "status" : "1"
        },
        "template_migration" : {
          "control" : "-1"
        }
      }
    }
  },
  "transient" : { }
}

It looks like the watermark settings are not set. Perhaps it is a bug in the data node installer?

@Wine_Merchant I compared the watermark settings between my production VM (20 GB disk with OpenSearch) and the clean-install VM (20 GB disk with data node), and the settings are the same:
"disk": {"threshold_enabled": "true", "watermark": {"flood_stage": "95%", "high": "90%", "low": "85%", "enable_for_single_data_node": "false"}}

I only have a single data node in both installs. I assume that since enable_for_single_data_node is false, there should not be any watermark errors displayed by Graylog.
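While debugging, one way to rule the thresholds out entirely is to raise the watermarks through the same proxy API (a sketch, not a production recommendation; hostname:hostport and the percentage values are placeholders, and the change should be reverted afterwards):

```shell
# Temporarily raise the disk watermark thresholds on the data node cluster
# ("transient" settings are lost on restart, which is what we want here)
curl -XPUT -H "Content-Type: application/json" \
  "http://hostname:hostport/api/datanodes/any/opensearch/_cluster/settings" \
  -d '{
    "transient": {
      "cluster.routing.allocation.disk.watermark.low": "95%",
      "cluster.routing.allocation.disk.watermark.high": "97%",
      "cluster.routing.allocation.disk.watermark.flood_stage": "98%"
    }
  }'
```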

Any other ideas? I will open an issue on GitHub.


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.