Clean 6.1.1 install with data node = Elasticsearch nodes disk usage error


1. Describe your incident:
I completed a fresh install of Graylog 6.1.1 Enterprise on Ubuntu 22.04 LTS (VM) with MongoDB 7. I completed the pre-flight data node setup and logged into Graylog with my admin password. I am greeted with an “Elasticsearch nodes disk usage above low watermark” error in the overview.

2. Describe your environment:

  • OS Information:
    Ubuntu 22.04 LTS (VM)

  • Package Version:
    Enterprise 6.1.1

  • Service logs, configurations, and environment variables:

3. What steps have you already taken to try and solve the problem?

The OpenSearch cluster status is: “datanode-cluster is green. Shards: 13 active, 0 initializing, 0 relocating, 0 unassigned”. Clearing the error only results in it coming back a few minutes later. There are no inputs configured yet. The output of a df command shows ~70% free disk space.

I saw the same error when I migrated to data nodes on a copy of my production server.

curl -XGET http://localhost:9200/_cluster/health?pretty=true returns “Empty reply from server”.

4. How can the community help?

Is this a bug? How can I fix it?


Hey @julsssark,

The data store for the DataNode is /var/lib/graylog-datanode; are you certain there is adequate space there?

Is there a more detailed error in the DataNode logs?
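A quick way to check is to ask df for the filesystem backing that directory (a sketch, assuming the default package-install path):

```shell
# Show free space on the filesystem that holds the DataNode data directory
# (default path on package installs; adjust if you relocated it)
df -h /var/lib/graylog-datanode
```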

Hi @Wine_Merchant. The data node is installed in the default location on the same disk that is showing 70% free:

(screenshot: 2024-10-29_07-12-16)

This is a clean install following the Graylog install documentation, and I have not made any customizations to it.

I am new to DataNodes. Where are the logs stored?

They should be found under /var/log/

As for the curl command, see here.

Thanks for the suggestions and for pointing me to the right way to query the data node API.

In case anyone reads this thread later, the correct URL for the data node health API call is (replace hostname:hostport with the correct values):
http://hostname:hostport/api/datanodes/any/opensearch/_cluster/health?pretty=true
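A full curl invocation against that endpoint might look like the following (a sketch; the -u credentials are placeholders, since the Graylog API normally requires authentication):

```shell
# Query OpenSearch cluster health through the Graylog data node proxy API
# (replace hostname:hostport and the credentials with your own values)
curl -u admin:password -XGET \
  "http://hostname:hostport/api/datanodes/any/opensearch/_cluster/health?pretty=true"
```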

My results are:
{
  "cluster_name" : "datanode-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "discovered_master" : true,
  "discovered_cluster_manager" : true,
  "active_primary_shards" : 15,
  "active_shards" : 15,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

I monitored both the data node logs and the graylog-server logs. I was not able to find anything anomalous-looking in the data node logs. I can see the following error in the graylog-server logs:

2024-10-29T18:38:26.621Z WARN [IndexerClusterCheckerThread] Elasticsearch node [127.0.1.1] triggered [ES_NODE_DISK_WATERMARK_LOW] due to low free disk space

I don’t know much about the data node architecture, but it is odd that 127.0.1.1 (vs. 127.0.0.1) would be the address generating the error.
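For what it's worth, 127.0.1.1 is probably not a second host: Debian and Ubuntu installers map the machine's own hostname to 127.0.1.1 in /etc/hosts, so the node is most likely just advertising itself by hostname. A quick check:

```shell
# Debian/Ubuntu map the local hostname to 127.0.1.1 in /etc/hosts
grep '^127\.' /etc/hosts
# typical Ubuntu output:
# 127.0.0.1 localhost
# 127.0.1.1 <your-hostname>
```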

Given that the data node status, health check, and logs all look clean, this feels like a bug in graylog-server (i.e., there is nothing wrong with the data node and the error is being displayed incorrectly).

@julsssark What does the below return?

curl -XGET -H "Content-Type: application/json" http://hostname:hostport/api/datanodes/any/opensearch/_cluster/settings?pretty

I could not get that curl command to work via SSH, but the URL worked from a browser. The response is below.

{
  "persistent" : {
    "opendistro" : {
      "index_state_management" : {
        "history" : {
          "number_of_replicas" : "0"
        }
      }
    },
    "plugins" : {
      "index_state_management" : {
        "metadata_migration" : {
          "status" : "1"
        },
        "template_migration" : {
          "control" : "-1"
        }
      }
    }
  },
  "transient" : { }
}

It looks like the watermark settings are not set. Perhaps it is a bug in the data node installer?

@Wine_Merchant I compared the watermark settings between my production VM (20 GB disk with OpenSearch) and the clean-install VM (20 GB disk with data node), and the settings are the same:
"disk": {"threshold_enabled": "true", "watermark": {"flood_stage": "95%", "high": "90%", "low": "85%", "enable_for_single_data_node": "false"}}

I only have a single data node in both installs. I assume that since enable_for_single_data_node is false, there should not be any watermark errors displayed by Graylog.
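While debugging, one way to rule the thresholds out entirely is to raise the watermarks through the same proxy API (a sketch, not a production recommendation; hostname:hostport and the percentage values are placeholders, and the change should be reverted afterwards):

```shell
# Temporarily raise the disk watermark thresholds on the data node cluster
# ("transient" settings are lost on restart, which is what we want here)
curl -XPUT -H "Content-Type: application/json" \
  "http://hostname:hostport/api/datanodes/any/opensearch/_cluster/settings" \
  -d '{
    "transient": {
      "cluster.routing.allocation.disk.watermark.low": "95%",
      "cluster.routing.allocation.disk.watermark.high": "97%",
      "cluster.routing.allocation.disk.watermark.flood_stage": "98%"
    }
  }'
```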

Any other ideas? I will open an issue on GitHub.


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.