HELP! Is Graylog datanode acting normal?

1. Describe your incident:
Disk space discrepancy and “Ghost Files” on Graylog DataNodes.
My 3-node DataNode cluster experienced a disk-full event (1.3TB). Even after manually rotating and deleting indices via the Graylog UI, the disk space reported by df -h did not match du -sh.

Specifically, on Nodes 1, 2 and 3, df -h shows around 196GB used (16%), but sudo du -h --max-depth=1 / only tallies 131GB. After I restarted the service on Node 3, its df and du values matched, but Nodes 1 and 2 continue to show a ~65GB gap.
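
For anyone wanting to measure the same gap in one step, here is a minimal sketch that puts the df and du figures side by side. It assumes GNU coreutils (df --output and du -x). The function names gap_gib and report_gap are made up for this example and are not part of Graylog or OpenSearch.

```shell
#!/usr/bin/env bash
# Compare what the filesystem reports as used (df) with what is reachable
# through the directory tree (du). A large positive gap can mean that space
# is still held by files no longer visible in any directory.

# gap_gib USED_BYTES DU_BYTES -> difference in GiB, one decimal place.
gap_gib() {
  awk -v used="$1" -v seen="$2" \
    'BEGIN { printf "%.1f\n", (used - seen) / (1024 ^ 3) }'
}

# report_gap MOUNTPOINT -> prints df used, du total and the gap.
report_gap() {
  local mount="${1:-/}" used seen
  # GNU df prints a header line first, then the used bytes.
  used=$(df -B1 --output=used "$mount" | tail -n 1 | tr -d ' ')
  # -x keeps du on this one filesystem so other mounts are not counted.
  seen=$(du -B1 -sx "$mount" 2>/dev/null | awk '{print $1}')
  echo "df used:  $used bytes"
  echo "du total: $seen bytes"
  echo "gap:      $(gap_gib "$used" "$seen") GiB"
}

# Example (run as root so du can read every directory):
#   report_gap /
```

Running report_gap / on each node before and after a service restart would show whether the gap closes, as it did on Node 3.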

2. Describe your environment:

  • OS Information: 24.04

  • Package Version: Graylog DataNode (OpenSearch 2.19.3)

  • Storage: 1.3TB EBS Volumes

  • Cluster: 3 Nodes, GREEN status.

Hey @Arishem,

After restarting the data node service and running those commands again, do you see a difference?

Hello @Wine_Merchant, yes. For example:
df -h shows 196G used
du -h shows 131G used

After sudo systemctl restart graylog-datanode:
df -h shows 131G used
du -h shows 131G used

sudo du -sh /var/lib/graylog-datanode/
563G /var/lib/graylog-datanode/

sudo du -B1 -s / 2>/dev/null | awk '{print "du total: " $1/1024/1024/1024 " GB"}'
du total: 567.866 GB

df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/root 1.3T 633G 626G 51% /

sudo ls -ld /var/lib/graylog-datanode/opensearch/data/nodes/0/indices/*/ | wc -l
264

sudo du -sh /var/lib/graylog-datanode/opensearch/data/nodes/0/indices/*/ | sort -rh | head -20
16G /var/lib/graylog-datanode/opensearch/data/nodes/0/indices/yRTuRVYaTNeG-jmcsE9bwQ/
16G /var/lib/graylog-datanode/opensearch/data/nodes/0/indices/NhJsMFMQQPurzw_xtdBB1Q/
14G /var/lib/graylog-datanode/opensearch/data/nodes/0/indices/szc9mYzdQReXRJ4mrKl0zw/
14G /var/lib/graylog-datanode/opensearch/data/nodes/0/indices/SVivBRIoSIWo3ygChEuCmg/
13G /var/lib/graylog-datanode/opensearch/data/nodes/0/indices/QnSyLc5-S1el82rPWmURbg/
13G /var/lib/graylog-datanode/opensearch/data/nodes/0/indices/Q-LUCfbCTqmeP5IsjkBOOg/
13G /var/lib/graylog-datanode/opensearch/data/nodes/0/indices/H3x5KqsuTsK8XcGjDFzckg/
13G /var/lib/graylog-datanode/opensearch/data/nodes/0/indices/7xkhOTwFTNO1LkhqqBx6Mg/
12G /var/lib/graylog-datanode/opensearch/data/nodes/0/indices/YgGllgxvQ1CyFVTMGpFSGg/
11G /var/lib/graylog-datanode/opensearch/data/nodes/0/indices/w2SnM0WEQziVGgj4RlZdtw/
11G /var/lib/graylog-datanode/opensearch/data/nodes/0/indices/v2BhqgGXS1mXeLaEcf-GoQ/
11G /var/lib/graylog-datanode/opensearch/data/nodes/0/indices/j3FaK1RrTEeh3pmmB9waqQ/
11G /var/lib/graylog-datanode/opensearch/data/nodes/0/indices/1qk8pb4MQGawISdlnN1ESw/
9.6G /var/lib/graylog-datanode/opensearch/data/nodes/0/indices/YO27Y-4yTb6hpkEoMxON7Q/
9.5G /var/lib/graylog-datanode/opensearch/data/nodes/0/indices/xqcYYP9ZReiP-GLafKzK-g/
9.5G /var/lib/graylog-datanode/opensearch/data/nodes/0/indices/3n3UxJgvQDeyWmmFBWMBgg/
9.3G /var/lib/graylog-datanode/opensearch/data/nodes/0/indices/7YwSkqLRSniCcFTOR7WhAQ/
8.7G /var/lib/graylog-datanode/opensearch/data/nodes/0/indices/BFXaNWxcQC2JMMDxuJpqdA/
8.2G /var/lib/graylog-datanode/opensearch/data/nodes/0/indices/b5IRIhmdR9awkrxZMafwpw/
8.2G /var/lib/graylog-datanode/opensearch/data/nodes/0/indices/MSAE5_AZQA2nsrMJxS9Cwg/

Does the command below show Graylog holding on to any deleted files or directories on the nodes with discrepancies?

sudo lsof +L1 | grep -i graylog

It shows nothing on any of the DataNodes.
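
As a cross-check on the lsof result, here is a rough sketch that reads /proc directly for one process and lists any open descriptors whose file has already been unlinked. It is Linux-only. The function name list_deleted_fds is made up for this example, and finding the process with pgrep -f opensearch is an assumption about how the DataNode's OpenSearch process appears in the process list.

```shell
#!/usr/bin/env bash
# list_deleted_fds PID -> prints open descriptors of PID whose target file
# has been deleted. The kernel marks such links with a " (deleted)" suffix.
list_deleted_fds() {
  local pid="$1" fd target
  for fd in /proc/"$pid"/fd/*; do
    # Descriptors can close while we iterate, so skip unreadable links.
    target=$(readlink "$fd" 2>/dev/null) || continue
    case "$target" in
      *" (deleted)") printf '%s -> %s\n' "$fd" "$target" ;;
    esac
  done
}

# Example (run as root; the pgrep pattern is an assumption):
#   for pid in $(pgrep -f opensearch); do list_deleted_fds "$pid"; done
```

If this also prints nothing, the missing space is probably not held by deleted-but-open files of that process.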

Could it be metadata eating the space?

I have 30 index sets, each with a rotation period of 6 hours and a maximum of 12 indices.
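
To put those settings in numbers, here is a small worked calculation. It assumes the 30 entries are index sets that all share the same rotation and retention settings. The variable names are only for illustration.

```shell
#!/usr/bin/env bash
# Upper bounds implied by the rotation settings quoted above.
index_sets=30        # number of index sets (assumed interpretation)
max_indices=12       # "Max number of indices" per index set
rotation_hours=6     # rotation period per index

max_total_indices=$((index_sets * max_indices))
retention_hours=$((max_indices * rotation_hours))

echo "maximum indices across all index sets: $max_total_indices"
echo "hours of data kept per index set: $retention_hours"
```

That upper bound can be compared with the number of index directories counted on a node, keeping in mind that a node only has directories for indices it holds a shard of.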

Related issue in the OpenSearch repo: [BUG] OpenSearch 2.19.3 (possibly other versions) not freeing disk space as expected until service is exited/terminated · Issue #20244 · opensearch-project/OpenSearch · GitHub

Wow, @boosty, thank you for sharing the link. I will look into it. Thank you too, @Wine_Merchant!