Elasticsearch data nodes got full on disk space

mariusgeonea · November 2, 2018, 12:40pm

Hello,

On Friday the 26th the Elasticsearch data nodes filled up, and no data could be written or deleted.

I have no idea what happened…

Can somebody help me understand what exactly caused the issue?

Here are the logs: https://drive.google.com/open?id=19OmxW6Yd_Xin5XGlUWbqdZOAQPXyjtHI

Thanks,
Marius.

jan · November 4, 2018, 6:40pm

a) please search the forum
b) read the docs

Your configured retention did not fit into the available disk space.

mariusgeonea · November 5, 2018, 10:51am

Hi Jan,

Thanks for the reply.

But the problem is that my indices total 4.7 or 4.8 TB, and my total cluster size is 6 TB: 3 data node servers with 2 TB each.
Every data node server has an LVM volume, and both the Elasticsearch logs and the data go to that volume. My log files are around 8.4 GB, but the previous indices should have been deleted as configured in Graylog.

The thing is that I’ve been running this environment for months and nothing like this ever happened… so I’m a little surprised. I have read the forums and nothing similar turned up in my searches…

On an Elastic forum people were talking about a similar thing, where the Elasticsearch data folder fills up without any reason, and some recommended upgrading Elasticsearch… the problem is that Graylog 2.4 can go for the 6th version of Elastic.

I’ll keep looking into this, and if I find the cause I’ll let you know.

Thanks,
Marius.
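
The figures in the post above can be checked against Elasticsearch's disk watermarks. This is only a rough sketch: the 85% and 90% values are the documented defaults for `cluster.routing.allocation.disk.watermark.low` and `.high`, and whether this cluster used the defaults is an assumption. It also works with the cluster-wide average, so it says nothing about a single node filling up earlier than the others.

```python
# Figures quoted in the post above.
NODES = 3
DISK_PER_NODE_TB = 2.0
INDEX_TOTAL_TB = 4.8

# Assumption: the cluster runs with the documented default watermarks.
LOW_WATERMARK = 0.85   # above this, no new shards are allocated to the node
HIGH_WATERMARK = 0.90  # above this, shards are relocated away from the node


def fill_ratio(index_total_tb, nodes, disk_per_node_tb):
    """Average fraction of the cluster's disk space used by index data."""
    return index_total_tb / (nodes * disk_per_node_tb)


def headroom_tb(index_total_tb, nodes, disk_per_node_tb, watermark):
    """Cluster-wide space left before the average node reaches a watermark."""
    return nodes * disk_per_node_tb * watermark - index_total_tb


ratio = fill_ratio(INDEX_TOTAL_TB, NODES, DISK_PER_NODE_TB)
low_headroom = headroom_tb(INDEX_TOTAL_TB, NODES, DISK_PER_NODE_TB, LOW_WATERMARK)
high_headroom = headroom_tb(INDEX_TOTAL_TB, NODES, DISK_PER_NODE_TB, HIGH_WATERMARK)

print(f"average fill: {ratio:.0%}")
print(f"headroom to low watermark:  {low_headroom:.2f} TB")
print(f"headroom to high watermark: {high_headroom:.2f} TB")
```

With these inputs the cluster sits at about 80% on average, which leaves roughly 0.3 TB before the low watermark, so the margin is small even before considering uneven shard placement.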

jan · November 5, 2018, 11:38am

@mariusgeonea

You put more information in your second post than in your first!

It would help to describe exactly what happened, as if to someone who is not you and knows nothing about your environment. Then it is more likely that you will get some help.

Did you enable or disable force_merge after index rotation? What are your index retention and rotation strategies? What is your sharding and replica configuration? What is your daily ingest? Did the data volume fill up, or did something like a log file fill the disk?

mariusgeonea · November 6, 2018, 11:57am

Hi Jan,

Here are some screenshots with the info you asked for.

mariusgeonea · November 6, 2018, 11:58am

and my daily ingest is around 350 GB
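
As a back-of-the-envelope check, the daily ingest can be turned into the number of days of data the cluster could hold. This is a minimal sketch under loud assumptions: it treats the 350 GB/day as on-disk index growth (the indexed size can be larger or smaller than the raw ingest), takes 6 TB as 6000 GB, and uses an 85% ceiling as a hypothetical safety margin. The function name and parameters are illustrative only.

```python
def retention_days(usable_gb, daily_growth_gb, replicas=0):
    """Days of data that fit, given daily index growth and a replica count."""
    if daily_growth_gb <= 0:
        raise ValueError("daily_growth_gb must be positive")
    # Each replica stores one more full copy of every day's data.
    return usable_gb / (daily_growth_gb * (1 + replicas))


CLUSTER_GB = 6000      # 3 nodes x 2 TB, taken as 6000 GB (assumption)
DAILY_GROWTH_GB = 350  # daily ingest quoted above, assumed equal to disk growth
SAFE_FRACTION = 0.85   # hypothetical ceiling, leaving 15% of the disk free

full_days = retention_days(CLUSTER_GB, DAILY_GROWTH_GB)
safe_days = retention_days(CLUSTER_GB * SAFE_FRACTION, DAILY_GROWTH_GB)

print(f"days until completely full: {full_days:.1f}")
print(f"days within the safety margin: {safe_days:.1f}")
```

If the retention configured in Graylog keeps more days (or more indices) than this kind of estimate allows, the disks fill before the oldest index is deleted.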

mariusgeonea · November 6, 2018, 12:00pm

No, the log files are only around 8 GB, not that much.

Regarding the space: on the ingest nodes it is stored on an LVM volume dedicated only to the data and logs of the Elasticsearch engine.

The only files eating space on that LVM volume were the logs and the data for Elasticsearch.

mariusgeonea · November 6, 2018, 12:05pm

[Screenshot: 1 SuperPuTTY - Elasticsearch]

mariusgeonea · November 6, 2018, 12:06pm

There are 4 shards for every index and 0 replicas.

To be honest with you, I think this might be a bug in Elasticsearch…

Otherwise I can’t find any other reason for it…

mariusgeonea · November 6, 2018, 12:08pm

This is my ES version: [screenshot: elastic version]

jan · November 6, 2018, 1:30pm

Sorry, but I’m not willing to dig into this.

The way the information is presented does not help someone who invests their spare time to find the problem in your environment. All the information might be there, but the way it is presented makes it hard to read and combine.

Maybe someone else can help you.

mariusgeonea · November 6, 2018, 1:33pm

Hi Jan,

Fair enough.

Anyway, I’d like to thank you for taking the time to help the community.

Marius.

macko003 · November 20, 2018, 11:40am

Hi,

Two things.

First, check the data dir settings in the Elasticsearch config, and the ES API too, on all servers.
I found this in your logs.
XFS marks 5% of the disk, and you have only 5% of the disk left.
You can also check with lsof which files are used by ES, and/or use find to list the files modified in the last few hours.

 using [1] data paths, mounts [[/ (rootfs)]], net usable_space [887.5mb], net total_space [49.9gb], spins? [unknown], types [rootfs]

Maybe Elasticsearch can’t write some other (non-data) file.
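
The log line quoted above can be read mechanically. The following is a small sketch that parses it and computes the free percentage; it assumes the `mb`/`gb` suffixes are binary units (multiples of 1024), and the helper names are made up for illustration.

```python
import re

# The line quoted above, copied from the posted logs.
LOG_LINE = (
    "using [1] data paths, mounts [[/ (rootfs)]], net usable_space [887.5mb], "
    "net total_space [49.9gb], spins? [unknown], types [rootfs]"
)

# Assumption: size suffixes are binary units.
UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


def to_bytes(text):
    """Convert a size such as '887.5mb' into a number of bytes."""
    match = re.fullmatch(r"([\d.]+)(b|kb|mb|gb|tb)", text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


def parse_disk_line(line):
    """Return (mount point, usable bytes, total bytes) from the log line."""
    mount = re.search(r"mounts \[\[(\S+) \(", line).group(1)
    usable = to_bytes(re.search(r"usable_space \[([^\]]+)\]", line).group(1))
    total = to_bytes(re.search(r"total_space \[([^\]]+)\]", line).group(1))
    return mount, usable, total


mount, usable, total = parse_disk_line(LOG_LINE)
free_percent = 100 * usable / total
print(f"mount={mount} free={free_percent:.1f}%")
```

Read this way, the line reports a data path on `/` with about 49.9 GB in total and under 2% free. If it comes from one of the data nodes, that would be worth comparing with the dedicated 2 TB LVM volume described earlier in the thread, which is why checking the data dir setting matters.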

Second (not related):
As far as I know, the ES recommendation for shard size is 20-40 GB per shard. I suggest decreasing the number of shards for small indices and/or changing the retention policy (e.g. 20 GB max, 30 indices -> 40 GB max, 15 indices).
You have 6 servers; are you sure you don’t need replicas?
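
The shard suggestion above comes down to simple arithmetic. Here is a hedged sketch that takes the 20-40 GB per shard figure as given in this post (a rule of thumb, not a hard limit) and shows how many primary shards each of the two example retention policies would need. `shard_count` is a hypothetical helper, not a Graylog or Elasticsearch API.

```python
import math


def shard_count(index_size_gb, target_shard_gb):
    """Smallest number of primary shards keeping each shard at or below the target."""
    if target_shard_gb <= 0:
        raise ValueError("target_shard_gb must be positive")
    return max(1, math.ceil(index_size_gb / target_shard_gb))


# The two retention policies from the example: (max index size in GB, index count).
POLICIES = [(20, 30), (40, 15)]

for index_gb, index_count in POLICIES:
    total_gb = index_gb * index_count
    for target_gb in (20, 40):
        shards = shard_count(index_gb, target_gb)
        print(
            f"{index_count} indices x {index_gb} GB = {total_gb} GB total, "
            f"target {target_gb} GB/shard -> {shards} shard(s) per index"
        )
```

Both policies keep the same total amount of data; the second one simply halves the number of indices, and neither needs more than two primary shards per index to stay inside the suggested range.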

mariusgeonea · November 22, 2018, 3:39pm

Hi Macko,

My shards are around 12 GB max.
And the files being written were all data files, not other files like logs or anything else…
naturally, all within the isolated space that I have only for logs and data files…

Thanks,
Marius.

jan · November 23, 2018, 8:29am

I want to point to: https://www.elastic.co/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster

Just for your reference

system · December 7, 2018, 8:29am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
