Elasticsearch data nodes ran out of disk space


(Mariusgeonea) #1

Hello,

On Friday the 26th the Elasticsearch data nodes were full, and no data could be written or deleted.

I have no idea what happened…

Can somebody help me understand what exactly caused the issue?

Here are the logs: https://drive.google.com/open?id=19OmxW6Yd_Xin5XGlUWbqdZOAQPXyjtHI

Thanks,
Marius.


(Jan Doberstein) #2

a) please search the forum
b) read the docs

Your configured retention did not fit into the available disk space.


(Mariusgeonea) #3

Hi Jan,

Thanks for the reply.

But the problem is that my indices total 4.7 or 4.8 TB, and my total cluster size is 6 TB: 3 data node servers with 2 TB each.
Every data node server has an LVM volume, and both the Elasticsearch logs and the data go to that volume. My log files are around 8.4 GB, and older indices should have been deleted as configured in Graylog.

The thing is that I have been running this environment for months and nothing like this has ever happened, so I'm a little surprised. I have read the forums and nothing similar appears in my searches…

On an Elastic forum people were talking about a similar thing, where the Elasticsearch data folder fills up without any apparent reason, and some recommended upgrading Elasticsearch… The problem is that Graylog 2.4 can't go to version 6 of Elasticsearch.

I'll keep looking into this, and if I find the cause I'll let you know.

Thanks,
Marius.


(Jan Doberstein) #4

@mariusgeonea

you placed more information in your second post than in your first!

It might be helpful to describe exactly what happened, as if to someone who is not you and knows nothing about your environment. Then you are more likely to get some help.

Did you enable or disable force_merge after index rotation? What is your index retention and rotation strategy? What is your sharding and replica configuration? What is your daily ingest? Did the data volume get full, or did something like a log file fill the disk?


(Mariusgeonea) #5

Hi Jan,

Here are some screenshots with the info that you requested.


(Mariusgeonea) #6

and my daily ingest is around 350 GB
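
As a rough sanity check of the figures given so far in this thread (about 4.8 TB of indices, 6 TB of total disk, 350 GB ingested per day), the arithmetic can be sketched as below. The function names are hypothetical. The sketch assumes that the daily ingest figure equals the on-disk index growth, that 1 TB is taken as 1000 GB, and that Elasticsearch's default low disk watermark of 85% has not been changed; none of this is confirmed in the thread.

```python
def retention_days(index_total_gb, daily_ingest_gb):
    """Days of data that fit in the given index volume at constant daily growth."""
    return index_total_gb / daily_ingest_gb


def disk_used_fraction(index_total_gb, disk_total_gb):
    """Fraction of the cluster's total disk occupied by index data."""
    return index_total_gb / disk_total_gb


if __name__ == "__main__":
    index_total_gb = 4800   # "4.7 or 4.8 TB" from the thread, taken as 4800 GB
    disk_total_gb = 6000    # 3 data nodes x 2 TB
    daily_ingest_gb = 350   # assumed to equal on-disk growth per day
    low_watermark = 0.85    # Elasticsearch default; assumed not overridden

    days = retention_days(index_total_gb, daily_ingest_gb)
    used = disk_used_fraction(index_total_gb, disk_total_gb)
    headroom_gb = disk_total_gb * low_watermark - index_total_gb

    print(f"retention covered: {days:.1f} days")
    print(f"disk used by indices: {used:.0%} (low watermark {low_watermark:.0%})")
    print(f"headroom below low watermark: {headroom_gb:.0f} GB")
```

Under these assumptions the indices alone occupy about 80% of the disk, and the headroom below the low watermark is smaller than one day of ingest, so a single delayed index deletion or a temporary size increase could be enough to fill the volume.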


(Mariusgeonea) #7

No, the log files are only around 8 GB, not that much.

Regarding the space: on the ingest nodes it is on an LVM volume dedicated only to the data and logs of the Elasticsearch engine.

The only files eating space on that LVM volume were the logs and the data for Elasticsearch.


(Mariusgeonea) #8

[Screenshot: SuperPuTTY - Elasticsearch]


(Mariusgeonea) #9

There are 4 shards for every index and 0 replicas.

To be honest with you, I think this might be a bug in Elasticsearch…

Otherwise I can't find any other reason for it…


(Mariusgeonea) #10

This is my ES version: [Screenshot: elastic version]


(Jan Doberstein) #11

Sorry, but I'm not willing to dig into this.

The way the information is presented does not help someone who invests their spare time to find the problem in your environment. All the information might be there, but the way it is presented makes it hard to read and combine.

Might be that someone else can help you.


(Mariusgeonea) #12

Hi Jan,

fair enough.

Anyway, I'd like to thank you for taking the time to help the community.

Marius.


#13

Hi,

Two things.

First, check the Elasticsearch config for the data dir settings, and check the ES API too, on all servers.
I found the line below in your logs.
XFS marks 5% of the disk, and you have only 5% of the disk left.
You can also use lsof to check which files are used by ES, and/or use find to list the files modified in the last few hours.

 using [1] data paths, mounts [[/ (rootfs)]], net usable_space [887.5mb], net total_space [49.9gb], spins? [unknown], types [rootfs]

Maybe Elasticsearch can't write some other (non-data) file.
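
To read the quoted log line more precisely, the two size fields can be parsed and compared. The sketch below is only an illustration: the helper names are hypothetical, and it assumes that the sizes use binary (1024-based) units. Note that the line reports a root filesystem of 49.9gb as the only data path, not a 2 TB volume; which node produced the line is not stated in the thread.

```python
import re

# Assumption: sizes in the log line use binary (1024-based) units.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4, "pb": 1024**5}

_SIZE_RE = re.compile(r"^([0-9]*\.?[0-9]+)\s*([kmgtp]?b)$", re.IGNORECASE)
_FIELD_RE = re.compile(r"net (usable|total)_space \[([^\]]+)\]")


def parse_size(text):
    """Convert a size such as '887.5mb' into a number of bytes."""
    match = _SIZE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    value, unit = match.groups()
    return float(value) * _UNITS[unit.lower()]


def free_percent(log_line):
    """Return usable space as a percentage of total space for one log line."""
    fields = {name: parse_size(size) for name, size in _FIELD_RE.findall(log_line)}
    if "usable" not in fields or "total" not in fields:
        raise ValueError("line does not contain both usable_space and total_space")
    return 100.0 * fields["usable"] / fields["total"]


if __name__ == "__main__":
    line = ("using [1] data paths, mounts [[/ (rootfs)]], net usable_space [887.5mb], "
            "net total_space [49.9gb], spins? [unknown], types [rootfs]")
    print(f"{free_percent(line):.2f}% free")
```

For the quoted line this comes out at a little under 2% free, so the filesystem behind that data path was almost completely full when the line was logged.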

Second (not related):
As far as I know, the ES recommendation for shard size is 20-40 GB per shard. I suggest decreasing the number of shards for small indices and/or changing the retention policy (e.g. 20 GB max, 30 indices -> 40 GB max, 15 indices).
You have 6 servers; are you sure you don't need replicas?
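
The shard suggestion above can be put into numbers with a small sketch. The function names are hypothetical, and the 30 GB default target is simply the midpoint of the 20-40 GB range mentioned, not an official value.

```python
import math


def shards_for_index(index_size_gb, target_shard_gb=30.0):
    """Smallest primary shard count that keeps each shard at or below the target size."""
    if index_size_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(index_size_gb / target_shard_gb))


def total_retained_gb(max_index_size_gb, max_indices):
    """Total data kept by a size-based rotation with a maximum number of indices."""
    return max_index_size_gb * max_indices


if __name__ == "__main__":
    # Both retention settings from the post keep the same total amount of data.
    print(total_retained_gb(20, 30), total_retained_gb(40, 15))
    # Shard counts for a 20 GB and a 40 GB index at the assumed 30 GB target.
    print(shards_for_index(20), shards_for_index(40))
```

Taking the roughly 12 GB shards and 4 shards per index reported elsewhere in this thread (about 48 GB per index), `shards_for_index(48)` gives 2 shards of about 24 GB each under the assumed target.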


(Mariusgeonea) #14

Hi Macko,

My shards are around 12 GB max.
And the files being written were all data files, not other files like logs or something else…
Naturally, this relates to the isolated space that I have only for logs and data files…

Thanks,
Marius.


(Jan Doberstein) #15

I want to point to: https://www.elastic.co/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster

Just for your reference

