Elasticsearch Shards Disk Space Usage

Hello All,
I have a question about Elasticsearch shards using up disk space. I’m using an all-in-one Graylog server, with an index called “Default index set”.
It is configured as: Shards: 4, Replicas: 0, Index rotation strategy: Index Time, Rotation period: P1D (1 day), Index retention strategy: Delete, Max number of indices: 90. I did some research on shards, as shown below.

From my research on Elasticsearch shards:
“An index may not fit on the disk of a single node, or a single node may be too slow to serve search requests alone.
To solve this problem, Elasticsearch provides the ability to subdivide your index into multiple pieces called shards. Each shard is a fully-functional and independent “index” that can be hosted on any node in the cluster.”
• It allows you to horizontally split/scale your content volume
• It allows you to distribute and parallelize operations across shards (potentially on multiple nodes) thus increasing performance/throughput
• You may change the number of replicas dynamically at any time, but you cannot change the number of shards after the fact (see the sketch just below).
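Just to check my understanding of those two points, here is a minimal sketch of what they look like against the Elasticsearch REST API (assuming a node on localhost:9200 and a made-up index name, purely for illustration):

```python
# Illustrative only: talks to a hypothetical Elasticsearch node on localhost:9200
# and creates a made-up index called "my-index".
import requests

ES = "http://localhost:9200"

# The number of primary shards is fixed when the index is created...
requests.put(f"{ES}/my-index", json={
    "settings": {
        "number_of_shards": 4,     # cannot be changed after creation
        "number_of_replicas": 0,   # can be changed at any time
    }
})

# ...but the number of replicas can be changed dynamically later on.
requests.put(f"{ES}/my-index/_settings", json={
    "index": {"number_of_replicas": 1}
})
```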

“Each document is stored in a single primary shard. When you index a document, it is indexed first on the primary shard, then on all replicas of the primary shard. By default, an index has 5 primary shards. You can specify fewer or more primary shards to scale the number of documents that your index can handle. You cannot change the number of primary shards in an index, once the index is created.”

So, if I have this correct, I have an index called “Default index set” with 4 shards. Does that mean my index “Default index set” is split into 4 sections for easier search indexing?
What confuses me is the Elasticsearch documentation stating that each document is stored in the primary shard. If you have 4 primary shards on one server, does that mean one document is stored in all 4 shards (i.e. S1, S2, S3, S4) of “Default index set”?
If so, does that mean it’s duplicated across 4 shards, decreasing free disk space?

I don’t know everything about Elasticsearch shards, but I have a rough idea. Could someone enlighten me further, please?
If I configure an index with fewer shards, say 2 instead of 4, will this reduce disk space usage?
Thank you.

For reference, the text you’ve quoted is from Basic Concepts | Elasticsearch Reference [5.6] | Elastic

No, index sets are sets of indices (thus the name) serving a specific purpose. One index set can contain multiple indices following a specific naming scheme (see the index prefix) and having a specific configuration (e.g. a certain number of primary and replica shards).

It may clear things up if you realize that “index set” is a term used by Graylog, not by Elasticsearch.

See http://docs.graylog.org/en/2.3/pages/configuration/index_model.html for more details about index sets in Graylog.

Each index of the “Default index set” would have 4 primary shards (and no replica shards) when using that setting.
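If you want to see that on your own node, a rough sketch along these lines (assuming Elasticsearch on localhost:9200 and the default graylog_ index prefix) lists the indices of the set and the on-disk size of each shard via the _cat APIs:

```python
# Rough sketch: list the Graylog indices and their shards with the _cat APIs.
# Assumes Elasticsearch on localhost:9200 and the default "graylog_" index prefix.
import requests

ES = "http://localhost:9200"

# One line per index of the set, e.g. graylog_0, graylog_1, ...
print(requests.get(f"{ES}/_cat/indices/graylog_*?v&h=index,pri,rep,store.size").text)

# One line per shard; the shard sizes of an index add up to its total size.
# The data is split across the shards, not duplicated into each of them.
print(requests.get(f"{ES}/_cat/shards/graylog_*?v&h=index,shard,prirep,store").text)
```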

No, each document is stored in exactly one primary shard and possibly multiple replica shards (if configured), depending on its ID (or rather a hash over its ID).
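Roughly speaking, the routing works out to shard = hash(routing) % number_of_primary_shards, where the routing value defaults to the document ID (Elasticsearch uses a murmur3 hash internally). A toy sketch of the idea, not the real hash:

```python
# Toy illustration only: Elasticsearch uses a murmur3 hash of the document ID,
# not Python's built-in hash(), but the principle is the same: every document
# maps to exactly one primary shard.
def pick_primary_shard(doc_id: str, number_of_primary_shards: int) -> int:
    return hash(doc_id) % number_of_primary_shards

for doc_id in ("log-0001", "log-0002", "log-0003"):
    print(doc_id, "-> shard", pick_primary_shard(doc_id, 4))
```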

No.

Please refer to the following pages for more detailed information about Elasticsearch indices, shards, and how they relate:

If you don’t have over 160 GB of logs a day, you could use fewer shards (one rule of thumb / typical configuration is to aim for shards of roughly 20-40 GB each). Having fewer shards also relaxes the RAM requirements. Having 4 shards and 90 indices means you’ll have 360 shards on the node. That means you should have about 32 GB of RAM allocated to Elasticsearch alone, and the same amount of RAM left unallocated for the operating system’s file cache that Lucene relies on. If you can get away with 1 shard per day, you’ll have just 90 shards on the node, and around 9 GB of JVM heap would be enough for Elasticsearch.
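To put rough numbers on that, here is a back-of-envelope sketch based only on the figures in this thread:

```python
# Back-of-envelope sketch using the numbers from this thread; the heap figures
# above are rules of thumb, not hard limits.
def shards_on_node(shards_per_index: int, retained_indices: int) -> int:
    return shards_per_index * retained_indices

print(shards_on_node(4, 90))  # 4 shards x 90 retained indices = 360 shards
print(shards_on_node(1, 90))  # 1 shard  x 90 retained indices =  90 shards

# The total volume of log data on disk stays roughly the same either way; what
# grows with the shard count is the per-shard overhead (heap, open files),
# which is why fewer, larger shards (roughly 20-40 GB each) are suggested above.
```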

The amount of disk space needed changes much less than the amount of RAM needed.


Thank you, I get it now.
Much appreciated; thanks for taking the time to explain.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.