Elasticsearch Shards Disk Space Usage

Hello All,
I have a question about Elasticsearch shards using up disk space. I’m using an all-in-one Graylog server, with an index called “Default index set”.
It is configured as: Shards: 4, Replicas: 0, Index rotation strategy: Index Time, Rotation period: P1D (1 day), Index retention strategy: Delete, Max number of indices: 90. I did some research on shards, as shown below.

From my research on Elasticsearch shards:
“An index may not fit on the disk of a single node, or a single node may be too slow to serve search requests alone.
To solve this problem, Elasticsearch provides the ability to subdivide your index into multiple pieces called shards. Each shard is a fully-functional and independent “index” that can be hosted on any node in the cluster.”
• It allows you to horizontally split/scale your content volume
• It allows you to distribute and parallelize operations across shards (potentially on multiple nodes) thus increasing performance/throughput
• You may change the number of replicas dynamically at any time, but you cannot change the number of shards after the fact (see the sketch just below).
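Just to check my understanding of those two points, here is a minimal sketch of what they look like against the Elasticsearch REST API (assuming a node on localhost:9200 and a made-up index name, purely for illustration):

```python
# Illustrative only: talks to a hypothetical Elasticsearch node on localhost:9200
# and creates a made-up index called "my-index".
import requests

ES = "http://localhost:9200"

# The number of primary shards is fixed when the index is created...
requests.put(f"{ES}/my-index", json={
    "settings": {
        "number_of_shards": 4,     # cannot be changed after creation
        "number_of_replicas": 0,   # can be changed at any time
    }
})

# ...but the number of replicas can be changed dynamically later on.
requests.put(f"{ES}/my-index/_settings", json={
    "index": {"number_of_replicas": 1}
})
```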

“Each document is stored in a single primary shard. When you index a document, it is indexed first on the primary shard, then on all replicas of the primary shard. By default, an index has 5 primary shards. You can specify fewer or more primary shards to scale the number of documents that your index can handle. You cannot change the number of primary shards in an index, once the index is created.”

So, if I have this correct, I have an index called “Default index set” with 4 shards. Does that mean my index “Default index set” is split into 4 sections for easier search indexing?
What confuses me is the Elasticsearch documentation stating that each document is stored in the primary shard. If you have 4 primary shards on one server, does that mean one document is stored in all 4 shards (i.e. S1, S2, S3, S4) of “Default index set”?
If so, does that mean it’s duplicated across 4 shards, decreasing free disk space?

I don’t know everything about Elasticsearch shards, but I have a rough idea. Could someone enlighten me further, please?
If I configure an index with fewer shards, say 2 instead of 4, will this reduce disk space usage?
Thank you.

For reference, the text you’ve quoted is from Basic Concepts | Elasticsearch Reference [5.6] | Elastic

No, index sets are sets of indices (thus the name) serving a specific purpose. One index set can contain multiple indices following a specific naming scheme (see the index prefix) and having a specific configuration (e.g. a certain number of primary and replica shards).

It may clear things up if you realize that “index set” is a term used by Graylog, not by Elasticsearch.

See http://docs.graylog.org/en/2.3/pages/configuration/index_model.html for more details about index sets in Graylog.

Each index of the “Default index set” would have 4 primary shards (and no replica shards) when using that setting.
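If you want to see that on your own node, a rough sketch along these lines (assuming Elasticsearch on localhost:9200 and the default graylog_ index prefix) lists the indices of the set and the on-disk size of each shard via the _cat APIs:

```python
# Rough sketch: list the Graylog indices and their shards with the _cat APIs.
# Assumes Elasticsearch on localhost:9200 and the default "graylog_" index prefix.
import requests

ES = "http://localhost:9200"

# One line per index of the set, e.g. graylog_0, graylog_1, ...
print(requests.get(f"{ES}/_cat/indices/graylog_*?v&h=index,pri,rep,store.size").text)

# One line per shard; the shard sizes of an index add up to its total size.
# The data is split across the shards, not duplicated into each of them.
print(requests.get(f"{ES}/_cat/shards/graylog_*?v&h=index,shard,prirep,store").text)
```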

No, each document is stored in exactly one primary shard and possibly multiple replica shards (if configured), depending on its ID (or rather a hash over its ID).
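Roughly speaking, the routing works out to shard = hash(routing) % number_of_primary_shards, where the routing value defaults to the document ID (Elasticsearch uses a murmur3 hash internally). A toy sketch of the idea, not the real hash:

```python
# Toy illustration only: Elasticsearch uses a murmur3 hash of the document ID,
# not Python's built-in hash(), but the principle is the same: every document
# maps to exactly one primary shard.
def pick_primary_shard(doc_id: str, number_of_primary_shards: int) -> int:
    return hash(doc_id) % number_of_primary_shards

for doc_id in ("log-0001", "log-0002", "log-0003"):
    print(doc_id, "-> shard", pick_primary_shard(doc_id, 4))
```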

No.

Please refer to the following pages for more detailed information about Elasticsearch indices, shards, and how they relate:

If you don’t have over 160 GB of logs a day, you could use fewer shards (one rule of thumb / typical configuration is to aim for shards of roughly 20-40 GB each). Having fewer shards also relaxes the RAM requirements. Having 4 shards and 90 indices means you’ll have 360 shards on the node. That means you should have about 32 GB of RAM allocated to Elasticsearch alone, and the same amount of RAM left unallocated for the operating system’s file cache that Lucene relies on. If you can get away with 1 shard per day, you’ll have just 90 shards on the node, and around 9 GB of JVM heap would be enough for Elasticsearch.
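To put rough numbers on that, here is a back-of-envelope sketch based only on the figures in this thread:

```python
# Back-of-envelope sketch using the numbers from this thread; the heap figures
# above are rules of thumb, not hard limits.
def shards_on_node(shards_per_index: int, retained_indices: int) -> int:
    return shards_per_index * retained_indices

print(shards_on_node(4, 90))  # 4 shards x 90 retained indices = 360 shards
print(shards_on_node(1, 90))  # 1 shard  x 90 retained indices =  90 shards

# The total volume of log data on disk stays roughly the same either way; what
# grows with the shard count is the per-shard overhead (heap, open files),
# which is why fewer, larger shards (roughly 20-40 GB each) are suggested above.
```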

The amount of disk space needed changes much less than the amount of RAM needed.


Thank you, I get it now.
Much appreciated; thanks for taking the time to explain.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.