Out of RAM memory

Hmmm… I did it, and after about ~6 hours the RAM is around 64% (20 GB out of 32). Sometimes it drops a little, sometimes it rises, but it does not climb to 30-31 GB as it did before. I will keep observing on Saturday and Sunday; maybe we have found a solution.


:thinking:
I think we found a clue :male_detective:

Hello. Over the weekend the indicators increased: the memory is not filled immediately but gradually. At the beginning it was 64%, now it is 74%. As you can see from the graph, this happens on the two nodes where Graylog is located (Graylog is not on the third node), so the problem is not with Elasticsearch but with Graylog; it remains to understand what exactly the problem is. The place where I changed the settings and rebooted the nodes is marked in red. I also see that the processor is heavily loaded on the first node (the Graylog master).

Hello,

What I know is…

  • Setting the field type refresh interval to 30 seconds will reduce the load on resources (see the note below).
  • Java tends to use a lot of memory depending on how many logs are being ingested/indexed, etc.
  • The number of shards and the types of searches being executed will have an impact on memory.

Elasticsearch, Graylog and MongoDB on the same node could be fighting over resources. If the volume of logs didn’t exceed 1,000-1,500 per second I would rule that out, but you’re receiving over 3,000 per second, so it makes me wonder.
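For clarity, the field type refresh interval mentioned above is a Graylog index-set setting (edited under System → Indices in the web UI) and is separate from Elasticsearch’s own index.refresh_interval. If you want to see what refresh interval the underlying indices are actually using, a sketch, assuming Elasticsearch on localhost:9200 and the default graylog_* index prefix:

# show any explicitly set refresh interval per index (no value means the Elasticsearch default of 1s)
curl -X GET 'http://localhost:9200/graylog_*/_settings/index.refresh_interval?pretty'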

So quick question.

  • all three Nodes have Graylog, ES and MongoDb.
  • es-node-03 is master node?
  • es-node-02/01 are master/data nodes?

From the graph it looks like only two nodes have memory increasing; the “elastic_data-3” node is steady. If this is correct, something weird is happening, perhaps a misconfiguration?
Can I ask, was es-node-03 always a master node before this issue?

EDIT: I just noticed Graylog is not on es-node-03, so it’s just Elasticsearch and MongoDB?
I’m kind of confused by what you stated.

When I stated this…

I was assuming that was right.

I’ve been going over this issue and doing some research on high memory utilization in Graylog.
By chance, what do you see when you execute this on the nodes with high memory usage?

root# free -m

And is it possible to see this output? I’m curious, something doesn’t seem to add up.

root# lsblk

Out of curiosity, which one have you installed: Oracle Java or OpenJDK?

EDIT: I don’t think you mentioned this, but besides the memory usage, how is everything else working? Have any other problems arisen?

EDIT 2: Over-sharding, as we talked about before: 515 indices with 7,170 active shards, with the index sets configured as
Shards: 3
Replicas: 2

The following is calculated from just one index, not all the other indices you have. Two replica shards per primary shard gives a total of 9 shards per index: 3 primary + 3 first replica + 3 second replica = 9 (so across 515 indices, roughly 515 × 9 = 4,635 shards). A node with 30 GB of heap memory should have at most 600 shards.
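To double-check the live totals, a quick sketch, assuming Elasticsearch is reachable on localhost:9200 (the filter_path parameter just trims the response to the shard counters):

# report the cluster-wide primary and total active shard counts
curl -X GET 'http://localhost:9200/_cluster/health?pretty&filter_path=active_primary_shards,active_shards'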
Recommended shard size and shard count are covered in the documents linked below.

Shard Size

Shard Count

To ensure you’re not going over the recommended shard size, you can execute this:

curl -X GET 'http://localhost:9200/_cat/indices?v'

Not sure if it’s the issue, but it does have an impact on resources, along with all the other things I’ve stated above.
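If you also want to see individual shard sizes rather than whole-index sizes, a sketch along the same lines (again assuming Elasticsearch on localhost:9200):

# list every shard with its type (p/r) and on-disk size, largest first
curl -X GET 'http://localhost:9200/_cat/shards?v&h=index,shard,prirep,store&s=store:desc'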

Hello

output:

node1:
[screenshot]

node2:
[screenshot]

node3:
[screenshot]
I also want to say that Graylog is on nodes 1 and 2.

All 3 nodes have the same settings (except node 3, which has no graylog_journal because Graylog is not installed there).

openjdk version "1.8.0_302"
OpenJDK Runtime Environment (build 1.8.0_302-b08)
OpenJDK 64-Bit Server VM (build 25.302-b08, mixed mode)

No, there are no problems; only the RAM is heavily loaded.
And now it is filling up very slowly: about +20% over 4 days.


[screenshot]
My retention period, except for 2 or 3 indices, is 14 days.

@Uporaba

At this point I really don’t know; your setup looks really good.
I assume node one is the master? Or are both of these masters?

For example, are they configured like this?
Node-01 is_master = true
Node-02 is_master = false

@ttsandrew @tmacgbay @cawfehman @tfpk By chance, do you guys have any ideas on this? The only suggestion I can think of, since these servers are otherwise running well, is that either Elasticsearch needs to be on its own node, or more memory is needed because of the size of the data being ingested.

Yes, that’s true.

I noticed that the RAM now fills up more slowly, but it’s still growing.

Over the past 2 days, it has grown from 76% to 77%.

@Uporaba

I apologize, I’m running out of suggestions to offer you. The only thing I can think of now would be to separate your Elasticsearch instances from Graylog/MongoDB.
Example:

This is my conclusion for the size of the environment and amount of logs being ingested.
I realize this was running with less memory but something changed in this environment. It could be the amount of fields being generated for the amount of logs being shipped. I have seen JAVA which Graylog is based off of use a lot. Taking in consideration of any and all configuration made. I really don’t know. I do know of other members here who do equal or greater then amount of logs then you have by separating the Elasticsearch from Graylog/MongoDb.

Maybe try to rethink a better way to configure and/or ingest messages. Saved searches and widgets consume memory, and using wildcards in searches will also use memory. Please keep us updated.

EDIT: Out of curiosity, what is the output of this command? Just double-checking:

sysctl -a | grep -i vm.swappiness

@Uporaba

Just an FYI, if you think this might be a bug you could post it here.

vm.swappiness = 30

It looks like we’ll have to change the architecture. What if I change it and it doesn’t solve the problem? :smiley: When changing the architecture, will there be any delays when sending to Elasticsearch? And should I put MongoDB together with Graylog or leave it on a remote node?

Hello @Uporaba:

If I am understanding correctly, the environment is performing as expected? Do you see any metrics that are out of line? High memory usage, especially in a database environment, is by design: why would you want your high-cost/high-performance memory to sit idle when it can be allocated and used to increase system performance?

Looking at our environment I see that memory utilization has been near 77% for several days.
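One caveat when reading memory graphs on Linux (an assumption on my part about what your dashboards are plotting): a large share of “used” RAM is often just the kernel page cache, which is reclaimed on demand. The available column of free -m is usually the better indicator of real memory pressure:

# "available" estimates how much memory new workloads can get without swapping
free -m
grep -i MemAvailable /proc/meminfo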

I built our Graylog environment monitoring (graylog-server, elasticsearch, and mongo) using this as a starting point:

If you are concerned because of the lack of at-a-glance insight into performance, I recommend you start there. The Graylog API makes most of the information you need readily available, and it is easily parsed. I monitor the following via the API (a minimal polling sketch follows below):

/api/system/cluster/stats
/api/system/metrics/org.apache.logging.log4j.core.Appender.trace
/api/system/metrics/org.apache.logging.log4j.core.Appender.debug
/api/system/metrics/org.apache.logging.log4j.core.Appender.info
/api/system/metrics/org.apache.logging.log4j.core.Appender.warn
/api/system/metrics/org.apache.logging.log4j.core.Appender.error
/api/system/metrics/org.apache.logging.log4j.core.Appender.fatal
/api/system/metrics/org.graylog2.journal.entries-uncommitted
/api/system/metrics/org.graylog2.shared.buffers.processors.ProcessBufferProcessor.processTime

I also monitor JVM heap usage, swap usage, physical memory usage, CPU usage, disk space usage, and the number of dropped packets.
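A minimal sketch of pulling one of those endpoints from the shell, assuming a hypothetical host graylog.example.com:9000 and an API token created for a monitoring user (Graylog access tokens are sent as HTTP basic auth, with the token as the username and the literal word token as the password):

# replace the host and YOUR_API_TOKEN with your own values
curl -s -u YOUR_API_TOKEN:token -H 'Accept: application/json' 'http://graylog.example.com:9000/api/system/cluster/stats'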

I believe that everyone prior in this thread has reviewed your environment and determined it to be healthy. If that is true, and if you are not experiencing symptoms of performance issues, then I think that you do not need to worry.


Sounds like there have been some good ideas and troubleshooting done already, so I won’t rehash it. What I am curious about though (and perhaps I missed it): are these systems VMs? Are the resources dedicated? How do the ESXi host or hosts look? Also, memory usage isn’t bad, as @ttsandrew pointed out. Have you tried allocating more? As @gsmith pointed out as well, a re-architecture may be needed here. With all those shards, you either need more nodes or more memory. The 20-shards-per-GB-of-heap guideline isn’t a hard and fast limitation, but with 16 GB of heap per node you are looking at a “recommended” maximum of 960 shards, and you have almost 4 times that. Officially, I think you are in “unsupported” territory from an Elasticsearch perspective. Adding RAM and bumping up the heap might be a simple test/fix, but I think you’ll really want to think about separating Graylog/MongoDB from Elasticsearch.

If you have the resources, I would recommend standing up 3 new ES nodes with the same configuration as the current 3 nodes (this facilitates integration with the current environment). Add one new node into the ES cluster at a time: add the first and wait for ES to rebalance the indices; add the second (5th overall) and wait for ES to rebalance; then add the last new node (6th) into the cluster and let ES rebalance.

At this point, if performance is OK, you can leave the build as is, but I wouldn’t recommend it. I would decommission the ES portion of the original 3 nodes one by one, so that only Graylog and Mongo remain on them. I think you’ll see a performance increase as well. If you don’t, at that point I would bump the RAM on the new nodes and adjust the heap accordingly.
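While each new node is joining, a simple way to watch the rebalance (a sketch, assuming the cluster is reachable on localhost:9200) is to poll cluster health and per-node allocation; wait until relocating_shards drops back to 0 and the status is green before adding the next node:

# overall status and the number of shards currently relocating
curl -X GET 'http://localhost:9200/_cluster/health?pretty'
# shard count and disk usage per data node
curl -X GET 'http://localhost:9200/_cat/allocation?v'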


Hello,

So you do have swap enabled.
As for your swap, I would look at the link below; this can also be done on version 7.10.
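For reference, the usual Elasticsearch guidance is to avoid swapping on data nodes. A sketch of the common steps, assuming a systemd-based install; the sysctl.d file name below is my own choice, not something from this thread:

# lower the kernel's tendency to swap (the Elasticsearch docs suggest 1, or disabling swap entirely)
sudo sysctl -w vm.swappiness=1
echo 'vm.swappiness = 1' | sudo tee /etc/sysctl.d/99-elasticsearch.conf
# optionally lock the heap in RAM: set bootstrap.memory_lock: true in elasticsearch.yml
# and LimitMEMLOCK=infinity in a systemd override for the elasticsearch service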

I believe so.

I would follow what either @cawfehman or @ttsandrew suggested, but in the future you may want to rethink your setup.

Keeping Graylog/MongoDB on the same node and putting Elasticsearch on its own node would be advisable.

I consulted with my colleagues and we also came to this conclusion. On Monday I will do this and watch how the situation changes.


That’s great :+1:

Please keep us informed.


Hello. I have not yet moved to the new architecture; I am mainly writing this so that the topic does not close, because almost 14 days have passed. But I noticed that after reducing the number of shards and increasing the index refresh time, RAM growth has slowed down: since my last message (about 14 days ago) it has grown by only ~20%.

@Uporaba

:+1:

Thanks for the update.

Hello. I moved Graylog to a separate server, and the RAM seems to have stopped growing, but now I see this picture: the web interface has become very slow to load :expressionless: Maybe it’s because of MongoDB? For Graylog I allocated 4 GB of RAM out of 8, and 8 cores.

The page loads in about 15-20 seconds

and then this happens

And I also noticed that Graylog consumes 80% of memory even on the separate server.