Hello
Thanks for the added info.
I need to ask some questions about this setup.
- How much data are you ingesting (i.e., messages per second or per day)?
- From the screenshot above, which of your services are using the most RAM?
- What do you have set for the Graylog heap? Depending on which OS you have, its location may vary:
RPM package: /etc/sysconfig/graylog-server
DEB package: /etc/default/graylog-server
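For reference, the heap is set in that file through `GRAYLOG_SERVER_JAVA_OPTS`; a minimal sketch (the 2g values are placeholders, not a recommendation):

```shell
# /etc/sysconfig/graylog-server (RPM) or /etc/default/graylog-server (DEB)
GRAYLOG_SERVER_JAVA_OPTS="-Xms2g -Xmx2g"
```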
- What do you see in the Graylog log file?
- What do you see in the Elasticsearch log file?
- What do you see in the MongoDB log file?
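With package installs, the log locations are usually the defaults below — paths may differ on your setup, so treat these as a starting point:

```shell
tail -n 200 /var/log/graylog-server/server.log
tail -n 200 /var/log/elasticsearch/graylog.log   # file name follows your cluster.name
tail -n 200 /var/log/mongodb/mongod.log
```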
Out of curiosity, can I ask why you have three data paths on the same node?
path.data:
- /var/lib/elasticsearch/data1
- /var/lib/elasticsearch/data2
- /var/lib/elasticsearch/data3
The purpose of multiple data paths is to spread storage and I/O across separate physical drives. Keep in mind Elasticsearch does not mirror data between the paths, so if one drive crashes, the shards on it have to be recovered from replicas on other nodes.
Example:
path.data:
- /mnt/elasticsearch/data1
- /mnt/elasticsearch/data2
- /mnt/elasticsearch/data3
If all the data directories are on the same drive, that's a lot of I/O on one disk, and it serves no real purpose that I can see.
The reason I say this is because you have path.repo: ["/var/backup/graylog"] set, which is also on the same drive.
I’m assuming you set these up?
- password_secret =
- root_password_sha2 =
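If not, both can be generated from a shell; a minimal sketch ('admin' below is only a placeholder — use your own password, and the Graylog docs also suggest pwgen for the secret if you have it installed):

```shell
# password_secret: any long random string; 96 characters from /dev/urandom here.
tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 96; echo

# root_password_sha2: SHA-256 of the admin password you want.
# 'admin' is only a placeholder.
printf '%s' 'admin' | sha256sum | awk '{print $1}'
```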
To sum it up, the configuration looks good, but you have a lot of your fault-redundancy backups on the same drive, and that might be part of the issue.
Just a suggestion, maybe look into something like this in the future.
/dev/mapper/centos-root 83G 6.9G 76G 9% /
/dev/sda1 194M 126M 69M 65% /boot
/dev/sdb1 296G 109G 172G 39% /mnt/elasticsearch/data1
/dev/sdc1 296G 109G 172G 39% /mnt/elasticsearch/data2
/dev/sdd1 296G 109G 172G 39% /mnt/elasticsearch/data3
### path.repo: ["/mnt/my_repo"] <--- Elasticsearch config
/dev/sde1 296G 109G 172G 39% /mnt/my_repo
I see you have set the Elasticsearch heap to 16 GB, which I believe is half of the RAM on that node?
Depending on how you set the Graylog heap (from the question above), you may have over-committed the memory allocation on each node. This would also depend on how many logs are being ingested.
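To make the over-commit concern concrete, here is a back-of-the-envelope sketch. Elasticsearch's own guidance is to leave roughly half of RAM for the file-system cache; the 32 GB total and 8 GB Graylog heap below are my assumptions, not values from your post:

```python
# Back-of-the-envelope memory budget for one node.
# total_ram_gb and graylog_heap_gb are assumptions -- substitute your real values.
total_ram_gb = 32        # "half of RAM" with a 16 GB ES heap implies 32 GB
es_heap_gb = 16          # from your jvm.options
graylog_heap_gb = 8      # hypothetical; check GRAYLOG_SERVER_JAVA_OPTS
os_and_mongodb_gb = 2    # rough allowance for the OS and MongoDB

filesystem_cache_gb = total_ram_gb - es_heap_gb - graylog_heap_gb - os_and_mongodb_gb
print(f"Left for Lucene's file-system cache: {filesystem_cache_gb} GB")
```

With those assumed numbers only 6 GB is left for the cache, well under the ~16 GB Elasticsearch would like on a 32 GB node.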
A couple more questions:
- By chance, do you have swap configured?
- Check max_file_descriptors on your OS.
- What are your configurations for the indices used?
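Two quick ways to check the first two items from a shell on the node (Linux assumed, reading from /proc):

```shell
# Swap: any rows beyond the header mean swap is active
# (Elasticsearch recommends disabling swap or using bootstrap.memory_lock).
cat /proc/swaps

# File descriptors: per-process soft limit and the system-wide ceiling.
# Elasticsearch wants max_file_descriptors of at least 65535.
ulimit -n
cat /proc/sys/fs/file-max
```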
Sorry about all the questions; I'm trying to narrow down what the issue or issues could be.
EDIT:
I forgot to mention, you can execute this curl command to find out what is going on with your heap, etc. Perhaps it will show some info that can be used.
curl -XGET "http://localhost:9200/_cat/nodes?v=true"
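If the default columns are too noisy, _cat/nodes also accepts an h= parameter to pick columns; heap.percent, heap.max, and ram.percent are the relevant ones here (run this against your own node, of course):

```shell
curl -XGET "http://localhost:9200/_cat/nodes?v=true&h=name,heap.percent,heap.max,ram.percent"
```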