Good morning guys, I found this statement when I checked my notifications this morning:
_“Journal utilization is too high and may go over the limit soon. Please verify that your Elasticsearch cluster is healthy and fast enough. You may also want to review your Graylog journal settings and set a higher limit.”_
What does it mean and how do I get it fixed? I would appreciate a quick response because I need to fix this ASAP.
Are messages still going out? If yes, it may be that the Elasticsearch cluster is not able to keep up, in which case you may need more CPU, disk, or memory. That could mean giving the current node(s) more resources/IO, or adding one or more nodes. If no messages are going out, it may be that your Elasticsearch storage is full and can’t ingest any more data, in which case you would need to expand storage for Elasticsearch or add another node and let it “rebalance”. There can be other reasons as well, but these are the most common I have run into.
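For example, something along these lines (assuming Elasticsearch is reachable on `localhost:9200` without authentication; adjust the URL for your setup) will tell you whether the cluster itself is healthy:

```python
# Quick check of Elasticsearch cluster health, assuming Elasticsearch
# listens on localhost:9200 (adjust ES_URL for your setup).
import json
import urllib.request

ES_URL = "http://localhost:9200"

with urllib.request.urlopen(ES_URL + "/_cluster/health") as resp:
    health = json.load(resp)

# "green" = all shards allocated, "yellow" = replicas missing,
# "red" = primary shards missing and indexing into those indices fails.
print("cluster status:", health["status"])
print("unassigned shards:", health["unassigned_shards"])
```

A “red” status usually means Graylog can’t index, which would explain the journal filling up.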
Thanks Mantil, I appreciate it. But what do you mean by “outgoing”? I can receive messages from nodes whenever I click on “Show received messages”, but each time I show the messages, the notification comes back.
Does your Graylog interface show that messages are going out? Usually there is a stat at the top showing a Graylog cluster’s incoming and outgoing messages, and also stats per node under “System > Nodes”.
Dunno what your log ingestion rate is, but your heap size seems pretty low. That is somewhat unrelated, but you may want to look into it. Are you using the appliance?
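If it helps, one quick way to see each Elasticsearch node’s heap (the Graylog server’s own heap shows up under “System > Nodes”), again assuming `localhost:9200`:

```python
# Rough check of how much JVM heap each Elasticsearch node has and is using,
# assuming Elasticsearch on localhost:9200 without authentication.
import urllib.request

ES_URL = "http://localhost:9200"
# _cat/nodes with explicit columns: node name, heap used %, max heap, total RAM
query = "/_cat/nodes?v&h=name,heap.percent,heap.max,ram.max"

with urllib.request.urlopen(ES_URL + query) as resp:
    print(resp.read().decode())
```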
When healthy, what is your normal log ingestion rate?
Gotcha. The appliance, at least in our experience, seems to be great for a proof of concept but not up to the task when put into a production environment where you not only need to ingest messages but also dashboard and retain them over the long term. 2k+ messages seemed to fill up the default appliance config pretty quickly. I’d recommend moving up at some point to a more robust setup: multiple Graylog nodes with an accompanying Elasticsearch cluster behind them. That being said, on to your current problem: have you checked to make sure Elasticsearch hasn’t run out of storage?
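One rough way to check (assuming Elasticsearch on `localhost:9200`) is the allocation cat API, which shows disk used and available per node:

```python
# Shows whether the Elasticsearch data path is filling up: _cat/allocation
# reports disk used/available per node. Assumes localhost:9200, no auth.
import urllib.request

ES_URL = "http://localhost:9200"
query = "/_cat/allocation?v&h=node,disk.used,disk.avail,disk.percent,shards"

with urllib.request.urlopen(ES_URL + query) as resp:
    print(resp.read().decode())
```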
Looks okay. Another question: is storage struggling to keep up? What kind of storage do you have behind this? Also, what does CPU/memory load look like? Maybe someone else in the community can chime in; my experience isn’t generally with the appliance. But my guess is you are hitting some kind of hard or soft limit here, whether it be CPU or heap size for Graylog or Elasticsearch. Another place to look: are you using any expensive extractors? Regex/Grok patterns? Those can quickly eat into system resources if you don’t keep an eye on them. I could be way off base here, but these are all things to look into. Splitting these workloads out really helps in that regard.
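As a toy illustration of how much more CPU a sloppy pattern can burn than a tight one (the patterns and the sample line here are made up, not from your setup):

```python
# Compares an unanchored, backtracking-heavy pattern against a tight,
# anchored one on the same (made-up) log line.
import re
import timeit

line = "srcip=10.0.0.5 dstip=192.168.1.9 action=allow"

# Greedy, unanchored pattern: the engine retries from many positions.
sloppy = re.compile(r".*srcip=(.*?) .*dstip=(.*?) ")
# Anchored pattern that only matches what it needs.
tight = re.compile(r"^srcip=(\S+) dstip=(\S+)")

print("sloppy:", timeit.timeit(lambda: sloppy.search(line), number=100_000))
print("tight: ", timeit.timeit(lambda: tight.search(line), number=100_000))
```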
Talking about extractors, Matt, I am using just one to extract only the source IPs from the raw logs. Is there another way that would enable me to extract all fields at once from the raw logs (source and destination addresses, ports, category outcome, different nodes) so that I can just search for any of them globally?
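To make the question concrete: the idea would be a single pattern with named captures that pulls all of those fields out in one pass, rather than one extractor per field. A rough sketch in plain Python, with hypothetical field names and a hypothetical log line (in Graylog this shape maps onto a single Grok/regex extractor or a pipeline rule):

```python
# Sketch of "extract several fields in one pass": one pattern with named
# groups pulls out source/destination addresses and ports together.
# Field names and the sample line are hypothetical.
import re

line = "src=10.0.0.5 spt=51234 dst=192.168.1.9 dpt=443 outcome=allowed"

pattern = re.compile(
    r"src=(?P<src_ip>\S+) spt=(?P<src_port>\d+) "
    r"dst=(?P<dst_ip>\S+) dpt=(?P<dst_port>\d+) outcome=(?P<outcome>\S+)"
)

match = pattern.search(line)
if match:
    # groupdict() gives every captured field at once, ready to search on.
    print(match.groupdict())
```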