Back up Graylog and recover from failures

I have a Graylog cluster with one master node in our in-house server room and one node in a remote datacenter.

The master node also hosts Elasticsearch and MongoDB.

Everything was running fine until a few days ago, when the master node's disk filled up over the weekend and everything stopped.

Even after I went through all the articles and help I could find, the Elasticsearch cluster state was still RED, and Graylog kept reporting this error:

“Deflector is pointing to [graylog_7], not the newest one:[graylog_8],Re-pointing.”

In the end, Graylog was restored from a full backup that was one week old, and the logs for that week were lost.

I would like the community's help with best practices, based on your own experience, for:

  1. How to make sure I don't lose any logs in the future, e.g. some backup of the Graylog indices.

  2. How to recover logs from an Elasticsearch cluster that failed because the disk was full.

  3. Log rotation and archiving, or some other way to preserve old logs and retrieve them later.

Please help me understand how Graylog can rotate logs, keep and archive old logs, and how a failed Elasticsearch cluster can be recovered.

The setup was running in a production environment, and my job is now at stake because of this incident.

Any help and suggestions would be highly appreciated.

Thanks and Regards
Ankit Sharma

Well, nobody can take the burden of properly monitoring your production-critical systems off you.
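
As a rough illustration of the kind of check that would have flagged the full disk early, here is a minimal Python sketch. The 80% threshold and the function names are assumptions made for this example, not anything Graylog or Elasticsearch ships; in practice you would wire something like this into whatever monitoring or alerting system you already use.

```python
import shutil
import sys
from typing import Tuple


def used_fraction(total_bytes: int, used_bytes: int) -> float:
    """Return the used share of a filesystem as a value between 0 and 1."""
    return used_bytes / total_bytes if total_bytes else 0.0


def check_disk(path: str, warn_at: float = 0.80) -> Tuple[bool, float]:
    """Return (alarm, fraction) for the filesystem that holds `path`.

    `warn_at` is a hypothetical threshold chosen for this sketch.
    """
    usage = shutil.disk_usage(path)
    fraction = used_fraction(usage.total, usage.used)
    return fraction >= warn_at, fraction


if __name__ == "__main__":
    # Pass the Elasticsearch data directory as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    alarm, fraction = check_disk(target)
    status = "WARNING" if alarm else "ok"
    print(f"{status}: {target} is {fraction:.0%} full")
```

Run from cron against the Elasticsearch data path, a script like this gives you a warning well before indexing stops.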

This being said, Graylog Enterprise has a comprehensive story for archiving logs from the active cluster.

I just want to know whether Graylog, along with Elasticsearch and all of its data, can be backed up and restored.

If so, what is the best way to do that without losing data?

Thanks and Regards
Ankit Sharma

Sure, just back up MongoDB and the Elasticsearch indices with the appropriate backup (or rather snapshot) tools:
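
A minimal Python sketch of what such a backup run could look like, using `mongodump` for MongoDB and the Elasticsearch snapshot API (`PUT /_snapshot/...`) for the indices. The host names, the database name `graylog`, the repository name, and the paths are assumptions for illustration; adjust them to your setup. A shared filesystem repository also requires the location to be listed under `path.repo` in `elasticsearch.yml`.

```python
import json
import subprocess
import urllib.request


def _put(url, body):
    """Build a PUT request with a JSON body (nothing is sent yet)."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def mongodump_command(out_dir, db="graylog", host="localhost:27017"):
    """Command line that dumps the Graylog configuration database.

    The database name "graylog" is an assumption; check your Graylog config.
    """
    return ["mongodump", "--host", host, "--db", db, "--out", out_dir]


def register_repo_request(es_url, repo, location):
    """Request that registers a shared-filesystem snapshot repository."""
    body = {"type": "fs", "settings": {"location": location}}
    return _put(f"{es_url}/_snapshot/{repo}", body)


def snapshot_request(es_url, repo, snapshot, indices="graylog_*"):
    """Request that snapshots all indices matching the Graylog prefix."""
    body = {"indices": indices, "include_global_state": False}
    url = f"{es_url}/_snapshot/{repo}/{snapshot}?wait_for_completion=true"
    return _put(url, body)


def run_backup(es_url, repo, location, snapshot, mongo_out_dir):
    """Run the MongoDB dump, then register the repo and take the snapshot."""
    subprocess.run(mongodump_command(mongo_out_dir), check=True)
    for request in (
        register_repo_request(es_url, repo, location),
        snapshot_request(es_url, repo, snapshot),
    ):
        with urllib.request.urlopen(request) as response:
            response.read()
```

For example, `run_backup("http://localhost:9200", "graylog_backup", "/mnt/backup/es", "snap-1", "/mnt/backup/mongo")` would dump MongoDB and then snapshot every `graylog_*` index. Store the results on a different disk or host than the one running Elasticsearch, otherwise a full disk takes the backups down with it.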