In our environment we generally see 200-300 messages per second. We have a single Graylog instance that handles the amount of data we throw at it very effectively. We have given the system 14 cores and 16GB of memory, and all of our datastores are on flash storage.
We run without problems for a month at a time, and then during our monthly maintenance window we install patches and reboot machines as needed. During that window our log volume jumps from 200-300 messages per second to about 5000-7000 messages per second for up to 4 hours.
I increased my journal from 1GB to 25GB, but our Graylog instance still crashed (although much later in the day). I don't want to build a second or third node just for 4 hours or less of heavy ingestion a month. I would rather build a large journal and let Graylog work through the backlog until the journal is empty again, so that it survives the temporary load.
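To reason about how large the journal would need to be, a rough back-of-the-envelope sketch may help. It assumes the journal only grows while messages arrive faster than Graylog processes them, and that it drains at whatever spare capacity remains once the rate returns to normal. The sustained processing rate and the average on-disk size per journaled message are not known from this post; the values in the usage example (3000 msg/s processed, 1000 bytes per message) are purely hypothetical placeholders, and the function names are illustrative only.

```python
def estimate_journal_backlog(
    incoming_per_sec: float,
    processed_per_sec: float,
    burst_hours: float,
    avg_message_bytes: int,
) -> tuple:
    """Return (peak backlog in messages, peak backlog in GB).

    The backlog grows by (incoming - processed) messages every second
    for the length of the burst; if processing keeps up, it stays at 0.
    avg_message_bytes is an assumed on-disk size per journaled message.
    """
    surplus_per_sec = max(incoming_per_sec - processed_per_sec, 0.0)
    backlog_messages = int(surplus_per_sec * burst_hours * 3600)
    backlog_gb = backlog_messages * avg_message_bytes / 1e9
    return backlog_messages, backlog_gb


def drain_hours(
    backlog_messages: int,
    processed_per_sec: float,
    normal_incoming_per_sec: float,
) -> float:
    """Hours to empty the journal once traffic returns to normal.

    Only the capacity left over after handling normal traffic is
    available for working through the backlog.
    """
    spare_per_sec = processed_per_sec - normal_incoming_per_sec
    if spare_per_sec <= 0:
        return float("inf")  # the backlog would never drain
    return backlog_messages / spare_per_sec / 3600


# Hypothetical figures: 7000 msg/s peak for 4 hours, an assumed
# processing rate of 3000 msg/s, and an assumed 1000 bytes per message.
messages, gb = estimate_journal_backlog(7000, 3000, 4, 1000)
hours = drain_hours(messages, 3000, 300)
print(f"peak backlog: {messages} messages (~{gb:.1f} GB), drains in ~{hours:.1f} h")
```

Plugging in a measured processing rate and the real average message size from the instance would give a more meaningful estimate than these placeholders.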
Apart from this once-a-month event, our journal pretty much stays empty, and I just want messages to queue until our system can process everything.
What can I do to best optimize for this small once-a-month event?