Hi, everyone. I’m experiencing some serious performance troubles. Whenever a device I’m monitoring with Graylog (or Graylog itself) comes back up after a period of downtime, Graylog receives a flood of logs that it is unable to process.
I’ve experienced this issue twice:
the first time I had to stop all inputs and let Graylog process the queued logs;
the second time (Graylog ran out of disk space and crashed; that part was solved by expanding the partition) I tried the previous approach again, but it was useless, since no logs are processed anymore. Following some suggestions from the IRC community, I stopped Graylog and deleted the journal files (there were about 1 million unprocessed logs). This procedure didn’t solve the issue, and now Graylog also says “-299,416,322 unprocessed messages are currently in the journal, in 1 segments” (I guess this is due to an integer overflow).
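To sanity-check my overflow guess, here is a minimal Python sketch of the arithmetic. It is not Graylog code: the function names are hypothetical, and the idea that the counter might be computed as a difference of two offsets is purely my assumption, which I have not verified against Graylog's source.

```python
# Illustrative arithmetic only; none of this is taken from Graylog.

REPORTED = -299_416_322  # value shown in the Graylog web interface


def as_unsigned_32(value: int) -> int:
    """Reinterpret a signed 32-bit value as unsigned (two's complement)."""
    return value & 0xFFFFFFFF


def offset_difference(end_offset: int, committed_offset: int) -> int:
    """Hypothetical counter: messages written minus messages already read.

    ASSUMPTION: a counter of this shape goes negative whenever the stored
    read position is ahead of the end of the (now emptied) journal.
    """
    return end_offset - committed_offset


# Case 1: a true 32-bit overflow would imply roughly 4 billion messages.
print(as_unsigned_32(REPORTED))           # 3995550974

# Case 2: a stale read position ahead of an empty journal gives the
# same negative number without any overflow.
print(offset_difference(0, 299_416_322))  # -299416322
```

Since I only had about 1 million unprocessed logs, the first case looks implausible to me, so the negative number may simply come from leftover state that survived when I deleted the journal files. Again, this is a guess.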
My Graylog installation is currently running on a virtual machine with 2 sockets of 4 cores each and 16 GB of RAM. On both occasions all the cores ran at very low utilization, and changing Graylog’s configuration file to allow them to reach 90% utilization and above didn’t help.