Graylog searches have become extremely slow. A 5-minute search of our firewall traffic that used to take at most 10 seconds now takes upwards of a minute to complete.
Currently we are only running a single Graylog server, which houses both Graylog and Elasticsearch. We are averaging around 800 logs per second, with spikes up to 1,500.
The system is running Ubuntu 20.04 with 32 cores and 128 GB of RAM. 30 GB are currently allocated to the Elasticsearch heap, and another 30 GB are allocated to Graylog; the rest is left for the system. Average CPU usage hovers around 30 percent, and average memory usage is around 50 percent. Disk I/O fluctuates more, but the peak was a little over 40 MiB/s.
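For reference, this is roughly how we are checking those heap settings, assuming the standard Ubuntu/deb package locations for Elasticsearch and Graylog (paths may differ on other install types):

# Elasticsearch heap (deb package: jvm.options or a file under jvm.options.d/)
grep -hE '^-Xm[sx]' /etc/elasticsearch/jvm.options /etc/elasticsearch/jvm.options.d/*.options 2>/dev/null

# Graylog heap is passed via GRAYLOG_SERVER_JAVA_OPTS on deb installs
grep GRAYLOG_SERVER_JAVA_OPTS /etc/default/graylog-server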
Graylog version: 5.0.5+d61a926, codename Noir
JVM: PID 2570, Eclipse Adoptium 17.0.6 on Linux 5.4.0-144-generic
Elasticsearch version: 7.10.2
Elasticsearch cluster health:
1680183661 13:41:01 graylog green 1 1 999 999 0 0 0 0 - 100.0%
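If we are reading that health output correctly, it shows 999 active shards (all primaries, no replicas) on a single node, so the index/shard layout may be relevant here. If it helps, we can pull per-index sizes and the total shard count with something like the following, assuming Elasticsearch is listening on localhost:9200:

# Largest indices first (shard counts, doc count, store size)
curl -s 'http://localhost:9200/_cat/indices?v&s=store.size:desc' | head -20

# Rough total shard count on the node
curl -s 'http://localhost:9200/_cat/shards' | wc -l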
In attempting to resolve the issue, we have tried restarting the service, rebooting, and applying updates. We have also tried modifying the server.conf file, changing the output batch size and the number of processors:
output_batch_size = 3000

# Flush interval (in seconds) for the Elasticsearch output. This is the maximum amount of time between two
# batches of messages written to Elasticsearch. It is only effective at all if your minimum number of messages
# for this time period is less than output_batch_size * outputbuffer_processors.
output_flush_interval = 1

# As stream outputs are loaded only on demand, an output which is failing to initialize will be tried over and
# over again. To prevent this, the following configuration options define after how many faults an output will
# not be tried again for an also configurable amount of seconds.
output_fault_count_threshold = 5
output_fault_penalty_seconds = 30

# The number of parallel running processors.
# Raise this number if your buffers are filling up.
processbuffer_processors = 8
outputbuffer_processors = 5
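Since it is the searches themselves that have slowed down, we have also been wondering whether Elasticsearch's search thread pool is queueing or rejecting work, rather than this being a Graylog output problem. If useful, we can grab that with something like the following (again assuming Elasticsearch on localhost:9200):

# Active, queued, and rejected tasks for the search and write thread pools
curl -s 'http://localhost:9200/_cat/thread_pool/search,write?v&h=name,node_name,active,queue,rejected,completed'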
Any guidance as to where to go from here would be greatly appreciated.