Hi @gsmith, indeed: before this started happening, my instance was happily churning through ~50 GB a day, and it still is, apart from the occasional spike when someone turns on debug mode. I run yum security updates bi-weekly, which seems to have upgraded my Java from 1.8.0.265.b01-1.amzn2.0.1.x86_64 to 1.8.0.272.b10-1.amzn2.0.1.x86_64 a few months ago. Running yum yesterday then bumped me from 272 to 1.8.0.282.b08-1.amzn2.0.1.x86_64.
Apart from the Python band-aid, I’ve enabled debug logging for the application, but nothing really jumps out at me as indicating a failure in the API.
@aaronsachs Hi Aaron, that’s the real kicker: resource utilization looks completely normal. The CPU never goes above half, and the heap, buffers, and memory are all within range as well. I added some psutil calls to my Python keepalive for that very reason:
Mar 24 04:40:09 python[6356]: The Graylog service was restarted at 2021-03-24 04:40:09.539326
Mar 24 04:40:09 python[6356]: Memory usage: svmem(total=16214753280, available=9239326720, percent=43.0, used=6646013952, free=4742971392, active=6716796928, inactive=4351053824, buffers=2138112, cached=4823629824, shared=389120, slab=256032768)
Mar 24 04:40:09 python[6356]: CPU usage: scputimes(user=58052.41, nice=0.19, system=16311.24, idle=550130.79, iowait=946.91, irq=0.0, softirq=187.98, steal=1.62, guest=0.0, guest_nice=0.0)
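For anyone who wants a similar watchdog, here is a minimal, hypothetical sketch of a keepalive that produces log lines in the format above; it is not the actual script. It assumes Graylog answers `ALIVE` on `/api/system/lbstatus` at `127.0.0.1:9000` and that the systemd unit is called `graylog-server`; both are assumptions to adjust for your install. The helper names (`graylog_is_alive`, `build_report`, `restart_and_report`, `check_once`) are made up for illustration, and `psutil` is a third-party package, imported only where it is needed.

```python
import datetime
import http.client
import logging
import subprocess
import urllib.request

# Assumptions: default Graylog HTTP port and systemd unit name; adjust as needed.
LBSTATUS_URL = "http://127.0.0.1:9000/api/system/lbstatus"
SERVICE_NAME = "graylog-server"


def graylog_is_alive(url=LBSTATUS_URL, timeout=10.0):
    """Return True only if the status endpoint responds with the text ALIVE."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode().strip().upper() == "ALIVE"
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (e.g. a 503), refused connections and timeouts
        # are all OSError subclasses; treat any of them as "not alive".
        return False


def build_report(restarted_at, mem, cpu):
    """Format the three lines the keepalive logs after a restart."""
    return [
        f"The Graylog service was restarted at {restarted_at}",
        f"Memory usage: {mem}",
        f"CPU usage: {cpu}",
    ]


def restart_and_report(service=SERVICE_NAME):
    """Restart the service, then snapshot memory and CPU counters via psutil."""
    subprocess.run(["systemctl", "restart", service], check=True)
    import psutil  # third-party; imported here so the helpers above work without it

    return build_report(
        datetime.datetime.now(), psutil.virtual_memory(), psutil.cpu_times()
    )


def check_once():
    """Restart Graylog if it is not alive; return the logged lines (empty if healthy)."""
    if graylog_is_alive():
        return []
    report = restart_and_report()
    for line in report:
        logging.info(line)
    return report
```

`check_once()` could be called from a simple loop or a systemd timer. Logging the raw `psutil.virtual_memory()` and `psutil.cpu_times()` named tuples is what yields the `svmem(...)` and `scputimes(...)` strings, which keeps each restart self-documenting.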
Checking overnight, I’ve actually had no restarts in over 24 hours as of this post, so I’m wondering whether that Java update did in fact cause this issue. But I’m nowhere near certain.
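To check the "no restarts in 24 hours" window without eyeballing the journal, something like the following sketch could work. It assumes the keepalive’s log lines are available as plain text (for example, exported from the journal) and only relies on the `The Graylog service was restarted at …` format shown in the excerpt above; the function names are hypothetical.

```python
import datetime
import re

# Matches the restart line written by the keepalive; the timestamp is str(datetime).
RESTART_RE = re.compile(
    r"The Graylog service was restarted at "
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:\.\d{6})?)"
)


def restart_times(log_lines):
    """Yield a datetime for every restart line found in the log."""
    for line in log_lines:
        match = RESTART_RE.search(line)
        if match:
            yield datetime.datetime.fromisoformat(match.group(1))


def hours_since_last_restart(log_lines, now):
    """Hours between the most recent logged restart and `now`, or None if none."""
    times = list(restart_times(log_lines))
    if not times:
        return None
    return (now - max(times)).total_seconds() / 3600


lines = [
    "Mar 24 04:40:09 python[6356]: The Graylog service was restarted at 2021-03-24 04:40:09.539326",
]
now = datetime.datetime(2021, 3, 25, 10, 40, 9, 539326)
print(hours_since_last_restart(lines, now))  # 30.0
```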