Well, I made another upgrade attempt: first on a single node, then on the other three as well.
This time I had NO performance issues. Everything worked well.
So I’m afraid I can’t reproduce the issue, and therefore I’m not able to provide more metrics or data.
If I recall correctly, there was no significant change in the load on the OpenSearch nodes during my previous attempt. And I’m pretty sure I found no problems in the Graylog logs at that time either.
However, it might be worth noting that some details have changed since our last upgrade attempt:
- We added 4 CPUs to each of our four Graylog nodes, as well as to the OpenSearch nodes.
- This time I tried the newest Graylog version, 6.1.5 (upgrading from 6.0.7).
- Some OS packages were updated at the same time.
- I switched our message processing from stream rules to pipelines only, and disabled the “Stream Rule Processor”.
- I’m not sure whether any other config entries changed as well (but it’s unlikely).
Unfortunately, I can’t say for sure which of these changes got rid of the issue …
But I’ll keep an eye on it and will open a new thread if I run into issues again (maybe with the next upgrade, hopefully not).