While tuning the cluster (the process/output buffers are full), I monitor the load on each node. The gorilla pops up from time to time with the error: “Cannot read property ‘org.graylog2.buffers.input.usage’ of undefined.”
Increasing the Graylog heap size (from 7 GB to 12 GB, and then to 18 GB) seems to have fixed the issue.
Together with raising output_batch_size from the default of 500 to 12,000, the heap increase also fixed the 100% Process/Output buffer issue. (I first tried doubling the resources of the Elasticsearch cluster, with no significant improvement; the Elasticsearch data nodes are underutilized.)
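For anyone wanting to try the same two changes, here is a rough sketch of what they look like. The file locations are assumptions based on a typical package install (they differ for Docker or tarball setups), and the values are simply the ones that worked for this cluster, not general recommendations.

```
# Graylog server.conf (commonly /etc/graylog/server/server.conf; path is an assumption)
# Number of messages sent to Elasticsearch per batch; the default is 500.
output_batch_size = 12000
```

```
# JVM options file (commonly /etc/default/graylog-server on Debian/Ubuntu,
# /etc/sysconfig/graylog-server on RHEL/CentOS; path is an assumption)
# Only the heap flags are shown; keep any other flags already present in this variable.
GRAYLOG_SERVER_JAVA_OPTS="-Xms18g -Xmx18g"
```

Both changes need a restart of the Graylog server process to take effect.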
Current load is low. I will continue observing the load/performance and update if there is any change.