Graylog (3.1.0-beta.3-1) is running in Kubernetes via Helm charts with 2 nodes, totalling 8 CPU cores and 16 GB RAM.
Elasticsearch (6.2.4) is running in Kubernetes via Helm charts with 3 master, 2 data, and 1 dedicated ingest node, totalling 17 GB RAM and 6 CPU cores.
I am shipping Winlogbeat data to Graylog via the Beats port, and I have added a pipeline with rules in Graylog to process the data and route it to another stream. Below is the rule I created:
rule "winlogbeat_alerts"
when
  to_string($message.beats_type) == "winlogbeat"
then
  let msg = clone_message();
  let alertType = "windowsEvent";
  set_field("alertType", alertType, "", "", msg);
  set_field("@timestamp", to_string($message.timestamp), "", "", msg);
  route_to_stream("Alerts_Input", "", msg);
end
From Winlogbeat I am getting roughly 10 msg/second, since I have enabled only Error/Warning events. After adding the above pipeline rule, I see exponential growth in the throughput, and I don't know why.
The rule shows 10,000+ msg/second processed through that pipeline.
This leads to 100% heap usage within a few minutes, and Graylog crashes.
2019-08-03T19:40:53.028Z WARN  [ProxiedResource] Unable to call http://graylog.southeastasia.cloudapp.azure.com:9000/api/system on node <7bd8596d-a8ab-406e-abca-0715d20b8f70>
java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method) ~[?:1.8.0_212]
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ~[?:1.8.0_212]
If I just change clone_message to create_message, everything runs smoothly and the throughput shows the correct value of around 10 msg/second.
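For reference, the working variant only swaps clone_message() for create_message(); everything else is unchanged. My guess (unverified) is that the cloned message still carries beats_type, so it matches the when condition and gets reprocessed, while a freshly created message does not:

```
rule "winlogbeat_alerts"
when
  to_string($message.beats_type) == "winlogbeat"
then
  // create_message builds a fresh, empty message instead of copying the
  // original; presumably the new message has no beats_type field, so it
  // cannot re-match this rule's condition
  let msg = create_message();
  let alertType = "windowsEvent";
  set_field("alertType", alertType, "", "", msg);
  set_field("@timestamp", to_string($message.timestamp), "", "", msg);
  route_to_stream("Alerts_Input", "", msg);
end
```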
Is this a bug with clone_message, or am I doing something wrong?
Note: I tried this with version 3.0 as well, but the same issue occurred.
Please let me know your thoughts.