How to diagnose freezing GELF UDP input?

I have a GELF UDP input that has been collecting log messages from multiple Apache servers for about a month. Every 1-2 days, collection stops from all Apache machines at the same time.

Generally I would restart the instance and it would come back up, but this morning I tried something new: just stopping and starting the input itself via the web interface.

I’m not sure how to proceed with determining the cause of the freezing - any advice would be greatly appreciated.

Best,

- check your system logs
- check the Graylog server.log
- check your system metrics
- check kernel messages
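The checks above can be sketched as shell commands. This is only a starting point, and the log path assumes a typical package-based Linux install of Graylog; adjust for your environment:

```shell
# Graylog server log and service-level messages (default package path is an assumption):
tail -n 200 /var/log/graylog-server/server.log
journalctl -u graylog-server --since "1 hour ago"

# Kernel messages -- look for OOM kills, NIC errors, or buffer warnings:
dmesg -T | tail -n 100

# Quick system-metrics snapshot (CPU, memory, swap, I/O):
vmstat 5 3
```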

Thanks Jan - I’ve been monitoring those during previous failures and haven’t seen anything that sticks out, except one nagging thing in the Graylog server log file (will attach at the end).

I see nothing consistent with overload in the system metrics, but I do see network traffic still trying to come in (while Graylog no longer appears to be listening).
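One way to confirm that traffic is still arriving while the input is frozen is to watch the wire directly. A minimal sketch, assuming the common default GELF UDP port 12201 and a placeholder hostname (substitute your input’s actual port and server):

```shell
# Watch for incoming GELF packets on the wire (requires root):
tcpdump -n -i any -c 5 udp port 12201

# Independently, hand-send a minimal GELF message to the input and check
# whether it appears in the search UI (graylog.example.com is a placeholder):
echo -n '{"version":"1.1","host":"test","short_message":"manual GELF test"}' \
  | nc -u -w1 graylog.example.com 12201
```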

Re: the server log - I do see a lot of messages like the one below that do not make sense to me. It looks like Apache might be sending in many bad messages, though I do not know why (EDIT: note that the timestamp of this message coincides with the last received message, within seconds):

2019-07-05T14:42:27.271Z ERROR [DecodingProcessor] Unable to decode raw message RawMessage{id=1bf52571-9f33-11e9-bf84-063bc20c2362, journalOffset=10377850, codec=gelf, payloadSize=2048, timestamp=2019-07-05T14:42:27.271Z, remoteAddress=/[REDACTED]:55800} on input <5d07f63af4ca790892f00bff>.
2019-07-05T14:42:27.271Z ERROR [DecodingProcessor] Error processing message RawMessage{id=1bf52571-9f33-11e9-bf84-063bc20c2362, journalOffset=10377850, codec=gelf, payloadSize=2048, timestamp=2019-07-05T14:42:27.271Z, remoteAddress=/[REDACTED]:55800}
com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'AY43MEUQtBhQyaoFm1QXxcYbTVjnwvjvG231HEcZroRhcCTi9Nk7rQ13rzzPY6I5VEhv1ihlIHFgMwOozY_ndzshQwyqPOLv7k_z6UA7NOE30': was expecting ('true', 'false' or 'null')
 at [Source: AY43MEUQtBhQyaoFm1QXxcYbTVjnwvjvG231HEcZroRhcCTi9Nk7rQ13rzzPY6I5VEhv1ihlIHFgMwOozY_ndzshQwyqPOLv7k_z6UA7NOE30-7BReAokuZUJsexeYdCuKmQcHX4BP5RFNFMCv639dwnQBQ61-USBP7fIerGPT3KEsKQLtyJcJJ-H-E5Ak6nVzp4i6OHu8gY2N5yild9qkJkxc10YzWoRFPWkUf7l3SNi-o3qMS-MCCvRMwsGPkjM-xDqmGSJ2plaLaU0XoLv_F4MXRhDBHeTFINPqKaB1CiF_oudqO3q02_RB0x0gyTR08E55B-V09qa6dmuJ4vucHnXHP961LTB0QuMXLIlz79VkmVgzkhBdcKseD7-1RSionOSraxPkBDgo2mvQBfobBSthxAQQaLwhUqP6DGJIOu7wWI5VeKwFQTrAOykaZOJ7TCPbDTR7xGwXMnV30YE_54XMxVQidkYWA3ZDk3kHpuBoF1DnlQdOGZtkSI71HCOwj65zsFoLnGc41weKr_ABIdKLOdlCYdJ0qADXJTL_qDtC5YqRu2yrH-_UbvxITxTtiHa_nsrzxjzREklcRomcpMr3dPwIh3fjAc6pKPG1XZZ5fZeidbwYfI_i0NGQlA34dG8QgXyxXgSZJVEsorzN8ZUQ1BtgVD04d9dXfHHNuABnX1439MukPmg81cE7ZG0pFQql3mg9lvzUV6Buaa9OFb8AD9jjCjImPN23oJj6I7SnU5_JJgWzAnRsfD8WaYsGETNNA&pt=text&li=rbox-t2v&sig=8c9e58debd91891c42dae4af026c3bb429b4c7a37699&redir=https%3A%2F%2Frtb-usw.mfadsrvr.com%2Fclick%2Fuimj5k5MtC6FkhX_74bsilAdcIGXtf6nig0F0WEX55gWr1z7sNogQdb7FBUNt5ZHj_4o-hVcVN8u9C--GuWlQjABaJh4m7tOhwQxZa9dtF21t9rXUofd9Q914yiSO1Qil8oPPFE36_HcH_NvmPlsb1PpFD0D1I_Z1tbOEO0_t-fYS1y4D6IcEnITji4GQKEapeyW94S-4-i_TNbb4AdIfjm8ITgnwrf0ZOFE49ZsVcqW8dr-0USHMc6oqH3-erWG8s77gWHh6lvrrCzLLnE92KLuTwRBsPUowt7O3LHmM8wa4RpEu5NhSQNoWZIoONdEuI8sFxkacUYWlUwvqR4giw6RtGo_clxzbmJGXDY5Ve2v1rNdSJfoRBFH7gbqa5PN8QAREhTO5AkANMZR2H0_g-uMA30etbLXx7hKRJ8LJlA64xomX5DMQnk7RM9Vt8vehP6LfgPFdg0VaXRhcZ8DZOcVDWCozFA6vNI3UEF4gu5mXcmN7VV993wgS41dPFFcAQIa6eauYc5f9KdXNwVBbVeHEeXmCweeCRsnxNDunMpUG_nPoEFbvLeb0WVBCGX9R2cXToxc6YkZFMi5CSrUpg3doR5OFRmJkMk0qy13ml44WLoG5pKv6qYksdL4VoiHIP-vcwE0admj9kx9JYx2wAveRPo-eZvUrgGVDF0pGE2zIvJoCwz54rPGBiJLYSVhjTFLIcZRg15ckbMks5YbV-LQC0yFSZvqlrxCGJh0Tpf-4s-__a27yV5oXEg2_Wmv-rrwrlev6EpiFTzJwd4HuDqfnsvyUk9r7LO-Uf2CL3Dy3Gd0LBrXBOe11hk6SecRBDDhrsET31HEJ99uLc_KgT0EqjOIrrGK31aUcqGUpaXePBFyHRxwkETTsonUemHnOlDcgnG3l_BVbwc25PQCpA781twGGCK4p60%2F%2F%2F&vi=1562337604297&p=mediaforcebidder-global&r=67&ppb=CIME&cpb=Eg8zNjQtNTc2LVJFTEVBU0UYpuCImwQgnP__________ASoZbGEudGFib29sYXN5bmRpY2F0aW9uLmNvbTIId2F0Z
XIyNjQ4gLyE9AhAkaQOSIuwDlDGnRJYowY", "_from_apache": "true" }; line: 1, column: 110]
        at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1702) ~[graylog.jar:?]
        at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:558) ~[graylog.jar:?]
        at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._reportInvalidToken(ReaderBasedJsonParser.java:2839) ~[graylog.jar:?]
        at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._handleOddValue(ReaderBasedJsonParser.java:1903) ~[graylog.jar:?]
        at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:749) ~[graylog.jar:?]
        at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:3850) ~[graylog.jar:?]
        at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3799) ~[graylog.jar:?]
        at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:2397) ~[graylog.jar:?]
        at org.graylog2.inputs.codecs.GelfCodec.decode(GelfCodec.java:127) ~[graylog.jar:?]
        at org.graylog2.shared.buffers.processors.DecodingProcessor.processMessage(DecodingProcessor.java:150) ~[graylog.jar:?]
        at org.graylog2.shared.buffers.processors.DecodingProcessor.onEvent(DecodingProcessor.java:91) [graylog.jar:?]
        at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:74) [graylog.jar:?]
        at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:42) [graylog.jar:?]
        at com.lmax.disruptor.WorkProcessor.run(WorkProcessor.java:143) [graylog.jar:?]
        at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66) [graylog.jar:?]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]

There are many hundreds like it before this, but this is the last message in the log.

My Apache LogFormat configuration for GELF is as follows:

LogFormat "{ \"version\": \"1.1\", \"host\": \"%V\", \"short_message\": \"%r\", \"timestamp\": %{%s}t, \"level\": 6, \"_user_agent\": \"%{User-Agent}i\", \"_source_ip\": \"%{X-Forwarded-For}i\", \"_duration_usec\": %D, \"_duration_sec\": %T, \"_request_size_byte\": %O, \"_http_status_orig\": %s, \"_http_status\": %>s, \"_http_request_path\": \"%U\", \"_http_request\": \"%U%q\", \"_http_method\": \"%m\", \"_http_referer\": \"%{Referer}i\", \"_from_apache\": \"true\" }" graylog_access
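One thing worth noting about the failing payload above: it appears to be the *middle* of a very long request URL, with the opening of the JSON object missing but the closing `"_from_apache": "true" }` intact. Combined with `payloadSize=2048` in the error, that suggests oversized messages being split or truncated in flight rather than the format itself being wrong, but it is easy to sanity-check whether this LogFormat ever emits invalid JSON on its own. A quick sketch (the log path is an assumption; point it at wherever this format is written):

```shell
# Validate recent lines produced by the graylog_access format as JSON:
tail -n 100 /var/log/apache2/graylog_access.log \
  | while IFS= read -r line; do
      printf '%s' "$line" | python3 -m json.tool >/dev/null 2>&1 \
        || printf 'invalid JSON: %.120s...\n' "$line"
    done
```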

The above format is mentioned here and on Stack Overflow.

I’ll keep monitoring of course - if you or anyone else thinks of anything further to check, your input is greatly appreciated.

I suspect the following issue may be related to mine; I will investigate.

Based on that ticket, it appears that the UDP receive path can be wedged by malformed packets (insufficient validation?).
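If malformed or dropped packets are involved, the kernel-level UDP counters may show it. A minimal check on Linux (these counters are system-wide, not per-socket, so watch for them *rising* while the input is frozen):

```shell
# Protocol-level UDP statistics: look for increasing "packet receive errors"
# and "receive buffer errors" counts between runs:
netstat -su

# The same counters straight from the kernel (InErrors, RcvbufErrors columns):
grep -w Udp /proc/net/snmp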

I will use ss the next time this happens to see whether the listener is in fact dead or alive. If it is dead, I will see if I can reproduce the other issue’s OP’s solution - more to come.
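A minimal sketch of that check, assuming the default GELF UDP port 12201 (substitute your input’s port); no output means the listener is gone:

```shell
# Show UDP listeners bound to the GELF port, with the owning process:
ss -ulpn 'sport = :12201'
```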

Update: Indeed, you can see below that the listener is initially present, then gone some time after logging stops functioning:

[screenshot: ss output showing the GELF UDP listener present before the freeze and absent afterwards]

I’ve also noticed that since I increased the number of incoming Apache logs (more servers), the fault happens more frequently (every hour or two so far).

Glad that you have a trace now.

