Production Gelf Appender Messages not received by Graylog

(David) #1

My application uses log4j2 with a GELF appender to send log messages to our Graylog server.

The configuration is the same in dev, qa, uat, and production. I can view the messages in Graylog from dev, qa, and uat, but production messages are not being received.

I have confirmed that there is connectivity between production and Graylog (UDP on port 12201) by using echo and nc. Test messages sent manually this way show up in Graylog.

Does anyone have any other suggestions to troubleshoot this issue?

(Jochen) #2

Without any details about the actual setup and configuration of each component? No.

(David) #3

A bit more background:

  • Hybris web application
  • using log4j2
  • using your log4j2-gelf appender

What other information would be helpful?

(Jochen) #4

The configuration of Log4j 2 and the GELF appender.
The configuration of the Graylog GELF input.
Details about the network environment.

(David) #5

Log4J2 and Gelf appender configuration:

log4j2.status = warn = logs
log4j2.appenders = console, graylog

log4j2.appender.graylog.server=<graylog inputs domain name removed for privacy>
log4j2.appender.graylog.additionalField_application.type = KeyValuePair
log4j2.appender.graylog.additionalField_environment.type = KeyValuePair

log4j2.rootLogger.level = warn
log4j2.rootLogger.appenderRefs = stdout, graylog
log4j2.rootLogger.appenderRef.graylog.ref = GRAYLOG

Gelf UDP input configuration:

- Global
Title: gelf.udp
Bind address:
Port: 12201
Receive Buffer Size: 1048576
Override source:
Decompressed size limit: 8388608

Network details:

Unfortunately, I do not have many details other than that the Graylog server and the application servers are in the same data centre. Manual testing using echo and nc has shown that there are no firewall rules blocking traffic between the servers.

(Jochen) #6

How have you been testing this exactly?

(David) #7

I was able to ssh to the production application servers and execute the following:

echo -e '{"version": "1.1","host":"<production application server hostname>","short_message":"Short message","full_message":"Backtrace here\n\nmore stuff","level":1,"_user_id":9001,"_some_info":"foo","_some_env_var":"bar"}\0' | nc -u -w 1 <graylog inputs domain name removed for privacy> 12201

I could see the above messages in Graylog.
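The same kind of manual test can also be run from a JVM instead of netcat, which exercises Java's own name resolution and UDP stack. Below is a minimal sketch, not the log4j2-gelf appender's actual code: the class name GelfUdpProbe, its helper methods, and the "probe-host" value are made up for illustration, and the JSON escaping only covers plain test strings. It sends one uncompressed GELF 1.1 message in a single datagram.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of any GELF library.
public class GelfUdpProbe {

    // Minimal JSON string escaping; sufficient for plain test messages only.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds a small GELF 1.1 payload with the mandatory fields.
    static String buildPayload(String host, String shortMessage) {
        return "{\"version\":\"1.1\",\"host\":\"" + escape(host)
                + "\",\"short_message\":\"" + escape(shortMessage)
                + "\",\"level\":1}";
    }

    // Resolves the server name inside the JVM, sends one datagram,
    // and returns the address the datagram was actually sent to.
    static InetAddress send(String server, int port, String payload) throws Exception {
        InetAddress target = InetAddress.getByName(server);
        byte[] data = payload.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(data, data.length, target, port));
        }
        return target;
    }

    public static void main(String[] args) throws Exception {
        // Pass the Graylog input hostname as the first argument.
        String server = args.length > 0 ? args[0] : "localhost";
        InetAddress used = send(server, 12201, buildPayload("probe-host", "JVM GELF probe"));
        System.out.println("Sent to " + used.getHostAddress());
    }
}
```

Note that this runs in a fresh JVM, so it resolves the hostname from scratch. It will not reproduce anything a long-running application process has cached.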

(Jochen) #8

To summarize all of this:

  • The log4j2-gelf appender is working (as demonstrated in your dev, qa, and uat environments)
  • The network connection from the machine running the application server to the Graylog GELF UDP input on port 12201/udp is working (as demonstrated by your test with netcat)

So maybe the application server in the production environment is using a different configuration file for Log4j 2.

But ultimately, that’s pretty much how far free support can go.
If you want to buy professional support (with NDA and everything, so you can share sensitive information), please check out

(David) #9

Your summary is correct. I have confirmed that the production configuration is the same.

Thanks for your help. I will look into professional support.

(David) #10

The plot thickens … we ran tcpdump for UDP traffic on port 12201 and saw that the application was trying to send the log messages to the wrong IP address.

Doing nslookup on the graylog inputs domain resolves to the correct IP address. Similarly, doing a manual test using echo and nc on the production server, shows the UDP traffic being sent to the correct IP address.

It seems that, for some reason, when the log messages are sent via the GELF appender running in the application, the hostname resolves to the wrong IP address.

Any idea why this is happening?

(Jochen) #11

Maybe the JVM is using a different resolver than what’s being configured in /etc/resolv.conf (or wherever this information is stored on your machine).
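One way to compare against nslookup is to ask the JVM itself what it resolves the name to. This is a rough sketch using the standard java.net.InetAddress API; the class name JvmResolveCheck and its resolve method are made up for illustration.

```java
import java.net.InetAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; prints what this JVM resolves a name to.
public class JvmResolveCheck {

    // Returns every address the JVM's resolver (including its cache) yields.
    static List<String> resolve(String hostname) throws Exception {
        List<String> result = new ArrayList<>();
        for (InetAddress address : InetAddress.getAllByName(hostname)) {
            result.add(address.getHostAddress());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Pass the Graylog input hostname as the first argument.
        String hostname = args.length > 0 ? args[0] : "localhost";
        System.out.println(hostname + " -> " + resolve(hostname));
    }
}
```

Run standalone, this only shows what a newly started JVM resolves. To see the view of the running application, the same lookup would have to be executed inside that process (for example from a scripting console, if the application offers one).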

(David) #12

Turns out the operations team migrated the Graylog server to a new cluster, so the JVM has the old DNS entry cached. :frowning:

(Jochen) #13

By default, when a security manager is installed, in order to protect against DNS spoofing attacks, the result of positive host name resolutions are cached forever. When a security manager is not installed, the default behavior is to cache entries for a finite (implementation dependent) period of time. The result of unsuccessful host name resolution is cached for a very short period of time (10 seconds) to improve performance.

See the paragraph named “InetAddress Caching” in the Javadoc for java.net.InetAddress.
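The cache lifetime described there is controlled by the security properties networkaddress.cache.ttl and networkaddress.cache.negative.ttl, both given in seconds. Here is a minimal sketch of limiting the positive cache programmatically. The class name DnsCacheTtl and the 60-second value are only examples, and the property generally has to be set before the JVM performs its first lookup to take effect. Whether the GELF appender then re-resolves the hostname on later sends is an assumption that would need to be checked for the appender version in use.

```java
import java.security.Security;

// Hypothetical helper; the TTL value below is an example, not a recommendation.
public class DnsCacheTtl {

    // Limits how long successful lookups are cached, in seconds.
    // Call this early in startup, before any hostname is resolved.
    static void apply(int seconds) {
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(seconds));
    }

    // Returns the currently configured value, or null if it is unset.
    static String current() {
        return Security.getProperty("networkaddress.cache.ttl");
    }

    public static void main(String[] args) {
        apply(60);
        System.out.println("networkaddress.cache.ttl = " + current());
    }
}
```

The same property can also be set in the JRE's java.security file, which avoids changing application code. Without such a setting, restarting the application is what clears an already cached entry.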
