I’m using Collector-sidecar with Nxlog to pull in the localhost_access_logs for a number of Apache Tomcat servers, and for the most part it’s working fine. I’ve noticed an odd behavior, though, and I’m not sure how long it’s been happening. These are fairly busy Tomcat servers, and a few times of late I’ve seen a very high-volume flood of incoming messages and found that one of these servers has sent in the entirety of a prior day’s access logs. As an example, I got one of these floods about an hour ago, and it sent in the entirety of Thursday’s logs. The content of that log file had already been sent in throughout the day on Thursday. For the Nxlog file input settings I have “save read position” and “read since start” checked.
Yes, they are rotated daily, with the current log name including the date, so today’s log is localhost_access_log.2017-08-28.log. There is no symlink such as “localhost_access_log.log” pointing at the current log, and I’m sure one of my problems is that I’m using a wildcard filename in my Nxlog Input definition, “localhost_access_log.*”. Unfortunately this is a standardized config across more than 1,000 Tomcat servers, so it’s unlikely that their logging config could be updated anytime soon. Is there a better way to define this file input? Would it help if I specified the specific dated filename and enabled “rename check”?
So Nxlog only knows how to keep the current read position of a log based on the file name. If the log file gets rotated, the result is technically seen as a new file, and it gets processed like any other new file at the time of rotation.
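To make that concrete, here is a toy Python model of position tracking keyed by file name. It is only a sketch of the behavior described above, not Nxlog's actual code, and the file names and the `read_new_lines` helper are made up for illustration.

```python
# Toy model only: this is NOT Nxlog's implementation, just the behavior
# described above (read positions remembered per file name).

def read_new_lines(files, positions):
    """Return the unread lines of every file, tracking position by name."""
    sent = []
    for name, lines in files.items():
        start = positions.get(name, 0)  # unknown name: start at line 0
        sent.extend(lines[start:])
        positions[name] = len(lines)
    return sent

positions = {}

# During the day the live log is read as it grows (hypothetical names).
first = read_new_lines({"access.log": ["a", "b", "c"]}, positions)

# After rotation the same content shows up under a name never seen before,
# so nothing is remembered for it and everything is sent again.
second = read_new_lines(
    {"access.log": [], "access.log.1": ["a", "b", "c"]}, positions
)
```

In this model `second` contains the same three lines as `first`, which is the kind of replay flood described in the question.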
Essentially, you either need to fix how you specify wildcards in the nxlog conf, change the file extension or location of the rotated logs, or change the permissions on rotated files so they don’t belong to the nxlog user or group. I will warn you, though, that the last option will result in a lot of Access Denied errors, which will probably, in turn, end up getting logged, so then you’d also have to add an Exec rule that calls drop() for messages of that type. Not really ideal, so I suggest one of the first two options if you are able to modify the config.
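For the first two options, a quick way to sanity-check a candidate wildcard before rolling it out is to test it against a sample listing. The sketch below uses Python's `fnmatch`, which may not match Nxlog's wildcard rules exactly, and the listing is an assumption: only the dated `.log` name format comes from this thread, while the `.gz` entry stands in for whatever a rotation job might leave behind.

```python
from fnmatch import fnmatchcase

# Hypothetical directory listing. The ".gz" entry is an assumed example of a
# rotated file; it is not something reported in this thread.
names = [
    "localhost_access_log.2017-08-27.log",
    "localhost_access_log.2017-08-28.log",
    "localhost_access_log.2017-08-27.log.gz",
]

def matches(pattern, names):
    """Return the names that match a shell-style wildcard pattern."""
    return [n for n in names if fnmatchcase(n, pattern)]

broad = matches("localhost_access_log.*", names)       # the current wildcard
narrow = matches("localhost_access_log.*.log", names)  # a tighter candidate
```

Here `broad` picks up all three names, while `narrow` skips the file with the different extension, which is the effect the first two options are after.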