How to collect CLF logs with proper fields and timestamps?

1. Describe your incident:
I'm trying to figure out how to get logs from my Windows server to come in formatted properly. The logs are in Common Log Format, and Graylog is collecting each entire line (including the source, date/time, etc.) as the message, and Graylog's timestamp doesn't match the date/time in the line.
For example, this whole line ends up in the message field:

"192.168.4.4","03/Mar/2022:23:52:16 -0800","GET /client-data/agent-resources/agent-dependencies.xml",404,117,0.000 SUCCESS 6BE045C7AB70B37B

2. Describe your environment:
Graylog Docker:

  • Package Version:
    Latest

  • Service logs, configurations, and environment variables:
    Filebeat configuration:

# Needed for Graylog
fields_under_root: true
fields.collector_node_id: ${sidecar.nodeName}
fields.gl2_source_collector: ${sidecar.nodeId}

output.logstash:
  hosts: ["REDACTED.local:5044"]
path:
  data: C:\Program Files\Graylog\sidecar\cache\filebeat\data
  logs: C:\Program Files\Graylog\sidecar\logs
tags:
  - windows
filebeat.inputs:
  - type: log
    enabled: true
    ignore_older: 72h
    paths:
      - C:\Program Files\DesktopCentral_Server\logs\access_logs\access_other_log_*.log
3. What steps have you already taken to try and solve the problem?
I've searched around the Graylog database but have not found a clear, definitive guide.

4. How can the community help?
I'm looking for some direction, such as a guide, to help me set up Filebeat so that the date/time is captured properly and excluded from the message field.
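A minimal sketch of one way to split the line in Filebeat itself, assuming the access log lines look exactly like the sample above (straight double quotes, comma-separated, last three values separated by spaces) and a Filebeat 7.x release recent enough to include the `dissect` processor. The `clf` prefix and the field names (`client_ip`, `log_time`, `request`, `status`, `bytes`, `duration`, `result`, `session_id`) are hypothetical names chosen for illustration, not anything DesktopCentral or Graylog defines.

```yaml
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - C:\Program Files\DesktopCentral_Server\logs\access_logs\access_other_log_*.log
    processors:
      # Split each line into separate fields instead of leaving everything in `message`.
      # Assumed line shape:
      # "192.168.4.4","03/Mar/2022:23:52:16 -0800","GET /path",404,117,0.000 SUCCESS 6BE045C7AB70B37B
      - dissect:
          field: "message"
          target_prefix: "clf"   # hypothetical prefix; the default is "dissect"
          tokenizer: '"%{client_ip}","%{log_time}","%{request}",%{status},%{bytes},%{duration} %{result} %{session_id}'
```

The original `message` field is left untouched; the date/time simply becomes available as its own field. Turning that field into the event timestamp is a separate step (for example Filebeat's `timestamp` processor or a Graylog pipeline rule), and how the nested `clf.*` names appear in Graylog depends on how the Beats input flattens them.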


Hello && Welcome

I may be able to help.

To be honest, I'm not sure what's going on. I don't see a mismatch. Could you describe what you're seeing in greater detail?

I'm assuming you're using a Beats input on Graylog?

As for Filebeat:

# Needed for Graylog
fields_under_root: true
fields.collector_node_id: ${sidecar.nodeName}
fields.gl2_source_collector: ${sidecar.nodeId}

filebeat.inputs:
- type: log
  paths:
    - C:\Program Files\DesktopCentral_Server\logs\access_logs\access_other_log_*.log
output.logstash:
  hosts: ["8.8.8.8:5044"]
path:
  data: /var/lib/graylog-sidecar/collectors/filebeat/data
  logs: /var/lib/graylog-sidecar/collectors/filebeat/log

Yes, I'm using a Beats input with the Filebeat config I posted.

Graylog is receiving the log file, but it's not processing the timestamp correctly. I'd like to configure Filebeat so that the collector uses the timestamp within the log line.
For example, when I set up the collector today, I got 24k entries dating back to January, all stamped with today's date at noon.
Timestamps of newly created log entries are also offset by a few seconds.

Oh, I see. So Filebeat re-read the whole log file.

When you say it's incorrect, is it off by 5 minutes, 3 hours, etc.?
Normally, when the timestamp is off by something like 1-2 hours, check that the Graylog server, the remote device sending the messages/logs, and the logged-in user all have the correct date/time and time zone.

Ideally, the Graylog server should have some type of time sync (e.g. NTP).

Filebeat has a registry file to track the events it has already sent to outputs.
Are you sure Filebeat has access to it? Or does it get deleted between container runs?
Could you please share Filebeat's debug logs?
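One way to collect those debug logs, sketched as temporary settings added to the sidecar's Filebeat configuration. These are standard Filebeat `logging.*` options; the log directory simply reuses the `path.logs` value from the config posted earlier and is otherwise an assumption. The registry is kept under the `path.data` directory, so that directory has to persist between runs for Filebeat to remember which lines it has already shipped.

```yaml
# Temporary troubleshooting settings; remove them once the problem is found.
logging.level: debug
logging.selectors: ["*"]          # debug output from every Filebeat component
logging.to_files: true
logging.files:
  path: C:\Program Files\Graylog\sidecar\logs   # assumed: same directory as path.logs
  name: filebeat
```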

EDIT: Another community member running Docker recently had to set their Docker container to the server's date/time.

I'll need to take a look at that since it is a Docker container, but yes, there are two major issues:

  • Every event processed is timestamped with the time Filebeat read the file, not the timestamp on the log entry itself.
    E.g. when I apply a new Filebeat config to a system, it grabs every single log entry (even very old ones) and timestamps each entry in Graylog with the time the collection ran.
    For new entries, this is usually off by a second or two from when the line was written and from the timestamp in the captured message.
  • The log file itself is defaulting to UTC and is off by a few hours because of that. I can't recall whether I mounted the time zone/clock data into the container, so I'll need to check that.
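For the back-filling part of the first issue, a minimal sketch of the relevant `log` input options, assuming the goal is to skip historical data rather than import it. `ignore_older` makes Filebeat skip files whose last modification is older than the given duration (so it helps with rotated files, not with old lines inside a file that is still being written), and `tail_files` makes it start reading newly found files at the end instead of the beginning. It is worth double-checking that the key is spelled exactly `ignore_older`; a misspelled option is likely to be ignored silently rather than rejected.

```yaml
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - C:\Program Files\DesktopCentral_Server\logs\access_logs\access_other_log_*.log
    # Skip files that have not been modified in the last 72 hours.
    ignore_older: 72h
    # Start newly found files at their end so existing lines are not sent.
    # Only affects files that have no entry in the registry yet.
    tail_files: true
```

The trade-off with `tail_files` is that, with a fresh registry, any lines already in a file at startup are never sent, so it suits a first rollout better than a permanent setting.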

OK, there was definitely an issue with the time zone config. I updated the graylog service config in my Compose file to include:

volumes:
  - "/etc/localtime:/etc/localtime:ro"
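For context, a sketch of where that mount sits in a Compose file, assuming a service named `graylog` (the image tag is hypothetical). The optional `GRAYLOG_ROOT_TIMEZONE` variable relies on the Graylog image mapping `GRAYLOG_`-prefixed environment variables to `server.conf` settings, here `root_timezone` (the time zone of the built-in admin user); `America/Los_Angeles` is only an assumption based on the PST output below.

```yaml
services:
  graylog:
    image: graylog/graylog:4.2   # hypothetical tag; keep the image you already use
    environment:
      # Optional: maps to root_timezone in server.conf (assumed zone, adjust as needed)
      GRAYLOG_ROOT_TIMEZONE: "America/Los_Angeles"
    volumes:
      # Reuse the host's time zone data inside the container, read-only
      - "/etc/localtime:/etc/localtime:ro"
```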

BEFORE
graylog@0e4d4a0f8de2:~$ date
Wed Mar 9 04:44:26 UTC 2022

AFTER
graylog@2d8e728e77e3:~$ date
Tue Mar 8 20:45:29 PST 2022

Now if I can just figure out how to scrape the timestamp from the collected log message, as opposed to using the time when the collection occurred.

Here is an example of one of the captured logs.
It was timestamped 8:20 pm today, even though it was actually logged on the 3rd:

2022-03-08 22:20:45.939 +00:00 deskcentral
"192.168.4.4","03/Mar/2022:20:52:10 -0800","GET /client-data/1/domains/linuxosgroup/resources/thelibrary2.xml",200,242,0.016 SUCCESS 6BE045C7AB70B37B
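A minimal sketch for using the time inside the line as the event time, assuming (1) a Filebeat 7.x release recent enough to include the `dissect` and `timestamp` processors, and (2) that Graylog's Beats input takes the message timestamp from the event's `@timestamp` field. It copies the second quoted column into a hypothetical `clf.log_time` field, then parses it with a Go reference-time layout matching `03/Mar/2022:20:52:10 -0800`. The `test` entries are the two sample timestamps from this thread; Filebeat checks them against the layouts when it loads the processor.

```yaml
processors:
  # Pull the client IP and the quoted date/time out of the line; everything
  # after the second quoted column goes into `rest` (hypothetical field names).
  - dissect:
      field: "message"
      target_prefix: "clf"
      tokenizer: '"%{client_ip}","%{log_time}",%{rest}'
  # Parse the extracted date/time and write it to @timestamp (the default target).
  - timestamp:
      field: "clf.log_time"
      layouts:
        - '02/Jan/2006:15:04:05 -0700'   # Go layout for 03/Mar/2022:20:52:10 -0800
      test:
        - '03/Mar/2022:23:52:16 -0800'
        - '03/Mar/2022:20:52:10 -0800'
  # Optional: drop the helper field once it has been parsed.
  - drop_fields:
      fields: ["clf.log_time"]
```

At the top level these processors run for every input; nesting the same list under the `filebeat.inputs` entry limits them to the DesktopCentral access logs. Because the layout includes the `-0700` offset, the parsed time should not depend on the container's or server's time zone setting.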

Nice catch :+1:

Make sure the remote device's date/time is set correctly.
Here are a couple of things to check, if you haven't done them already:

First, make sure the Graylog server's date/time is correct: in the web UI, navigate to System → Overview and look under Time configuration.
It should look something like this, with all three values the same:

  • User greg.smith: 2022-03-08 17:51:58 -06:00
  • Your web browser: 2022-03-08 17:51:58 -06:00
  • Graylog server: 2022-03-08 17:51:58 -06:00

Check your logs to make sure there isn't a problem.
Ensure some type of time synchronization is running on the Graylog server; NTP is preferred.

EDIT: perhaps this may help
