Logs received too late but with the right timestamp

Hello

Description of your problem

Linux and Windows logs sent using filebeat or winlogbeat are being delivered to the Graylog server about 13 hours late. On the other hand, the timing of the logs coming from syslog inputs, like the firewall and the ESXi servers, is correct.
The delayed logs arrive with the correct timestamp.

Example:
Now it is 2021/10/26 9:28.
The last Linux/Windows log: 2021/10/25 20:24 (has the correct timestamp)
The last syslog log: 2021/10/26 9:28 (has the correct timestamp)

Description of steps you’ve taken to attempt to solve the issue

  • I checked /etc/graylog/server/server.conf and root_timezone = Europe/Berlin
  • I checked the time zone for the Graylog VM using the command timedatectl
    and the output is correct
    Local time: Tue 2021-10-26 09:06:36 CEST
    Universal time: Tue 2021-10-26 07:06:36 UTC
    RTC time: Tue 2021-10-26 07:06:35
    Time zone: Europe/Berlin (CEST, +0200)
    System clock synchronized: no
    systemd-timesyncd.service active: yes
    RTC in local TZ: no

Environmental information

Operating system information

  • Ubuntu 18.04.4 LTS

Package versions

  • Graylog v3.2.6
  • MongoDB v4.0.17
  • Elasticsearch 6.8.7

Does anyone know what the cause of this behaviour is?
Thank you

Hello && Welcome

With those two types of operating system, I assume the date/time is correct on those remote devices? If so, then I would look at your time configuration settings. This is under System/Overview.
The timestamp field comes from Elasticsearch, but the time/date in the message comes from those remote devices.

Here is an example:
I have a remote device, and in the message date/time shown below you’ll notice that the time in the message is different from the timestamp field.

The reason for this is that the time zone is incorrect on that remote device.

As an added note, we run an NTP service on all of our Linux servers.
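
For reference, here is a minimal sketch of how I would verify time sync on a Linux client; which of these commands applies depends on whether the box runs systemd-timesyncd, chrony, or classic ntpd:

# Show local time, UTC, time zone, and whether the clock is synchronized
timedatectl

# If systemd-timesyncd is in use (as on the Graylog VM above)
systemctl status systemd-timesyncd

# If chrony is in use
chronyc tracking

# If classic ntpd is in use
ntpq -p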

If these devices are not virtual machines, I would check your hardware clock; if they are VMs, I would check the host that the virtual machine is running on.

Hope that helps

Thank you for your detailed reply.

The Linux/Windows VMs have the correct date/time and the correct time zone. I rechecked, and the time in the message is the same as the timestamp from Graylog.

What are your winlogbeat/filebeat configurations for the machines that are being delayed? Is it a constant stream of 13-hour-old messages, or do they all come in at once in 13-hour intervals? I am thinking it is a configuration issue at the client, and more likely a beats configuration that is common between your Linux and Windows machines. Be sure to post code using the forum tools (like </>) for readability.

Hello,

Adding to @tmacgbay’s statement: by chance, did you check the user’s time zone?

It is a constant stream.

For Linux

# Needed for Graylog
fields_under_root: true
fields.collector_node_id: ${sidecar.nodeName}
fields.gl2_source_collector: ${sidecar.nodeId}
fields.source: ${sidecar.nodeName}

filebeat.inputs:
- input_type: log
  paths:
    - /var/log/auth.log
    - /var/log/secure
  type: log
output.logstash:
   hosts: ["x.x.x.x:5044"]
path:
  data: /var/lib/graylog-sidecar/collectors/filebeat/data
  logs: /var/lib/graylog-sidecar/collectors/filebeat/log

For Windows

# Needed for Graylog
fields_under_root: true
fields.collector_node_id: ${sidecar.nodeName}
fields.gl2_source_collector: ${sidecar.nodeId}
fields.source: ${sidecar.nodeName}

output.logstash:
   hosts: ["x.x.x.x:5044"]

path:
  data: C:\Program Files\Graylog\sidecar\data
  logs: C:\Program Files\Graylog\sidecar\logs
  
tags:
 - windows
 
winlogbeat:
  event_logs:
   - name: Security
     event_id: 4624, 4625, 1102, 4697, 4700-4800
   - name: System
     level: critical, error, warning

I just checked this option and it is correct.

Update

First, sorry for the late reply. I was away from my desk for the last few days.

As the clocks changed in Germany over the weekend, I was curious to see how it would affect this behavior.

As we changed from UTC/GMT +2 to UTC/GMT +1, I thought the difference should now be 12 hours. But what I have found is that the difference is now about 21 hours!

Now it is 03/11/2021 09:17 AM
Last log is from 02/11/2021 11:51 AM

The logs are being received every couple of seconds right now in a “live” stream.

I didn’t have this problem when I first set up the server; I only noticed it two weeks ago, but I’m not sure when it started.
One of the things that has changed lately in my setup is the extractors. I have written more than 20 extractors for the Linux/Windows beats input alone, all of them Grok patterns.

At the same time, though, I have also written 17 extractors (also Grok patterns) for the Syslog UDP input, and the logs there are being delivered on time.

I increased the resources of the VM last week to see if the improved performance would help, but it looks like it didn’t.

Any tips or ideas will be much appreciated.

Hello

To sum it up.

  • You have no extractors running that are manipulating the data from the input?
  • All buffers are at 0, so there is no delay in processing data?
  • Have you checked your remote devices, and do they have the correct time zone and date/time?
  • The Graylog server date/time is correct and matches the remote devices, correct?
  • The logged-in user has the same time zone, and the date/time is correct?
  • Did you double-check your NTP service and configuration?
  • Did you see anything that may pertain to this issue in the log files? (See the commands below.)
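
For that last point, a minimal sketch of what I would run on the Graylog server, assuming the default package-install log path (adjust the path if your install differs):

# Scan the recent server log for anything time-, buffer-, or journal-related
sudo tail -n 500 /var/log/graylog-server/server.log | grep -iE 'time|clock|buffer|journal'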

Example:

[screenshot]

What I noticed was this:

[screenshot]

and this:

[screenshot]

So you have Linux and Windows going to the same INPUT/ports?
Have you tried to separate them?

Example Linux Servers.

[screenshot]

Example Windows Servers

[screenshot]

It might not help, but it’s worth a try. If anything, maybe we can narrow it down.

If you have tried and checked our suggestions, perhaps something in this post may help.


It is the process buffer! It is always at 100%.
I separated the Linux and Windows ports as you suggested, and I will try to trim down my extractors to see if that helps.

Thank you very much for your time and effort.
Any tips on how to improve my process buffer performance would be appreciated.


One thing to consider is how many steps each message has to go through as it progresses through your system. Grok and regex are powerful tools, but if you aren’t careful when using them in volume, their small amount of extra searching per message will really add up. One of the better ways to clear that up quickly is to anchor the Grok/regex to the beginning (^) or end ($) of the message; without that, the pattern match is attempted at every position in the message. For instance, this Grok pattern fails faster if the first matched item isn’t TIMESTAMP_ISO8601, because of the ^ at the beginning.

^%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{PROG:sys_name}%{SPACE}%{PROG:software}:%{SPACE}%...
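
To make the difference concrete, here is an illustrative pair of patterns against a made-up auth.log-style line (not your actual extractor, just a sketch); the content is identical, only the anchors differ:

Unanchored (on a non-matching message the matcher retries at every position):

%{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST:host} sshd\[%{POSINT:pid}\]: %{GREEDYDATA:msg}

Anchored (fails immediately if the message doesn’t start with a syslog timestamp):

^%{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST:host} sshd\[%{POSINT:pid}\]: %{GREEDYDATA:msg}$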


There are a couple things you can do.

  • First, as @tmacgbay suggested, correct your Grok/regex patterns. That would be the first thing to fix.

  • Second, check your resources: CPU, memory, and disk I/O. On Linux you could use top or htop to find what is taking so much CPU and memory. For disk I/O, iotop has come in handy for me a couple of times.

  • Third, you could increase your buffer settings in the Graylog configuration file, located here:

/etc/graylog/server/server.conf

The rule I go by when configuring these three settings is that the total number of buffer processors should not exceed the number of CPU cores. As an FYI, as you increase your buffer settings you will create more CPU threads.
For example, I have 14 CPU cores: I leave 2 cores for the system and keep the rest in reserve in case I need to increase the buffer settings. This may vary depending on your environment. By default, the output buffer doesn’t require a lot, so start with the default and go from there; if you’re configuring custom outputs, your needs will vary and you’ll need to adjust accordingly, but I would still start with the default unless you have CPUs to spare. Then increase until your buffers are at 0. Remember, if your buffers are at 100%, chances are your journal is filled up or holding a lot of messages, so you may need to wait until you see the journal decreasing to get an accurate reading before adjusting your buffer settings again.

 processbuffer_processors = 7
 outputbuffer_processors = 3
 inputbuffer_processors = 2
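
(In that example, 7 + 3 + 2 = 12 processor threads, leaving 2 of the 14 cores free for the OS; whatever split you choose, the three values together should not exceed your core count.)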

Since this post is closed, if you have further problems you may need to start a new post concerning your buffers and/or the other problems you are having.
Hope that helps


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.