Logs received too late but with the right timestamp

Hello

Description of your problem

Linux and Windows logs sent using filebeat or winlogbeat are being delivered to the Graylog server about 13 hours late. On the other hand, the timing of the logs coming from syslog inputs, like the firewall and the ESXi servers, is correct.
The delayed logs arrive with the correct timestamp.

Example:
Now it is 2021/10/26 9:28.
The last Linux/Windows log: 2021/10/25 20:24 (has the correct timestamp)
The last syslog log: 2021/10/26 9:28 (has the correct timestamp)

Description of steps you’ve taken to attempt to solve the issue

  • I checked /etc/graylog/server/server.conf and root_timezone = Europe/Berlin
  • I checked the time zone for the Graylog VM using the command timedatectl
    and the output is correct
    Local time: Tue 2021-10-26 09:06:36 CEST
    Universal time: Tue 2021-10-26 07:06:36 UTC
    RTC time: Tue 2021-10-26 07:06:35
    Time zone: Europe/Berlin (CEST, +0200)
    System clock synchronized: no
    systemd-timesyncd.service active: yes
    RTC in local TZ: no

Environmental information

Operating system information

  • Ubuntu 18.04.4 LTS

Package versions

  • Graylog v3.2.6
  • MongoDB v4.0.17
  • Elasticsearch 6.8.7

Does anyone know what the cause of this behaviour is?
Thank you

Hello && Welcome

With those two types of operating system, I assume the date/time is correct on those remote devices? If so, then I would look at your time configuration settings. This is under System/Overview.
The timestamp field comes from Elasticsearch, but the time/date in the message comes from those remote devices.

Here is an example:
I have a remote device, and in the message date/time shown below you’ll notice that the time in the message is different from the timestamp field.

The reason for this is that the time zone is incorrect on that remote device.

As an added note, we run an NTP service on all of our Linux servers.
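
For reference, here is a minimal sketch of how I would verify time sync on a Linux client; which of these commands applies depends on whether the box runs systemd-timesyncd, chrony, or classic ntpd:

# Show local time, UTC, time zone, and whether the clock is synchronized
timedatectl

# If systemd-timesyncd is in use (as on the Graylog VM above)
systemctl status systemd-timesyncd

# If chrony is in use
chronyc tracking

# If classic ntpd is in use
ntpq -p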

If these devices are not virtual machines, I would check your hardware clock; if they are VMs, I would check the host that the virtual machine is running on.

Hope that helps

Thank you for your detailed reply.

The Linux/Windows VMs have the correct date/time and the correct time zone. I rechecked, and the time in the message is the same as the timestamp from Graylog.

What are your winlogbeat/filebeat configurations for the machines that are being delayed? Is it a constant stream of 13-hour-old messages, or do they all come in at once in 13-hour intervals? I am thinking it is a configuration issue at the client, and more likely a beats configuration that is common between your Linux and Windows machines. Be sure to post code using the forum tools (like </>) for readability.

Hello,

Adding to @tmacgbay’s statement: by chance, did you check the user’s time zone?

It is a constant stream.

For Linux

# Needed for Graylog
fields_under_root: true
fields.collector_node_id: ${sidecar.nodeName}
fields.gl2_source_collector: ${sidecar.nodeId}
fields.source: ${sidecar.nodeName}

filebeat.inputs:
- input_type: log
  paths:
    - /var/log/auth.log
    - /var/log/secure
  type: log
output.logstash:
   hosts: ["x.x.x.x:5044"]
path:
  data: /var/lib/graylog-sidecar/collectors/filebeat/data
  logs: /var/lib/graylog-sidecar/collectors/filebeat/log

For Windows

# Needed for Graylog
fields_under_root: true
fields.collector_node_id: ${sidecar.nodeName}
fields.gl2_source_collector: ${sidecar.nodeId}
fields.source: ${sidecar.nodeName}

output.logstash:
   hosts: ["x.x.x.x:5044"]

path:
  data: C:\Program Files\Graylog\sidecar\data
  logs: C:\Program Files\Graylog\sidecar\logs
  
tags:
 - windows
 
winlogbeat:
  event_logs:
   - name: Security
     event_id: 4624, 4625, 1102, 4697, 4700-4800
   - name: System
     level: critical, error, warning

I just checked this option and it is correct.

Update

First, sorry for the late reply. I was away from my desk for the last few days.

As the clocks changed in Germany over the weekend, I was curious to see how it would affect this behavior.

As we changed from UTC/GMT +2 to UTC/GMT +1, I thought the difference should now be 12 hours. But what I have found is that the difference is now about 21 hours!

Now it is 03/11/2021 09:17 AM
Last log is from 02/11/2021 11:51 AM

The logs are being received every couple of seconds right now in a “live” stream.

I didn’t have this problem when I first set up the server; I only noticed it two weeks ago, but I’m not sure when it started.
One of the things that has changed lately in my setup is the extractors. I have written more than 20 extractors for the Linux/Windows beats input alone, all of them Grok patterns.

At the same time, though, I have also written 17 extractors (also Grok patterns) for the Syslog UDP input, and the logs there are being delivered on time.

I increased the resources of the VM last week to see if the improved performance would help, but it looks like it didn’t.

Any tips or ideas will be much appreciated.

Hello

To sum it up.

  • You have no extractors running that are manipulating the data from the input?
  • All buffers are at 0, so there is no delay in processing data?
  • Have you checked your remote devices, and do they have the correct time zone and date/time?
  • The Graylog server date/time is correct and matches the remote devices, correct?
  • The logged-in user has the same time zone, and the date/time is correct?
  • Did you double-check your NTP service and configuration?
  • Did you see anything that may pertain to this issue in the log files? (See the commands below.)
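
For that last point, a minimal sketch of what I would run on the Graylog server, assuming the default package-install log path (adjust the path if your install differs):

# Scan the recent server log for anything time-, buffer-, or journal-related
sudo tail -n 500 /var/log/graylog-server/server.log | grep -iE 'time|clock|buffer|journal'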

Example:

[screenshot]

What I noticed was this:

[screenshot]

and this:

[screenshot]

So you have Linux and Windows going to the same INPUT/ports?
Have you tried to separate them?

Example Linux Servers.

[screenshot]

Example Windows Servers

[screenshot]

It might not help, but it’s worth a try. If anything, maybe we can narrow it down.

If you have tried and checked our suggestions, perhaps something in this post may help.


It is the process buffer! It is always at 100%.
I separated the Linux and Windows ports as you suggested, and I will try to trim down my extractors to see if that helps.

Thank you very much for your time and effort.
Any tips on how to improve my process buffer performance would be appreciated.


One thing to consider is how many steps each message has to go through as it progresses through your system. Grok and regex are powerful tools, but if you aren’t careful when using them in volume, their small amount of extra searching per message will really add up. One of the better ways to clear that up quickly is to anchor the Grok/regex to the beginning (^) or end ($) of the message; without that, the pattern match is attempted at every position in the message. For instance, this Grok pattern fails faster if the first matched item isn’t TIMESTAMP_ISO8601, because of the ^ at the beginning.

^%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{PROG:sys_name}%{SPACE}%{PROG:software}:%{SPACE}%...
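
To make the difference concrete, here is an illustrative pair of patterns against a made-up auth.log-style line (not your actual extractor, just a sketch); the content is identical, only the anchors differ:

Unanchored (on a non-matching message the matcher retries at every position):

%{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST:host} sshd\[%{POSINT:pid}\]: %{GREEDYDATA:msg}

Anchored (fails immediately if the message doesn’t start with a syslog timestamp):

^%{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST:host} sshd\[%{POSINT:pid}\]: %{GREEDYDATA:msg}$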


There are a couple things you can do.

  • First, as @tmacgbay suggested, correct your Grok/regex patterns. That would be the first thing to fix.

  • Second, check your resources: CPU, memory, and disk I/O. On Linux you could use top or htop to find what is taking so much CPU and memory. For disk I/O, iotop has come in handy for me a couple of times.

  • Third, you could increase your buffer settings in the Graylog configuration file, located here:

/etc/graylog/server/server.conf

The rule I go by when configuring these three settings is that the total number of buffer processors should not exceed the number of CPU cores. As an FYI, as you increase your buffer settings you will create more CPU threads.
For example, I have 14 CPU cores: I leave 2 cores for the system and keep the rest in reserve in case I need to increase the buffer settings. This may vary depending on your environment. By default, the output buffer doesn’t require a lot, so start with the default and go from there; if you’re configuring custom outputs, your needs will vary and you’ll need to adjust accordingly, but I would still start with the default unless you have CPUs to spare. Then increase until your buffers are at 0. Remember, if your buffers are at 100%, chances are your journal is filled up or holding a lot of messages, so you may need to wait until you see the journal decreasing to get an accurate reading before adjusting your buffer settings again.

 processbuffer_processors = 7
 outputbuffer_processors = 3
 inputbuffer_processors = 2
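
(In that example, 7 + 3 + 2 = 12 processor threads, leaving 2 of the 14 cores free for the OS; whatever split you choose, the three values together should not exceed your core count.)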

Since this post is closed, if you have further problems you may need to start a new post concerning your buffers and/or the other problems you are having.
Hope that helps


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.