Linux and Windows logs sent via Filebeat or Winlogbeat are arriving at the Graylog server about 13 hours late. By contrast, logs coming from syslog inputs, such as the firewall and the ESXi servers, arrive on time.
The delayed logs are delivered with the correct timestamp.
Right now it is 2021/10/26 09:28.
The last Linux/Windows log: 2021/10/25 20:24 (correct timestamp)
The last syslog log: 2021/10/26 09:28 (correct timestamp)
Description of steps you’ve taken to attempt to solve the issue
I checked /etc/graylog/server/server.conf; it has root_timezone = Europe/Berlin.
I checked the time zone for the Graylog VM using the command timedatectl, and the output is correct:

```
Local time: Tue 2021-10-26 09:06:36 CEST
Universal time: Tue 2021-10-26 07:06:36 UTC
RTC time: Tue 2021-10-26 07:06:35
Time zone: Europe/Berlin (CEST, +0200)
System clock synchronized: no
systemd-timesyncd.service active: yes
RTC in local TZ: no
```
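(As a side note, that timedatectl output shows "System clock synchronized: no". A minimal sketch for re-checking the clock state and enabling NTP sync, assuming a standard systemd setup on Ubuntu 18.04 — timedatectl may not work inside containers:)

```shell
# Show the clock state; ignore errors on systems without systemd
timedatectl status 2>/dev/null || true

# If it reports "System clock synchronized: no", enable NTP sync (as root):
#   timedatectl set-ntp true

# Cross-check the kernel clock in UTC against an external reference
date -u +"%Y-%m-%d %H:%M:%S UTC"
```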
Operating system information
Ubuntu 18.04.4 LTS
Does anyone know what is causing this behaviour?
With those two types of operating system, I assume the date/time is correct on those remote devices? If so, then I would look at your time configuration settings. This is under System/Overview.
The timestamp field comes from Elasticsearch, but the time/date inside the message comes from those remote devices.
Here is an example: I have a remote device, and you'll notice that the date/time in the message is different from the timestamp field.
What are your winlogbeat/filebeat configurations for the machines that are being delayed? Is it a constant stream of 13-hour-old messages, or do they all come in at once at 13-hour intervals? I suspect a configuration issue at the client, most likely a beats setting that is common to your Linux and Windows machines. Be sure to post code using the forum tools (like </>) for readability.
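To make that concrete, here is a minimal filebeat.yml sketch for shipping to a Graylog Beats input (the host, port, and log path below are placeholders, not values from this thread); the input and output sections are the ones worth posting:

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/syslog                    # placeholder path

# Graylog's Beats input speaks the Logstash/Beats protocol
output.logstash:
  hosts: ["graylog.example.org:5044"]      # placeholder host:port
```

If the clients were configured to spool or batch messages for long periods, it would show up in these sections.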
First, sorry for the late reply; I was away from my work desk for the last few days.
Since we changed the time in Germany over the weekend, I was curious to see how it would affect this behavior.
We moved from UTC/GMT +2 to UTC/GMT +1, so I expected the difference to be 12 hours now. But what I found is that the difference is now about 21 hours!
Now it is 03/11/2021 09:17 AM
Last log is from 02/11/2021 11:51 AM
The logs are being received every couple of seconds right now in a “live” stream.
I didn't have this problem when I first set up the server; I only noticed it two weeks ago, but I'm not sure when it started.
One of the things that has changed lately in my setup is the extractors. I have written more than 20 extractors for the Linux/Windows beats input alone, all of them Grok patterns.
At the same time, I have also written 17 extractors (also Grok patterns) for the Syslog UDP input, and the logs there are delivered on time.
I increased the VM's resources last week to see whether the extra performance would help, but it doesn't seem to have made a difference.
One thing to consider is how many steps each message has to go through as it progresses through your system. GROK and REGEX are powerful tools, but if you aren't careful when using them at volume, the small amount of extra searching per message really adds up. One of the better ways to clear that up quickly is to anchor the GROK/regex to the beginning (^) or end ($) of the message; without that, the pattern match is shifted through the whole message on every attempt. For instance, anchoring with ^ forces the match to fail quickly if the first matched item isn't TIMESTAMP_ISO8601.
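As a sketch (the field names here are illustrative, not taken from the original post), an anchored Grok pattern looks like this:

```
^%{TIMESTAMP_ISO8601:log_timestamp} %{HOSTNAME:source_host} %{GREEDYDATA:msg}$
```

Without the leading ^, the engine re-tries the match starting at every position in the message before giving up, which is where the per-message cost comes from.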
First, do what @tmacgbay suggested and correct your GROK/regex patterns. That would be the first configuration to fix.
Second, check your resources: CPU, memory, and disk I/O. On Linux you can use top or htop to find what is consuming so much CPU and memory. For disk I/O, iotop has come in handy a couple of times.
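For instance (standard procps/iotop commands; iotop usually needs root):

```shell
uptime                 # load average vs. number of cores
free -h                # memory and swap pressure
top -bn1 | head -n 12  # one non-interactive snapshot of the top consumers
# sudo iotop -obn1     # processes actually doing disk I/O right now
```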
Third, you could increase the buffer settings in the Graylog configuration file, /etc/graylog/server/server.conf.
The rule I go by when configuring these three settings is that the total number of buffer processors should not exceed the number of CPU cores. Just an FYI: as you increase your buffer settings, you create more CPU threads.
For example, I have 14 CPU cores and my buffers are at 100%. I leave 2 cores for the system and keep the rest in reserve in case I need to increase the buffer settings; this may vary depending on your environment. By default, the output buffer doesn't require a lot, so start with the default CPU count and go from there. If you're configuring custom outputs, your needs will vary and you'll need to adjust accordingly; I would still start with the default unless you have CPUs to spare, then increase until the buffers are at 0. Remember: if your buffers are at 100%, chances are your journal is filled up or holding a lot of messages, so you may need to wait until you see the journal decreasing to get an accurate reading before reconfiguring your buffer settings again.
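As an illustrative split for that 14-core example (these are the standard buffer settings in /etc/graylog/server/server.conf; the exact numbers are an assumption for illustration, not a recommendation):

```
# /etc/graylog/server/server.conf
inputbuffer_processors   = 2
processbuffer_processors = 7
outputbuffer_processors  = 3
# 2 + 7 + 3 = 12 threads, leaving 2 of the 14 cores for the OS
```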