Searching imported logs by log timestamp, not time Graylog received the log

Hi,

Got an issue. I want to search my imported logs in Graylog by the timestamp of the logs themselves, not when Graylog received the log via syslog. How would I be able to do this?

Thanks in advance

You would probably have to parse the time/date fields within the data as it comes in to change the date. If you can see the date/time in the message you should be able to do some basic text searching without having to parse/reimport.

@rmdashrf is correct. If the timestamp is not parsed from the syslog message itself (it should be though), you have to parse the timestamp manually, using our Processing Pipelines.

An example based on @lennart’s comment.

I struggled for 4 hours just to figure this out. :smiley: Hope this saves others some time.

E.g.

  1. I stored the event timestamp value from the Apache combined log in event_ts. I extracted it with Grok.
  2. A processing pipeline parses the value I grokked and stores it in the timestamp field.

The following is an example rule for this.

rule "parse event timestamp"
when
    true
then
    // Apache writes the hour on a 24-hour clock, so the pattern uses HH rather than hh.
    let new_date = parse_date(to_string($message.event_ts), "dd/MMM/yyyy:HH:mm:ss Z");
    set_field("timestamp", new_date);
end

I would like to know whether my approach is proper, or whether there is a better way to achieve it.

Cheers,

ye

Ye,

Thanks for posting this. We currently do all of this in logstash/nxlog before the logs go into Graylog, and hadn’t really considered doing it the way you described.

I don’t know if it’s proper or not, but it could reduce our need to use more complicated logstash configurations.

Dustin Tennill
EKU

@dustintennill You are right! We definitely recommend going with the way described by @yett.

@yett, thanks for the example! Looks great. :slight_smile:

@lennart, thanks for the pointer. I was a bit unsure whether I was heading in the right direction. Cheers.

Just to understand the thinking here:

Does this mean that the extractors and converters are not the preferred method of handling raw input data (such as lines from log files)? Are they on the way out?

My timestamp still isn’t being set.

I’ve tried the above parser as well as:

rule "parse ACCESS log timestamp"
when
    true
then
    let new_date = flex_parse_date(to_string($message.event_ts));   
    set_field("timestamp", new_date);    
end 

The timestamp is still set to the time the message arrived in Graylog, not the timestamp in the log. My Graylog instance was recently offline; when it caught back up, all of the messages were stamped with the time the system processed them.

I have an Input Extractor that creates ‘event_ts’ in the form ‘dd/MMM/yyyy:HH:mm:ss’.
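
Since event_ts in that form carries no zone offset, one option might be an explicit pattern together with an explicit timezone, instead of relying on flex_parse_date to guess the layout. A minimal sketch, assuming the web server logs in UTC (the rule name and the "UTC" value are assumptions, not taken from the posts above):

```
rule "parse ACCESS log timestamp (explicit pattern)"
when
    // Skip messages where the extractor did not produce event_ts.
    has_field("event_ts")
then
    // Assumption: event_ts has the form dd/MMM/yyyy:HH:mm:ss with no zone
    // offset, and the source logs in UTC. Adjust the timezone to match yours.
    let new_date = parse_date(
        value: to_string($message.event_ts),
        pattern: "dd/MMM/yyyy:HH:mm:ss",
        timezone: "UTC"
    );
    set_field("timestamp", new_date);
end
```

The has_field condition is only there so the rule does not attempt to parse an empty string on messages that never went through the extractor.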

I think it should be

set_field("timestamp",$new_date);

I am also interested in how this works. I have had quite a lot of timestamp problems, and still occasionally get failed indexing due to a wrong timestamp format.

The parser generates an error if I add the $new_date.

Well, I’m not sure what changed. I didn’t change the code, but it magically started working. Now my Access Logs have the timestamp from the log rather than when they arrived. Now to figure out how to fix the rest of my logs!

This is something Splunk does automatically, I wish Graylog did this out of the box!

Trying to catch up here - Same goal: Apache logs provide message timestamp, but Log Time is Graylog-ingested timestamp.

The out-of-the-box COMMONAPACHELOG grok pattern parses the date just fine into its own field via the stock %{HTTPDATE:timestamp}, but the message timestamp is not set. I thought I might need the pipeline rule above, but when testing the pipeline it looks like the message is not run through the assigned grok pattern. Should it be? There’s no explicit selector for it, nor any indication that it tried or didn’t…

My method of testing is:

  • Defined Input for this specifically
  • Set custom grok expression on this input only
  • Set up pipeline with all the above
  • Copy/paste the raw string into the Simulator, with Raw String as the codec
    The message comes up “no changes” and the timestamp shown in the simulator results is always the current time.

I suppose a better question is: with Graylog 2.3.1+9f2c6ef and standard Apache log files, what’s the expected Log Time timestamp? Ingested or message-provided? And is the above recipe required to get Log Time to be the message-provided timestamp? How best to test/confirm?

Thanks
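
For what it is worth, one way to take the input extractor out of the picture while testing might be to run the grok pattern inside the pipeline rule itself. This is only a sketch: it assumes the stock COMMONAPACHELOG pattern is installed, the "apache_" prefix and the rule name are hypothetical, and it assumes the freshly set field is readable later in the same rule.

```
rule "grok apache access log in pipeline"
when
    has_field("message")
then
    // Match the raw message text against the stock pattern.
    let parsed = grok(
        pattern: "%{COMMONAPACHELOG}",
        value: to_string($message.message),
        only_named_captures: true
    );
    // Hypothetical prefix: HTTPDATE:timestamp lands in apache_timestamp
    // instead of overwriting the message timestamp with a plain string.
    set_fields(fields: parsed, prefix: "apache_");
    // Convert the captured string into a real date and use it as Log Time.
    let new_date = parse_date(
        value: to_string($message.apache_timestamp),
        pattern: "dd/MMM/yyyy:HH:mm:ss Z"
    );
    set_field("timestamp", new_date);
end
```

If the pattern does not match, apache_timestamp stays empty and parse_date will raise a format error, so it may be cleaner to put the parse step into a second rule in a later stage, guarded by has_field("apache_timestamp").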

[update]
I have narrowed it down to the fact that I can’t access $message.time (the JSON extractor shows me there’s a field “time”). I get the error message Invalid format: "".

Is there a way to inspect a pipeline rule? Console log? Dump $messages and see what I have access to?
(moved this part of my followup question for more eyeballs)

[/update]
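
On inspecting what a rule actually sees: the pipeline functions include debug(), which prints a value to the Graylog server log. A minimal sketch, assuming the field in question is called time as in the update above (the rule name is made up):

```
rule "debug time field"
when
    true
then
    // Prints true or false depending on whether the field exists at the
    // moment the pipeline runs.
    debug(has_field("time"));
    // Prints the field value; an empty value here would suggest the field
    // has not been populated yet when this rule fires.
    debug(to_string($message.time));
end
```

The output goes to the server log file rather than to the simulator, so it helps to tail that log while sending a test message.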

[update]
In my case, the $message object was indeed not getting populated, because the order of message processors in the Graylog config had Pipeline rules fire first and JSON extraction second.

[/update]
