I’m trying to determine the best method (pipeline, extractor, or something else) to parse two incoming CSV log files into my GL4.1 installation. I’m not sure I am approaching this the correct way. Ideally, the header row of the CSVs would be ignored, and the tab-delimited fields would be processed in some manner for better analysis. The header fields are: TASK NAME / OPERATION STATUS / ERROR LINE NUM / ERROR DESCRIPTION INFORMATION
Description of steps you’ve taken to attempt to solve the issue
I’ve got the sidecar installed on the server in question, and the CSV data is incoming from what I can see. What I’m trying to understand is which approach I should take: Grok patterns, pipelines, extractors, etc.
Environmental information
Server logs from Windows 2019 server running sidecar and custom application
Log files written to csv
Personally I prefer pipelines over extractors but you could likely do either.
Assuming you are using Beats on your Windows server, you can use the Beats configuration to ignore header lines of the file. Below is one I am using (I changed some words around):
# Needed for Graylog
fields_under_root: true
fields.collector_node_id: ${sidecar.nodeName}
fields.gl2_source_collector: ${sidecar.nodeId}
output.logstash:
  hosts:
  - ${user.BeatsInput}
  ssl:
    verification_mode: none
path:
  data: C:\Program Files\Graylog\sidecar\cache\winlogbeat\data
  logs: C:\Program Files\Graylog\sidecar\logs
tags:
- windows, vet
filebeat:
  inputs:
  ##### find animal things but not when line contains bark or lines that start with #
  - type: log
    enabled: true
    include_lines: ['my_brothers_dog', 'hidden_cat', 'lizard']
    exclude_lines: ['bark','^#']
    fields:
      unique_log_tag: animals
    ignore_older: 72h
    paths:
    - C:\Program Files\MS-VET\*.LOG
  #
  ##### find quiet stuff but not if it contains shhhh or lines that start with #
  - type: log
    enabled: true
    include_lines: ['ASMR_videos']
    exclude_lines: ['shhhh','^#']
    fields:
      unique_log_tag: quiet
    ignore_older: 72h
    paths:
    - C:\Program Files\MSQuiet\*.LOG
Sorry I wasn’t clear… the example I provided is a sidecar configuration for Filebeat on a Windows server. It lets you define what is sent to the Graylog server before it hits the input, extractors or pipelines.
Thanks for the clarification!
I’ve got this entered into my sidecar config.
Just to understand:
If I want to exclude the header row, I could enter something like
exclude_lines: ['header1', 'header2', 'header3'] ?
Would I need to explicitly state every header (column) entry?
Or would the first line that has ‘header1’ exclude the entirety of that row?
I believe each entry is treated as a regular expression. That would mean that:
exclude_lines: ['header1', 'header2', 'header3']
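would exclude any line that contains header1, header2 or header3. The whole line is dropped as soon as one pattern matches, so you don’t need to list every column; one pattern that matches the header row is enough. In my case I wanted to exclude commented lines in an IIS log file, so it would be any lines that start with #: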
exclude_lines: ['^#']
With any luck the header lines are commented out or have something unique in them that you can use a regex to find and exclude.
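For the header row described in the original question, a minimal sketch of the input section might look like the following. It assumes the header line really does begin with the literal text TASK NAME and that no data row starts the same way; the path is a hypothetical placeholder, not taken from the thread.

```yaml
filebeat:
  inputs:
  - type: log
    enabled: true
    # Assumption: the header row starts with "TASK NAME".
    # A single pattern is enough, since any match drops the whole line.
    exclude_lines: ['^TASK NAME']
    paths:
    # Hypothetical path; replace with the real location of the CSV logs.
    - C:\path\to\your\logs\*.csv
```

Anchoring the pattern with ^ keeps it from accidentally dropping data rows that merely mention the same words further along the line.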