Apologies in advance for such basic questions. I am familiar with log aggregation tools and have used Graylog as a user (as well as other tools), but I’m not clear on where to start in defining how to consume non-standard log files.
I have a log file format that is specific to an application. The logs are text files where each line conforms to one of several formats depending on the content, and occasionally an entry may span multiple lines.
To get started, I’m trying to find out how to consume them first, and then how to break them up, extract the timestamp, etc. Depending on the “line type”, I want to parse out different fields. I’m familiar with regex and have several scripts that match different line types. Could someone give me some pointers on how to build the “consumer” rules, where to put them, and how to test them?
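One way to check the per-line-type regexes before wiring them into any tool is a small offline script. The sketch below is only an illustration: the two line types ("login" and "error"), their field names, and the bracketed timestamp layout such as [7/29/17 11:22] are assumptions, not the real application format.

```python
import re
from datetime import datetime

# Hypothetical line formats; swap in the patterns from your own scripts.
LINE_TYPES = {
    "login": re.compile(r"^\[(?P<ts>[^\]]+)\] LOGIN user=(?P<user>\S+)$"),
    "error": re.compile(r"^\[(?P<ts>[^\]]+)\] ERROR code=(?P<code>\d+) (?P<detail>.*)$"),
}


def classify(line):
    """Return (line_type, fields) for the first matching pattern, else (None, {})."""
    for name, pattern in LINE_TYPES.items():
        match = pattern.match(line)
        if match:
            fields = match.groupdict()
            # Assumed timestamp layout: month/day/two-digit year, 24-hour clock.
            fields["ts"] = datetime.strptime(fields["ts"], "%m/%d/%y %H:%M")
            return name, fields
    return None, {}


if __name__ == "__main__":
    print(classify("[7/29/17 11:22] ERROR code=42 disk full"))
    print(classify("[7/29/17 11:23] LOGIN user=alice"))
```

Named groups keep the field names next to the pattern that produces them, so the same expressions can later be carried over to whatever extraction rules end up being used.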
Hey there! Another noob here, but I think I can help with that. I think the way to go would be creating extractors, kinda like I did with Symantec Endpoint Protection Manager logs.
The extractors are created under System > Inputs. You’ll want to create an input, or maybe modify an existing one, though I’d choose to create a new one. Extractors are specific to each input. For instance, I have extractors set up on an input for Cisco ASAs, and another input set up with extractors for Palo Alto firewalls.
I recommend reading through Graylog’s documentation for inputs and extractors. It lays the info out in a lot of detail, and in my opinion mastering the extractors is where you can really make Graylog shine.
Thanks Jamie, I’ll start experimenting later today. I have a couple more questions that might save me a lot of digging if anyone is willing to help.
I have the VM running, and I’m assuming it’d be easier to copy some logs into the VM and pull them in as a sample to play with (as I believe the extractors run during the import?). Assuming this is true, how easy is it to delete the imported content and run again while testing?
Second, my log lines start with a clear timestamp, but sometimes there will be entries that include multiple lines (truly multiple lines, not wrapping). Am I likely to have an issue with this? How does it determine what constitutes a “line” during the import?
For example:
[7/29/17 11:22] this is a single line
[7/29/17 11:22] this is a log line
But it has some carriage
Returns in it
[7/29/17 11:22] Another single line entry
[7/29/17 11:22] And another
I will ultimately want the extractor to consider the multi-line entry as a single message.
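To make the desired grouping concrete, here is a minimal sketch of folding continuation lines into the preceding entry. It is a standalone pre-processing script, not a Graylog feature, and it assumes every new entry starts with a bracketed timestamp in the form shown in the example above. The names ENTRY_START and group_entries are made up for illustration.

```python
import re

# Assumption: a new entry always begins with a timestamp like [7/29/17 11:22].
ENTRY_START = re.compile(r"^\[\d{1,2}/\d{1,2}/\d{2} \d{1,2}:\d{2}\] ")


def group_entries(lines):
    """Yield one string per log entry, joining continuation lines with newlines."""
    current = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        # A timestamped line closes the entry collected so far.
        if ENTRY_START.match(line) and current:
            yield "\n".join(current)
            current = []
        # Lines without a timestamp are appended to the entry in progress.
        current.append(line)
    if current:
        yield "\n".join(current)


if __name__ == "__main__":
    sample = [
        "[7/29/17 11:22] this is a single line\n",
        "[7/29/17 11:22] this is a log line\n",
        "But it has some carriage\n",
        "Returns in it\n",
        "[7/29/17 11:22] Another single line entry\n",
        "[7/29/17 11:22] And another\n",
    ]
    for entry in group_entries(sample):
        print(repr(entry))
```

With the sample above, this produces four messages, the second of which contains all three physical lines. Any lines that appear before the first timestamp are emitted as their own entry rather than dropped.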