Prior to 2.2.3, we had extractors in place for our Fortigate input. These were based on the ones posted in the Graylog marketplace and had been running with minimal configuration through numerous patches. After the upgrade, I was excited to see any improvements in reliability, or maybe new fields we hadn’t been extracting before. We removed the old extractors so they would not interfere with or duplicate any of the new extractions.
A few things we noticed:
Certain fields create index errors because the index expects a number but receives a string from the Fortigate. An example of this is the [Version] field, which expects a number. Periodically, the Fortigate shoves a string in there, for example: “7.6.0&Language=en&Platform=Windows_7&Edition=F&Beta=0&Type=Splash&Number=44”
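To illustrate the failure mode, here is a minimal Python sketch of what I assume is happening: the index decides a field's type from the first value it sees, then rejects later values that do not fit. The `TinyIndex` class and the shortened sample value are my own simplification for illustration, not Graylog or Elasticsearch code.

```python
class TinyIndex:
    """Toy stand-in for an index that fixes a field's type on first use (hypothetical)."""

    def __init__(self):
        self.mapping = {}   # field name -> "number" or "string"
        self.errors = []    # (field, value) pairs that were rejected

    @staticmethod
    def _is_number(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

    def index(self, field, value):
        kind = "number" if self._is_number(value) else "string"
        # The first value seen for a field decides that field's type.
        expected = self.mapping.setdefault(field, kind)
        if expected == "number" and kind == "string":
            # A string arriving in a numeric field is rejected.
            self.errors.append((field, value))
            return False
        return True


idx = TinyIndex()
idx.index("Version", "5")  # first value is numeric, so the field becomes a number
ok = idx.index("Version", "7.6.0&Language=en&Platform=Windows_7")
print(ok, idx.errors)
```

If this is roughly right, the errors depend on which value happens to arrive first after an index is created, which would explain why they only show up periodically.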
After running for a weekend, the search interface was laggy and slow. This problem is caused by numerous random fields being created during extraction. It seems that anything containing a = is assumed to be a field and value pair and is logged that way, which means random data is turned into fields. The slowdown in the interface comes from Graylog attempting to autocomplete your input against thousands of possible fields. I don’t know our current number, but if I click the list all fields button, the interface dies. If I search for all fields containing a t, the time to return is over 10 minutes. And if I search for time, I get 119 fields containing time. Below is an example of some fields:
All Fortigate input is 4 hours behind for us. I know this is already reported to be a timezone issue, but I wanted to mention it as a problem we are also experiencing and hopefully this will be addressed in the next update. This isn’t hard to deal with because at least the information is still available.
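I don't know exactly where the offset gets lost, but a fixed 4 hour lag is what you would expect if a local timestamp without an offset is read as UTC. A small Python sketch of that assumption follows; the UTC-4 zone and the sample timestamp are made up for illustration and are not taken from our logs.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the firewall's local zone is UTC-4 and it sends no offset.
LOCAL = timezone(timedelta(hours=-4))

raw = "2017-05-01 12:00:00"  # made-up timestamp as the device might log it
naive = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")

# Wrong: treat the local wall-clock time as if it were already UTC.
misread = naive.replace(tzinfo=timezone.utc)
# Right: attach the device's real offset before converting.
correct = naive.replace(tzinfo=LOCAL)

# Shown in local time, the misread event appears 4 hours in the past.
skew = correct - misread
print(misread.astimezone(LOCAL), correct.astimezone(LOCAL), skew)
```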
This leads me to my questions. How are we supposed to deal with the automatic extractors? Are there any plans to allow us to disable the automatic extractors in the future? Which extractors would be processed first, automatic extractors or custom extractors (were we to have any)? Could I use a custom extractor to overwrite an automatic extractor? The current extractors seem to be too permissive, and because of that they chunk the fields in odd ways. When we find a fix for this, what is the recommended way to remove all the junk fields? Should we make a new index, or is restarting our cluster the only method?
I like the idea of built-in extractors, as they provide a great starting point for anyone new to Graylog and the Fortigate input. But there seem to be a few kinks right now, and I am hoping we can make improvements.