Are there any best practices for taking in syslog from multiple vendors that each send it in a slightly different way? For example, FortiGates send data in key=value pairs, which is different from Cisco switches; timestamps are in different formats, etc…
Our first try with Graylog uses different ports for different vendors, then adds a static attribute that we match in streams. Each stream matches logs from a certain vendor and puts them into their own index. The idea was that each index wouldn’t contain that many different fields, though I’m not sure whether that’s actually a problem for Elasticsearch. Other than that, since the “date” field might be in different formats, it would make analyzing the logs harder.
Now that I’ve looked at a couple of more expensive products, they seem to take in all the logs on the standard port 514 and then parse the dates, timestamps, etc. into a common format once they recognize what the device is.
Is there a reasonable way to do something similar in Graylog? For example, defining groups of source IP addresses and mapping them to different streams? And should I store all the syslog in one set of indices, or continue creating separate indices for different vendors?
Thanks for any ideas!
The best advice I can give you is to take a look at all the fields the various sources are sending, and try to develop a common schema that you can use to normalize the data on ingest, either via pipeline processing or extractors. The benefit is that regardless of the syslog source, the source IP address is always SrcIP, and searching/correlating data is easier, since otherwise a query will be
SrcIP:192.168.1.1 OR SourceIP:192.168.1.1 OR src_ip:192.168.1.1 OR… etc… etc…
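As a rough sketch, a pipeline rule that folds one vendor-specific field name into a common schema could look like this (the field names src_ip and SrcIP are assumptions; substitute whatever your schema uses):

```
rule "normalize source IP field"
when
  // FortiGate-style key=value field name (assumed)
  has_field("src_ip")
then
  // copy to the common schema name, then drop the vendor-specific one
  set_field("SrcIP", to_string($message.src_ip));
  remove_field("src_ip");
end
```

You would write one small rule like this per vendor-specific name, all converging on the same target field.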
On top of that, remembering what format is what based on the stream is really more work than it needs to be.
I would certainly recommend using different ports for different devices when possible. I think it’s cleaner and easier to troubleshoot. If a device starts sending a large volume of logs and causes a backlog, you can simply close that port on the Graylog server’s firewall to keep it from filling things up.
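That said, to the original question about grouping by source IP: a pipeline rule attached to the All messages stream can match on the sender’s address and route messages to a vendor stream. A minimal sketch, assuming the CIDR range and the stream name "FortiGate logs" are placeholders for your own:

```
rule "route FortiGate logs by source IP"
when
  // gl2_remote_ip is the address Graylog received the message from
  cidr_match("10.1.0.0/24", to_ip($message.gl2_remote_ip))
then
  route_to_stream(name: "FortiGate logs", remove_from_default: true);
end
```

The stream must already exist, and remove_from_default keeps the message from also landing in the default stream.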
As far as the timestamps are concerned, you can certainly normalize those in a pipeline or via an extractor.
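For example, a rule that parses a vendor-specific timestamp string into the message’s timestamp field might look roughly like this (the source field event_time and the pattern are assumptions for one vendor; parse_date uses Joda-Time date patterns):

```
rule "parse vendor timestamp"
when
  has_field("event_time")
then
  // overwrite the message timestamp with the parsed vendor value
  set_field("timestamp",
    parse_date(value: to_string($message.event_time),
               pattern: "yyyy-MM-dd HH:mm:ss"));
end
```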
The indices question is more involved. By default, ingested messages are routed to the All messages stream and placed into the default index set. The index set controls a lot of settings that are more involved than I can explain quickly, but one important one is the retention strategy: having everything in a single index set means you apply the same retention strategy to everything, and that probably won’t work for most people.
Hope this all makes sense… good luck!