I have multiple log file types on a group of hosts: application logs (JSON), system logs, Apache logs, etc. Currently we’re just sending the application logs through Filebeat. It seems like we should be able to add the additional files to Filebeat and push them into Graylog that way. The problem is that Beats supports only one output, so these different log types would all hit one Graylog input, and it doesn’t appear that we can trigger extractors based on the log type field pushed with each message.
What options are available? Is it possible to map files to their own output/input set?
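For reference, the Filebeat side would look roughly like this (a sketch only; the paths, the log_type field name, and the host are placeholders), with every file type tagged but everything going through the single Beats output:

```yaml
# filebeat.yml (sketch) -- paths, field values and hosts are placeholders
filebeat.inputs:
  - type: log
    paths:
      - /var/log/myapp/*.json        # application logs (JSON)
    fields:
      log_type: application
    fields_under_root: true
  - type: log
    paths:
      - /var/log/apache2/access.log  # Apache access logs
    fields:
      log_type: apache
    fields_under_root: true
  - type: log
    paths:
      - /var/log/syslog              # system logs
    fields:
      log_type: syslog
    fields_under_root: true

# Graylog's Beats input speaks the Beats/Logstash protocol
output.logstash:
  hosts: ["graylog.example.com:5044"]
```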
The best (and really the only) option is to use the processing pipelines.
That way you can apply different extractions and modifications based on the filename the message was taken from, or on any other identifier the message carries.
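For example, a rule in such a pipeline could derive a log type from the source filename. This is only a sketch: the field that carries the file path depends on how your Beats input is configured (it may be filebeat_log_file_path or similar), so check an actual message first.

```
rule "identify apache logs by filename"
when
  // field name is an assumption -- check how your Beats input names the source path
  has_field("filebeat_log_file_path") &&
  contains(to_string($message.filebeat_log_file_path), "apache", true)
then
  set_field("log_type", "apache");
end
```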
We’re currently using “Streams” (rules) to put data into different index sets based on retention requirements. We also only have one log type per input, and extractors are currently managed on the inputs. Are you saying that we can move all of the extractor logic into pipelines? It would be great if we could simplify.
So the workflow would be: Input > All Messages > sort processing pipeline (app, Apache, syslog, auth) > log-specific stream (which sets the retention index set) > log-specific processing pipeline (parses messages for that specific log type)?
Although pipelines can trigger other pipelines via message routing, incoming messages must be processed by an initial set of pipelines connected to one or more streams.
Does the “message routing” take place in a stream?
Yes, you can do everything (and more) that you do in extractors in the pipelines. You can also do stream routing in the pipeline, based on the extraction/normalization you have done there.
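For example, a JSON extractor on the Beats input could be replaced by a rule like this (a sketch, assuming the raw JSON payload ends up in the message field and that a log_type field is set as above):

```
rule "parse application json"
when
  has_field("log_type") && to_string($message.log_type) == "application"
then
  // parse the raw JSON payload and copy its keys onto the message as fields
  let parsed = parse_json(to_string($message.message));
  set_fields(to_map(parsed));
end
```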
Your workflow would be nearly the one you wrote down:
Input > All Messages > Processing Pipelines
and in the processing pipelines you can do whatever fits your needs. That could be, for example, splitting by log type into separate streams and having other pipelines on those streams that do further extractions/additional processing. The other option would be to do all the processing up front and only save the final result into a new stream.
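A sorting rule that splits by log type could look like this sketch (the stream name is a placeholder and the stream has to exist already):

```
rule "route apache logs to their stream"
when
  has_field("log_type") && to_string($message.log_type) == "apache"
then
  // remove_from_default takes the message off the "All messages" default stream
  route_to_stream(name: "Apache Logs", remove_from_default: true);
end
```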
Pipelines can route messages to streams, and other pipelines can be connected to those streams. So the routing is not from pipeline to pipeline, but by a pipeline from stream to stream. The only important note on that: you need to get the stages right, because processing in the pipelines on the new stream picks up at the stage after the one in which the message was routed.
Note that the built-in function route_to_stream causes a message to be routed to a particular stream. After the routing occurs, the pipeline engine will look up and start evaluating any pipelines connected to that stream.
So if you route a message from stream_a to stream_b in stage 1 of a pipeline on stream_a, the first stage that runs for the message on stream_b is stage 2; the pipeline processing does not start over from the first stage of that pipeline. This is done to prevent user-created loops.
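As a sketch with the hypothetical names from above: if the sort pipeline connected to “All messages” routes in stage 1,

```
pipeline "Sort by log type"
stage 1 match either
  rule "route apache logs to their stream";
end
```

then a pipeline connected to the “Apache Logs” stream should place its rules in stage 2 or later (the rule name here is hypothetical), otherwise they would not run for messages routed over in stage 1:

```
pipeline "Apache processing"
stage 2 match all
  rule "extract apache access log fields";
end
```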