Instead of attempting to upgrade my 5.x server, I created a new 6.0 server, and I'm going through and reworking my pipelines and rules now that I'm a bit more familiar with Graylog. I'm also working on learning the rule wizard and its differences.
I have a CSV message where the number and order of fields vary depending on the field values, so I can't use the CSV extractor. Split also caused problems because it skips empty fields. My 5.x pipeline splits out the fields individually and adds each one with the appropriate conversion. Each time a branching field is reached, I create a temp field containing the remainder of the CSV line, and the next stage in the pipeline applies the appropriate rule based on the branching field.
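Simplified, one stage of the 5.x approach looks something like this (field names like `device_type` and `csv_remainder` are placeholders for illustration):

```
rule "csv stage 1: peel off the branching field"
when
  has_field("message")
then
  // Capture the first field and the remainder explicitly, so empty
  // fields are never silently dropped the way split() dropped them.
  let m = regex("^([^,]*),(.*)$", to_string($message.message));
  set_field("device_type", m["0"]);
  // Temp field holding the rest of the CSV line; the next stage
  // picks the right rule based on device_type and parses this.
  set_field("csv_remainder", m["1"]);
end
```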
I've started converting this to a grok pattern and have the first stage working, but now I'm not sure of the best way to proceed. Should I continue the practice of saving the remainder of the line in a temp field? Should I create a grok pattern that builds on the initial grok pattern, the way URIHOST is composed from other patterns? Or something else entirely?
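To make the question concrete, the two options I see are roughly these (pattern and field names are made up):

```
rule "grok stage 1"
when
  has_field("message")
then
  // Option A: keep saving the remainder as a temp field, as in 5.x.
  set_fields(
    grok(
      pattern: "%{WORD:device_type},%{GREEDYDATA:csv_remainder}",
      value: to_string($message.message),
      only_named_captures: true
    )
  );
end

// Option B: compose custom patterns the way URIHOST builds on IPORHOST
// (hypothetical entries defined under System > Grok Patterns):
//   MYCSV_HEAD  %{WORD:device_type},%{WORD:event_type}
//   MYCSV_FW    %{MYCSV_HEAD},%{IP:src_ip},%{IP:dst_ip},%{GREEDYDATA:rest}
```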
Should grok extraction rules be configured to use the grok pattern in the when clause as well? I assume that would cause the pattern to be evaluated twice, so it seems like a bad idea.
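In other words, is it better to do something like this, where the when clause only does a cheap check on a field set by the previous stage and the grok runs once in the then block (the pattern name is a placeholder):

```
rule "parse firewall remainder"
when
  // Cheap guard instead of repeating the grok pattern here,
  // which (I assume) would run the match twice per message.
  has_field("csv_remainder") && to_string($message.device_type) == "firewall"
then
  set_fields(
    grok(
      pattern: "%{MYCSV_FW_TAIL}",
      value: to_string($message.csv_remainder),
      only_named_captures: true
    )
  );
  remove_field("csv_remainder");
end
```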
I know I want my expensive operations to run on as few messages as possible, but I'm also unsure about my stream sorting. Should I use stream rules to group messages by device type and then a pipeline to sort those by message type into separate streams? I'm not sure what the performance difference is between stream rules and pipeline when clauses.
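For comparison, the pipeline side of that routing would be something like this (the stream name is a placeholder; the stream-rule version would match on the same field in the stream's rule editor instead):

```
rule "route firewall messages"
when
  to_string($message.device_type) == "firewall"
then
  // Equivalent of a static stream rule, but done in the pipeline
  // after device_type has already been extracted.
  route_to_stream(name: "Firewall Events", remove_from_default: true);
end
```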
Does Graylog have a way to show the time and resources consumed by each rule or stage? That way I could see the effect of each change and determine whether it was an improvement. I'd like to get a handle on how best to use Graylog while I'm still working with volumes small enough to allow mistakes.