I do believe it’s the first option:
- Data comes into Graylog.
- Data moves through extractors and pipelines, depending on config.
- Data is modified based on the rules in extractors and pipelines.
- All resulting data is stored in Elasticsearch.
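
Roughly, that flow could be sketched like this in Python. This is only a toy model to show the order of things: the function names, the fields, and the list standing in for Elasticsearch are all made up for illustration and are not Graylog APIs.

```python
import re

# Hypothetical stand-ins, not Graylog APIs: each stage takes a message
# dict and returns a (possibly modified) copy of it.

def extractor_stage(message):
    # An extractor pulls a new field out of an existing one.
    match = re.search(r"user=(\w+)", message["message"])
    if match:
        message = {**message, "user": match.group(1)}
    return message

def pipeline_stage(message):
    # A pipeline rule adds or changes fields when its condition matches.
    if message.get("user") == "root":
        message = {**message, "privileged": True}
    return message

def process(message, stages, index):
    # Run the message through every configured stage, then store the result.
    for stage in stages:
        message = stage(message)
    index.append(message)  # stands in for writing to Elasticsearch
    return message

index = []
stored = process(
    {"message": "login ok user=root"},
    [extractor_stage, pipeline_stage],
    index,
)
print(stored)
```

The point is only that what ends up stored is the message after all stages have run, not the raw input.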
And yes, I do believe that extractors can affect performance, especially if you’re working with regular expressions that have wildcards in inconvenient places. I can’t tell you much about the actual performance hit, though. I believe @macko003 knows a lot about this.
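
To give a rough idea of what I mean by wildcards in inconvenient places, here is a small sketch in Python. Keep in mind that Graylog uses Java regular expressions, so the timings will not carry over directly, and the log line and both patterns are made up for the example. It compares a pattern wrapped in `.*` with a focused one that extracts the same value.

```python
import re
import timeit

# A made-up long log line with the interesting part in the middle.
line = "x" * 5000 + " user=alice " + "y" * 5000

# Leading and trailing wildcards: the engine first consumes the whole
# line with the greedy ".*" and then has to backtrack to find "user=".
wildcard = re.compile(r".*user=(\w+).*")

# Focused pattern: the engine only has to locate the literal "user=".
focused = re.compile(r"user=(\w+)")

# Both patterns extract the same value.
assert wildcard.search(line).group(1) == focused.search(line).group(1)

# Time both; the actual numbers depend on the machine and regex engine.
t_wildcard = timeit.timeit(lambda: wildcard.search(line), number=200)
t_focused = timeit.timeit(lambda: focused.search(line), number=200)
print(f"wildcard: {t_wildcard:.4f}s, focused: {t_focused:.4f}s")
```

If the two timings differ a lot on your machine, that is the kind of cost that can add up when an extractor runs against every incoming message.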