Sorry for the dumb questions but:
How exactly does an extractor work, software-side? Does it store those extracted fields in Elasticsearch, working something like a mapping? Or are the “extraction patterns” stored entirely on the Graylog side, so that every time we query something from Elasticsearch it goes through those “extraction patterns” first?
Secondly, does creating extractors have any real impact on Graylog’s performance? Do we need to increase resources if we start to hit big numbers of extractors?
Data moves through extractors and pipelines, depending on your configuration.
Data is transformed based on the rules in those extractors and pipelines.
All resulting data is stored in Elasticsearch.
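In other words, extraction happens once, at ingest time, and Elasticsearch stores the already-enriched document; queries never re-run the patterns. Below is a rough conceptual sketch of that flow (this is not Graylog’s actual code, and the field names and pattern are made up for illustration):

```python
import re

# Hypothetical extractor pattern -- in Graylog you would configure this
# on an input; here it is just an illustration.
EXTRACTOR_PATTERN = re.compile(r"user=(?P<user>\S+) status=(?P<status>\d+)")

def apply_extractor(message: dict) -> dict:
    """Run the extractor against the raw message text and merge any
    captured groups into the message as separate fields. The enriched
    document is what gets written to Elasticsearch."""
    match = EXTRACTOR_PATTERN.search(message["message"])
    if match:
        message.update(match.groupdict())
    return message

raw = {"message": "login user=alice status=200", "source": "web01"}
doc = apply_extractor(raw)
# "user" and "status" are now top-level fields in the stored document,
# so searches like user:alice hit indexed fields directly instead of
# re-applying the extraction pattern at query time.
print(doc)
```

The key point: the cost of an extractor is paid per incoming message, not per query.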
And yes, I do believe that extractors can affect performance, especially if you’re working with regular expressions with wildcards in inconvenient places. But I can’t tell you much about the actual performance hit. I believe @macko003 knows a lot about this.
Any additional operation needs resources. Try it and you will see what happens. We use a lot of extractors and pipelines without problems, but it depends on your usage. If you use a wrong/slow regex, it will slow things down (as @Totally_Not_A_Robot said). If you have performance problems, you can try to optimize or disable the offending extractor.
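To make the “wrong/slow regex” point concrete, here is a small toy benchmark (not from this thread) showing catastrophic backtracking: nested quantifiers like `(a+)+` blow up when the overall match fails, which in an extractor means the cost is paid on every message that does not match the pattern:

```python
import re
import time

# Input that almost matches but fails at the end -- the worst case
# for a backtracking regex engine.
text = "a" * 22  # no trailing "b", so both patterns below fail

def time_match(pattern: str) -> float:
    """Time a single failed match attempt against `text`."""
    compiled = re.compile(pattern)
    start = time.perf_counter()
    compiled.match(text)  # returns None for both patterns
    return time.perf_counter() - start

slow = time_match(r"^(a+)+b$")  # nested quantifier: exponential backtracking
fast = time_match(r"^a+b$")     # same match semantics, fails in linear time

print(f"nested quantifier: {slow:.4f}s, simple form: {fast:.6f}s")
```

The two patterns accept exactly the same strings, but the nested-quantifier version is orders of magnitude slower on failing input. This is the kind of regex worth hunting for when extractor throughput drops.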
We handle ~15k logs/s with 4 Graylog servers. The servers are barely loaded, with plenty of free resources.