Sorry for the dumb questions but:
How exactly does an extractor work, software-side? Does it store those extracted fields in Elasticsearch, working something like a mapping? Or are the “extraction patterns” stored entirely on the Graylog side, so that every time we query something from Elasticsearch it goes through those “extraction patterns” first?
Secondly, does creating extractors have any real impact on Graylog’s performance? Do we need to increase resources if we start to hit big numbers of extractors?
Data moves through extractors and pipelines, depending on your configuration.
Data is transformed based on the rules in those extractors and pipelines.
All resulting data is stored in Elasticsearch.
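In other words, extraction happens once, at ingest time, and Elasticsearch stores the already-enriched document; queries never re-run the patterns. Below is a rough conceptual sketch of that flow (this is not Graylog’s actual code, and the field names and pattern are made up for illustration):

```python
import re

# Hypothetical extractor pattern -- in Graylog you would configure this
# on an input; here it is just an illustration.
EXTRACTOR_PATTERN = re.compile(r"user=(?P<user>\S+) status=(?P<status>\d+)")

def apply_extractor(message: dict) -> dict:
    """Run the extractor against the raw message text and merge any
    captured groups into the message as separate fields. The enriched
    document is what gets written to Elasticsearch."""
    match = EXTRACTOR_PATTERN.search(message["message"])
    if match:
        message.update(match.groupdict())
    return message

raw = {"message": "login user=alice status=200", "source": "web01"}
doc = apply_extractor(raw)
# "user" and "status" are now top-level fields in the stored document,
# so searches like user:alice hit indexed fields directly instead of
# re-applying the extraction pattern at query time.
print(doc)
```

The key point: the cost of an extractor is paid per incoming message, not per query.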
And yes, I do believe that extractors can affect performance, especially if you’re working with regular expressions with wildcards in inconvenient places. But I can’t tell you much about the actual performance hit. I believe @macko003 knows a lot about this.
Any additional operation needs resources. Try it and you will see what happens. We use a lot of extractors and pipelines without problems, but it depends on your usage. If you use a wrong/slow regex, it will slow things down (as @Totally_Not_A_Robot said). If you have performance problems, you can try to optimize or disable the offending extractor.
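To make the “wrong/slow regex” point concrete, here is a small toy benchmark (not from this thread) showing catastrophic backtracking: nested quantifiers like `(a+)+` blow up when the overall match fails, which in an extractor means the cost is paid on every message that does not match the pattern:

```python
import re
import time

# Input that almost matches but fails at the end -- the worst case
# for a backtracking regex engine.
text = "a" * 22  # no trailing "b", so both patterns below fail

def time_match(pattern: str) -> float:
    """Time a single failed match attempt against `text`."""
    compiled = re.compile(pattern)
    start = time.perf_counter()
    compiled.match(text)  # returns None for both patterns
    return time.perf_counter() - start

slow = time_match(r"^(a+)+b$")  # nested quantifier: exponential backtracking
fast = time_match(r"^a+b$")     # same match semantics, fails in linear time

print(f"nested quantifier: {slow:.4f}s, simple form: {fast:.6f}s")
```

The two patterns accept exactly the same strings, but the nested-quantifier version is orders of magnitude slower on failing input. This is the kind of regex worth hunting for when extractor throughput drops.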
We handle ~15k logs/s with 4 Graylog servers. The servers are barely loaded, with plenty of free resources.