Questions regarding Extractor


(Zero) #1

Hi!

Sorry for the dumb questions but:
How exactly does an extractor work on the software side? Are the extracted fields stored in Elasticsearch, working something like a mapping? Or are the “extraction patterns” stored entirely on the Graylog side, so that every time we query something from Elasticsearch the results go through those patterns first?

Secondly, does creating extractors have any real impact on Graylog’s performance? Do we need to increase resources if we start to reach large numbers of extractors?


(Tess) #2

I do believe it’s the first option:

  1. Data comes into Graylog.
  2. Data moves through extractors and pipelines, depending on config.
  3. Data is transformed based on the rules in extractors and pipelines.
  4. All resulting data is stored in Elasticsearch.
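
The flow in the list above can be sketched roughly as follows. This is only an illustration of "extract first, then store", not Graylog's actual code: the `EXTRACTORS` list, the `apply_extractors` and `index` functions, the field names, and the sample message are all made up for the example, and a plain Python list stands in for Elasticsearch.

```python
import re

# Hypothetical extractor definitions: (target field, regex with one capture group).
# Illustrative only; this is not how Graylog represents extractors internally.
EXTRACTORS = [
    ("src_ip", re.compile(r"from (\d{1,3}(?:\.\d{1,3}){3})")),
    ("user", re.compile(r"user=(\w+)")),
]


def apply_extractors(message: dict) -> dict:
    """Return a copy of the message with any extracted fields added."""
    enriched = dict(message)
    for field, pattern in EXTRACTORS:
        match = pattern.search(message["message"])
        if match:
            enriched[field] = match.group(1)
    return enriched


def index(document: dict, store: list) -> None:
    """Stand-in for writing the finished document to Elasticsearch."""
    store.append(document)


store = []
raw = {"message": "login failed user=alice from 10.0.0.5", "source": "web01"}

# Extraction happens once, before storage; the stored document already
# carries the extra fields, so queries never re-run the patterns.
index(apply_extractors(raw), store)
print(store[0])
```

The point of the sketch is that the regex work is paid on the ingest side, once per message, and the stored document simply has more fields.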

And yes, I do believe that extractors can affect performance. Especially if you’re working with regular expressions with wildcards in inconvenient places. But the actual performance hit I can’t tell you much about. I do believe @macko003 knows a lot about this.
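
To give a feel for what "wildcards in inconvenient places" means, here is a small sketch that compares two patterns extracting the same value. It uses Python's `re` module, while Graylog runs on Java, so treat the timings as indicative only. The log line, the padding, and both patterns are invented for the example.

```python
import re
import timeit

# Invented sample line, padded to make the extra scanning visible.
line = (
    "2019-01-15 12:00:01 web01 sshd[4242]: "
    "Failed password for alice from 10.0.0.5 port 22 " + "x" * 2000
)

# Wildcards on both ends: the engine consumes the whole line first,
# then backtracks until the literal part fits.
greedy = re.compile(r".*for (\w+) from.*")

# A specific literal prefix and no wildcards: far less text to retry.
specific = re.compile(r"Failed password for (\w+) from")

greedy_user = greedy.search(line).group(1)
specific_user = specific.search(line).group(1)

# Both patterns extract the same field value; only the cost differs.
print(greedy_user, specific_user)

greedy_time = timeit.timeit(lambda: greedy.search(line), number=1000)
specific_time = timeit.timeit(lambda: specific.search(line), number=1000)
print(f"greedy:   {greedy_time:.4f} s per 1000 searches")
print(f"specific: {specific_time:.4f} s per 1000 searches")
```

The exact numbers depend on the machine and the regex engine, so none are quoted here. Measuring a candidate pattern against real sample messages before deploying it is the safer habit.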


#3

I think we need to start from the beginning.
Read the Graylog docs first, then ask: http://docs.graylog.org/en/2.5/pages/extractors.html
Elasticsearch stores the logs, including all fields in each message, so the extracted ones too. Graylog does all of its processing when the logs arrive (except decorators, which are applied at search time).
Don’t confuse fields with mappings:
https://www.elastic.co/guide/en/elasticsearch/reference/2.3/mapping-date-format.html

Any additional processing needs resources. Try it and you will see what happens. We use a lot of extractors and pipelines without problems, but it depends on your usage. If you use a bad or slow regex, it will slow things down (as @Totally_Not_A_Robot said). If you run into performance problems, you can try to optimize the extractor or disable it.
We handle ~15k logs/s with 4 Graylog servers. The servers are nearly idle, with plenty of free resources.


(Zero) #4

Alright, so basically Extractors will hit Graylog, not Elasticsearch.

Right now we have 20k logs/s on a 3-node GL cluster but not so many extractors in it.
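
For comparison, a quick back-of-the-envelope calculation of the per-node rates mentioned in this thread. It assumes the load is spread evenly across the nodes, which the thread does not confirm, and it says nothing about hardware, which was not given.

```python
# Figures quoted in this thread: (logs per second, number of Graylog nodes).
clusters = {
    "post #3 (4 nodes)": (15_000, 4),
    "post #4 (3 nodes)": (20_000, 3),
}

# Assumption: messages are balanced evenly across the nodes.
per_node = {name: rate / nodes for name, (rate, nodes) in clusters.items()}

for name, value in per_node.items():
    print(f"{name}: {value:.0f} logs/s per node")

ratio = per_node["post #4 (3 nodes)"] / per_node["post #3 (4 nodes)"]
print(f"per-node ratio: {ratio:.2f}")
```

That works out to about 3750 versus about 6667 logs/s per node, so each node here already takes roughly 1.8 times the per-node rate of the other cluster before any extra extractors are added.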

Thanks a lot!


#5

You need some CPU on the Graylog servers and some disk space on the Elasticsearch servers for this.
Try it. If you don’t make any big mistakes, there won’t be any problems.

