We currently feed IoCs from threat intel feeds into a dedicated Graylog stream once a day (approx. 4 MB of data) and use the slookup function in pipeline rules to find matches in real time (P.S.: relative timeframe: 3 days). So far it runs smoothly when tested at a throughput of 4000 messages/second.
In the future, we plan to add more sources to Graylog. Is the slookup approach efficient, or do we need to use Lookup Tables, which we are not confident with? Your feedback would be very helpful.
Sample raw message (Threat Intel IoC) : 046217f5bae309bf79fff719e18892570aa092febb0096b9169760ae2bab24c2;Intel::FILE_HASH;(100|43|Gen:Variant.Symmi) https://www.hybrid-analysis.com/feed?raw&hts
I would recommend you have a look at lookup tables. From a performance standpoint, they are substantially faster than the SLOOKUP plugin. The plugin is awesome and has a lot of use cases, I'll give it that, but it runs a query against Elasticsearch every time it is invoked. That is very resource intensive compared to lookup tables. If you put your data in a CSV file that is loaded by Graylog, the lookup cost is lower, since Graylog only needs some I/O on your Graylog node to read that CSV instead of running an entire search against Elasticsearch. You can also define an appropriate cache, which speeds things up even more.
Step 1: Create Data Adapter (make sure that Graylog has read permissions on the file)
Step 2: Create Cache (Select a size and an eviction policy that suits your data)
Step 3: Create Lookup Table (Combine the Adapter and the Cache into a Lookup Table)
Step 4: Profit. (Use it in Extractors, Converters, Decorators or Pipeline Rules)
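To get from the raw feed to the CSV file mentioned above, something like the following Python sketch could work. It assumes every IoC line has the same three semicolon-separated parts as the sample message (indicator, type, then the remaining description and source URL). The column names `indicator`, `indicator_type` and `meta` and the function name `iocs_to_csv` are made up for illustration, so adjust them to your feed and to the settings of your data adapter.

```python
import csv
import io


def iocs_to_csv(lines, out):
    """Write semicolon-delimited IoC lines to `out` as a quoted CSV.

    Assumed input layout (taken from the sample message in the question):
        <indicator>;<type>;<description and source URL>
    Returns the number of data rows written.
    """
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    # Header row; the column names are hypothetical, pick your own.
    writer.writerow(["indicator", "indicator_type", "meta"])

    seen = set()
    written = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        parts = line.split(";", 2)
        if len(parts) != 3:
            continue  # skip lines that do not match the assumed layout
        indicator, indicator_type, meta = (p.strip() for p in parts)
        if not indicator or indicator in seen:
            continue  # keep only the first occurrence of each indicator
        seen.add(indicator)
        writer.writerow([indicator, indicator_type, meta])
        written += 1
    return written


if __name__ == "__main__":
    sample = [
        "046217f5bae309bf79fff719e18892570aa092febb0096b9169760ae2bab24c2;"
        "Intel::FILE_HASH;(100|43|Gen:Variant.Symmi) "
        "https://www.hybrid-analysis.com/feed?raw&hts",
    ]
    buffer = io.StringIO()
    iocs_to_csv(sample, buffer)
    print(buffer.getvalue(), end="")
```

In a daily job you would write to a real file instead of the in-memory buffer, then point the CSV data adapter from Step 1 at that file, with the indicator column as the key and whichever column you want returned as the value. Check the separator and quote character configured in the adapter against what the script writes (comma and double quote here).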