Open Threat Exchange (OTX) Log Enrichment Timeout - Gumming up the works

1. Describe your incident:

I have noticed that Graylog has become very slow at processing messages, often running 24 to 36 hours behind. This started only recently.

2. Describe your environment:

  • OS Information: Ubuntu 22.04

  • Package Version: 5.0.7

  • Service logs, configurations, and environment variables:

N/A

3. What steps have you already taken to try and solve the problem?

I tried increasing the available memory and CPU resources. However, my process queue still held over 300,000 unprocessed messages, and the processing rate was only 0.34 messages per second.
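As a rough check on those figures, the sketch below computes how long a 300,000-message backlog would take to clear at 0.34 messages per second. It assumes, for simplicity, a constant rate and no new messages arriving; the function name is illustrative only.

```python
def drain_time_seconds(backlog: int, rate_per_second: float) -> float:
    """Seconds needed to clear a backlog at a fixed processing rate,
    assuming no new messages arrive while it drains."""
    if rate_per_second <= 0:
        raise ValueError("rate must be positive")
    return backlog / rate_per_second


if __name__ == "__main__":
    seconds = drain_time_seconds(300_000, 0.34)
    # Report the same duration in seconds, hours, and days.
    print(f"{seconds:,.0f} s = {seconds / 3600:,.1f} h = {seconds / 86400:,.1f} days")
```

That works out to roughly 245 hours, or about ten days, even before counting new incoming messages, so at that rate the queue was effectively never going to catch up.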

4. How can the community help?

I'm posting to share the solution I found. I am enriching logs with both WHOIS data and OTX data.

I noticed a large number of HTTP 504 (Gateway Timeout) responses from OTX in my Graylog server log.

I removed OTX lookups from my pipeline rules, and the queue cleared out in a matter of seconds.

Has anyone else experienced this with OTX? My guess is that each request went out to OTX and Graylog waited for a response before moving on to the next message. Because the requests were timing out, each wait ran to its full limit, and a large queue built up. Any ideas on how to improve this situation from the Graylog side?

On the data adapter, my timeouts are 15,000 ms for connect, 10,000 ms for write, and 75,000 ms for read. I may try lowering those significantly, but it seems like everything has been timing out…
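The stall described above can be bounded with simple arithmetic. This sketch assumes, pessimistically, that a hung request can use up the connect, write, and read timeouts one after another (the real behavior depends on the HTTP client Graylog uses), and that each message needing an OTX lookup blocks its processing thread until the call returns or times out. The function names and the `threads` parameter are hypothetical.

```python
def worst_case_lookup_seconds(connect_ms: int, write_ms: int, read_ms: int) -> float:
    """Upper bound on how long one blocking lookup can stall a thread,
    assuming each timeout phase runs to its limit in sequence."""
    return (connect_ms + write_ms + read_ms) / 1000.0


def max_messages_per_second(stall_seconds: float, threads: int = 1) -> float:
    """Throughput ceiling when every message waits on a stalled lookup.
    `threads` is a hypothetical count of parallel processing threads."""
    return threads / stall_seconds


if __name__ == "__main__":
    # Timeouts taken from the data adapter settings quoted above.
    stall = worst_case_lookup_seconds(15_000, 10_000, 75_000)
    print(f"worst-case stall per lookup: {stall:.0f} s")
    print(f"ceiling: {max_messages_per_second(stall):.3f} msg/s per thread")
```

Even counting only the 75-second read timeout, a single thread could handle at most about 0.013 messages per second while OTX was not answering. Shorter timeouts raise that ceiling, but every failed lookup still costs its full timeout.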

I will also check with OTX about the reason for the timeouts.

@faen,

The free tier of the OTX API throttles requests after 1,000 per hour. You can raise that to 10,000 per hour by subscribing to the OTX feed (OTX is not a Graylog product).
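Whether Graylog can enforce such a limit on its own is not covered in this thread. As a generic illustration only, a client-side sliding-window budget like the one below keeps callers under a fixed number of requests per hour (1,000, per the figure quoted above) and refuses, rather than waits on, anything over budget, so enrichment can be skipped instead of blocking. The `HourlyBudget` class and its methods are hypothetical, not part of Graylog or the OTX API.

```python
import time
from collections import deque
from typing import Callable, Deque


class HourlyBudget:
    """Sliding-window limiter: allows at most `limit` calls in any
    `window_seconds` span. Over-budget calls are refused, not queued."""

    def __init__(self, limit: int = 1000, window_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.calls: Deque[float] = deque()  # timestamps of accepted calls

    def try_acquire(self) -> bool:
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            self.calls.append(now)
            return True
        return False


if __name__ == "__main__":
    fake_now = [0.0]  # controllable clock for the demo
    budget = HourlyBudget(limit=3, window_seconds=3600, clock=lambda: fake_now[0])
    print([budget.try_acquire() for _ in range(4)])  # [True, True, True, False]
    fake_now[0] = 3600.0
    print(budget.try_acquire())  # True: the earlier calls have aged out
```

Refusing over-budget calls immediately is the key design choice here: a throttled request that is never sent cannot stall a processing thread.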

The backlog is most likely a direct result of the service not responding once AT&T has throttled your calls. Graylog keeps waiting for a response that never comes, and with long timeouts, the queue keeps growing while each API call waits to time out.

Thank you! I had only seen the 10K limit in their documentation, and I am definitely over 1K. Time to scale back or subscribe. Thanks again!
