I’m currently using Graylog to manage logs from an AI computer system that processes real-time sensor data for predictive analysis. As our AI model generates more data, the volume of logs is growing exponentially, and I’m looking for strategies to ensure that Graylog continues to perform efficiently at scale, especially with the demands from the AI computer handling large datasets.
Specifically, I’d appreciate guidance on:
Techniques for improving query performance with large datasets.
Best retention policies for long-term analysis without losing important data.
Optimizing Graylog indexing to handle high-frequency log data from AI computers.
Has anyone had success with similar setups? Any tips or best practices you’d recommend?
On point 3, I have the following guidelines for Elasticsearch 7.10.2 (a small sizing sketch follows the list):
• 3, 6, or 9 primary shards per index, depending on your number of data nodes
• 10 to 25 GB per shard if fast search is required, otherwise up to 50 GB
• no more than 20 shards per GB of JVM heap
• a memory-to-disk ratio of about 1:16 (e.g. 64 GB RAM for roughly 1 TB of data on disk)
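To make the arithmetic concrete, here is a minimal sketch applying those rules of thumb; the function names and the 25 GB default target are my own illustrative choices, not official guidance:

```python
# Sketch of the shard-sizing rules of thumb above (illustrative only).

def suggest_primary_shards(index_size_gb: float, data_nodes: int,
                           target_shard_gb: float = 25.0) -> int:
    """Primary shards for one index: size-driven, spread over all data nodes."""
    by_size = max(1, round(index_size_gb / target_shard_gb))  # 10-25 GB/shard
    return max(by_size, data_nodes)  # at least one shard per data node

def max_shards_per_data_node(heap_gb: float) -> int:
    """Upper bound from the '20 shards per GB of heap' rule."""
    return int(20 * heap_gb)

# Example: a 150 GB index on 3 data nodes with 31 GB heap each.
print(suggest_primary_shards(150, 3))   # -> 6 primary shards
print(max_shards_per_data_node(31))     # -> 620 shards max per node
```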
Above roughly five combined master/data Elasticsearch nodes, it usually makes sense to make a cut and split the roles: run dedicated master nodes and dedicated data nodes, with at least two masters for redundancy (three master-eligible nodes are generally recommended so the cluster can still elect a master if one fails).
If ingestion is high, refresh_interval can be raised; it defaults to 1 second (5 in some setups), and some raise it to 30 seconds. The only downside is that newly ingested data only becomes searchable after the next refresh.
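On Elasticsearch 7.10 the interval can be changed with a single settings call. A minimal sketch using Python’s requests library; the host and index pattern are assumptions, adjust them to your setup:

```python
import requests

# Raise refresh_interval to 30s on existing Graylog indices.
# Note: indices created at the next rotation fall back to the template's
# value unless the index template is changed as well.
resp = requests.put(
    "http://localhost:9200/graylog_*/_settings",  # assumed host and prefix
    json={"index": {"refresh_interval": "30s"}},
)
resp.raise_for_status()
print(resp.json())  # {'acknowledged': True}
```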
There are also some tunings to apply in Graylog’s server.conf, illustrated below.
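These are some commonly tuned options; the values shown are illustrative starting points only, so size them to your CPU count and ingest rate:

```
processbuffer_processors = 5    # threads running extractors/pipeline rules
outputbuffer_processors = 3     # threads writing batches to Elasticsearch
output_batch_size = 1000        # messages per bulk request to Elasticsearch
output_flush_interval = 1       # seconds before a partial batch is flushed
```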
Techniques for improving query performance with large datasets.
Use field filters to reduce the volume of data to process.
Use Pipeline Processing Rules to reduce the amount of indexed data by removing unused fields (see the example rule after this list).
Avoid Search-result Highlighting.
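To illustrate the pipeline-rule point, here is a minimal rule that drops a field before it is indexed; the field name debug_payload is hypothetical, substitute whatever your AI sensors emit but you never query:

```
rule "drop unused debug field"
when
  has_field("debug_payload")    // hypothetical field name
then
  remove_field("debug_payload");
end
```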
Best retention policies for long-term analysis without losing important data.
Your requirements define ‘Best’. There are a variety of ways to approach this, Graylog Data-tiering is one option. Typically the first thing to consider is whether the data must be immediately available for queries. If not, Archiving could be another option.
Optimizing Graylog indexing to handle high-frequency log data from AI computers.
Evaluate your Pipeline Processing Rules to ensure they process the data efficiently, avoiding overly complex rules or too many rules applying to each message.
Evaluate the Refresh-interval of your Index Sets, as less frequent refreshes can increase indexing throughput at the cost of how quickly new data becomes searchable (see the refresh_interval example above).
Increasing the number of shards for the indices the data is routed to will increase indexing throughput, since bulk writes are spread across more data nodes. Note that shard count only applies to newly created indices; in Graylog it is set per Index Set and takes effect at the next index rotation.
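After raising the shard count, it’s worth verifying that shard sizes still fall within the 10 to 50 GB guidance from earlier in this thread. A small sketch against the _cat/shards API; host and index prefix are assumptions:

```python
import requests

# List primary shard sizes (in GB) for all Graylog indices.
rows = requests.get(
    "http://localhost:9200/_cat/shards/graylog_*",  # assumed host and prefix
    params={"format": "json", "bytes": "gb", "h": "index,shard,prirep,store"},
).json()
for row in rows:
    if row["prirep"] == "p":  # primaries only; replicas mirror them
        print(f"{row['index']} shard {row['shard']}: {row['store']} GB")
```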