Graylog Architecture Ideas

Hi,
I have more than 4 GL nodes and an ES cluster.
The problem is that I have more than 100 active streams. In the beginning I had only a few streams, but as I needed more streams and pushed more logs into the system, I saw that it needed more CPU power, so I added more Graylog nodes. Now I am in a situation where adding a new GL node does not add much value, because every Graylog node has to evaluate the regexes from those 100+ streams.
In the beginning each node processed over 10,000 msg/sec; now each one processes ~1,000 msg/sec (and it's not because of Elasticsearch: I tested by pausing all the streams but one, and the ingestion rate increased).

Is there an architectural approach for this?
I was thinking of using a common Elasticsearch cluster with two or more Graylog clusters, independent of each other.
Each Graylog cluster would use the same MongoDB replica set but, of course, different databases; likewise, each Graylog cluster would use its own index sets in Elasticsearch. In front of them I would put an Apache reverse proxy pointing to each Graylog cluster (e.g. www.site.org/gl1/ for GL cluster 1 and www.site.org/gl2/ for GL cluster 2).
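A minimal sketch of what that Apache split might look like (hostnames and ports are made up; it assumes mod_proxy and mod_proxy_http are enabled, and each Graylog cluster would also need to know its externally visible path prefix via its server.conf URI settings):

```apache
<VirtualHost *:80>
    ServerName www.site.org

    ProxyRequests Off
    ProxyPreserveHost On

    # Cluster 1 under /gl1/, cluster 2 under /gl2/
    # (gl1-node1/gl2-node1 and port 9000 are placeholder values)
    ProxyPass        /gl1/ http://gl1-node1.internal:9000/
    ProxyPassReverse /gl1/ http://gl1-node1.internal:9000/

    ProxyPass        /gl2/ http://gl2-node1.internal:9000/
    ProxyPassReverse /gl2/ http://gl2-node1.internal:9000/
</VirtualHost>
```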

Any ideas or hints?

Not an architecture idea, but: the first thing I would try is optimizing the regexes, or matching a plain string instead of a regex in the stream conditions. This is sometimes very difficult, but it can make a huge difference.
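To get a feel for why, here is a toy micro-benchmark (Python rather than Graylog's Java, and the message and pattern are made up, so take the absolute numbers with a grain of salt; the relative gap is the point):

```python
import re
import timeit

msg = "2017-06-12T09:13:44 app=billing level=ERROR payment gateway timeout"

# The kind of unanchored wildcard regex that often ends up in stream rules.
pattern = re.compile(r".*level=ERROR.*")

regex_time = timeit.timeit(lambda: pattern.search(msg), number=1_000_000)
plain_time = timeit.timeit(lambda: "level=ERROR" in msg, number=1_000_000)

print(f"regex match:  {regex_time:.2f}s per million messages")
print(f"string match: {plain_time:.2f}s per million messages")
```

Multiply that gap by 100+ streams and thousands of messages per second, and it adds up quickly.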

Optimizing the stream rules is an option for the short term, but not for the long term. I was wondering: could Graylog stop at the first matching stream rule and not continue to check all the remaining rules? Maybe a stream could have a flag that tells Graylog to stop checking the remaining rules, and incoming messages would be checked against these “flagged” streams' rules first. Is that fantasy or not? :slight_smile:

That’s already the case in some situations, e.g. if all rules of a stream have to match and the first rule doesn’t match, Graylog will stop evaluating the remaining rules for that stream.

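A rough sketch of that short-circuit behaviour (illustrative only, not Graylog's actual code; the rule functions are made up):

```python
def stream_matches(message, rules, match_all=True):
    if match_all:
        # AND semantics: the first failing rule short-circuits the rest.
        return all(rule(message) for rule in rules)
    # OR semantics: the first matching rule short-circuits the rest.
    return any(rule(message) for rule in rules)

rules = [
    lambda m: m.get("source") == "web01",       # cheap equality check first
    lambda m: "ERROR" in m.get("message", ""),  # only evaluated if needed
]

# "source" doesn't match, so the second rule is never evaluated.
print(stream_matches({"source": "db01", "message": "ERROR disk full"}, rules))
```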

That is a problem that could be solved with the processing pipelines.
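A sketch of what such a pipeline rule might look like (the stream name and the field check are made up; `contains`, `to_string`, `has_field` and `route_to_stream` are standard pipeline functions):

```
rule "route nginx logs via cheap string match"
when
  has_field("source") && contains(to_string($message.source), "nginx")
then
  route_to_stream(name: "nginx");
end
```

Because pipelines are connected to streams and can route with cheap string functions, not every message has to be run through every stream's regexes.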

Scaling has worked very well for us, but I agree with the others here: if stream rules are hurting you, “tightening up” your regexes to be more efficient, or switching to plain string matching, pays HUGE dividends. Once we took the time to do it, we were able to get by with fewer nodes doing the same work. Another thing that helped enormously was pre-formatting our outgoing logs from servers/devices/applications as GELF/JSON. Not needing CPU-intensive regex extractors also makes a very noticeable difference in Graylog performance and the compute resources required. Just these few things go a long way when very high msg/sec inputs are in play.
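A minimal sketch of that GELF pre-formatting idea (host, port, and field names are made up; point it at your own GELF UDP input). Because the message arrives already structured, Graylog can index the fields without any regex extractors:

```python
import json
import socket
import time

GRAYLOG_HOST = "graylog.example.org"  # placeholder
GRAYLOG_PORT = 12201                  # default GELF UDP port

# GELF 1.1: "version", "host" and "short_message" are required;
# custom fields must be prefixed with an underscore.
msg = {
    "version": "1.1",
    "host": "web01",
    "short_message": "payment gateway timeout",
    "timestamp": time.time(),
    "level": 3,             # syslog severity: 3 = error
    "_app": "billing",
    "_duration_ms": 5042,
}

# Plain uncompressed JSON is fine for small messages; real senders
# should gzip and chunk large payloads per the GELF spec.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(json.dumps(msg).encode("utf-8"), (GRAYLOG_HOST, GRAYLOG_PORT))
```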
