Graylog 2.4.4 - stream timeout Exception/index failures

Good morning,
Graylog cluster: 3 nodes, each machine running Graylog + Elasticsearch + a MongoDB replica set member together.
Each node: 8 CPU threads (Intel Core i7-7700 @ 3.60 GHz), 64 GB RAM, 480 GB SSD, 1 Gb/s local network, Debian Linux.

Throughput: ~1k msg/sec
Each index has 3 shards and 2 replicas
Index rotation strategy: Index Size
Max index size: 2.0 GB
Max number of indices: 245
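With these retention settings the cluster can accumulate a very large number of shard copies. A quick back-of-the-envelope calculation (a sketch, assuming all 245 indices exist at once):

```python
# Rough shard count for this retention configuration.
indices = 245          # Max number of indices
shards_per_index = 3   # primary shards per index
replicas = 2           # replica copies per primary
data_nodes = 3         # Elasticsearch data nodes

total_shards = indices * shards_per_index * (1 + replicas)
shards_per_node = total_shards / data_nodes

print(total_shards)     # 2205 shard copies cluster-wide
print(shards_per_node)  # 735.0 per node
```

Hundreds of shards per node is far above the usual guidance of roughly 20 shards per GB of heap, especially with 2 GB indices, which are very small relative to the per-shard overhead.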

Over the last week I have started to notice errors like these in the log files more and more often:

2018-05-11T22:34:08.968+03:00 WARN  [StreamRouterEngine] Unexpected error during stream matching
    java.util.concurrent.TimeoutException: null
            at java.util.concurrent.FutureTask.get(FutureTask.java:205) ~[?:1.8.0_171]
            at com.google.common.util.concurrent.SimpleTimeLimiter.callWithTimeout(SimpleTimeLimiter.java:147) ~[graylog.jar:?]
            at org.graylog2.streams.StreamRouterEngine$Rule.matchWithTimeOut(StreamRouterEngine.java:303) [graylog.jar:?]
            at org.graylog2.streams.StreamRouterEngine$Rule.access$000(StreamRouterEngine.java:264) [graylog.jar:?]
            at org.graylog2.streams.StreamRouterEngine.match(StreamRouterEngine.java:191) [graylog.jar:?]
            at org.graylog2.streams.StreamRouter.route(StreamRouter.java:99) [graylog.jar:?]
            at org.graylog2.filters.StreamMatcherFilter.filter(StreamMatcherFilter.java:42) [graylog.jar:?]
            at org.graylog2.messageprocessors.MessageFilterChainProcessor.process(MessageFilterChainProcessor.java:100) [graylog.jar:?]
            at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.handleMessage(ProcessBufferProcessor.java:114) [graylog.jar:?]
            at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.dispatchMessage(ProcessBufferProcessor.java:100) [graylog.jar:?]
            at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:77) [graylog.jar:?]
            at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:42) [graylog.jar:?]
            at com.lmax.disruptor.WorkProcessor.run(WorkProcessor.java:143) [graylog.jar:?]
            at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66) [graylog.jar:?]
            at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
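The trace shows stream rule matching being run through Guava's SimpleTimeLimiter, which raises TimeoutException when a rule (often a slow regex) does not finish within `stream_processing_timeout`. A minimal Python sketch of the same pattern (illustrative only, not actual Graylog code):

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

executor = ThreadPoolExecutor(max_workers=1)

def match_with_timeout(rule, message, timeout_s=2.0):
    """Run a stream-rule match, giving up after timeout_s seconds."""
    future = executor.submit(rule, message)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # Graylog counts this as a stream fault; after
        # stream_processing_max_faults the stream is disabled.
        return None

# A fast rule completes well inside the timeout.
print(match_with_timeout(lambda m: "error" in m, "error: disk full"))  # True
```

The key point: the timeout fires per rule evaluation, so rules that are normally fast will still time out when the process buffer threads are starved of CPU, which matches the GC pressure shown below.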

[2018-05-11T22:53:38,451][WARN ][o.e.m.j.JvmGcMonitorService] [graylog] [gc][young][14432][3555] duration [1.8s], collections [1]/[2.3s], total [1.8s]/[5.4m], memory [3gb]->[2.5gb]/[30.9gb], all_pools {[young] [517.6mb]->[4.8mb]/[532.5mb]}{[survivor] [16.6mb]->[14.4mb]/[66.5mb]}{[old] [2.4gb]->[2.4gb]/[30.3gb]}
[2018-05-11T22:53:38,451][WARN ][o.e.m.j.JvmGcMonitorService] [graylog] [gc][14432] overhead, spent [1.8s] collecting in the last [2.3s]
[2018-05-11T23:03:36,968][INFO ][o.e.c.m.MetaDataCreateIndexService] [graylog] [graylog_3157] creating index, cause [api], templates [graylog-custom-mapping, graylog-internal], shards [3]/[2], mappings [message]
[2018-05-11T23:03:38,379][INFO ][o.e.c.r.a.AllocationService] [graylog] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[graylog_3157][1]] ...]).
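The JvmGcMonitorService warnings are significant: the node spent 1.8 s of a 2.3 s window in young-generation GC, i.e. roughly 78% of wall time. Sustained GC overhead anywhere near this level will stall indexing and can cascade into the stream-matching timeouts above:

```python
# GC overhead computed from the log line above.
gc_time_s = 1.8   # duration [1.8s]
window_s = 2.3    # collections [1]/[2.3s]

overhead = gc_time_s / window_s
print(f"{overhead:.0%}")  # ~78% of the window spent collecting
```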

My Graylog config:

is_master = false # false on 2 nodes; true on the single master node
node_id_file = /etc/graylog/server/node-id
root_timezone = Europe/Berlin
plugin_dir = /usr/share/graylog-server/plugin
rest_listen_uri = http://10.10.1.4:9000/api/
rest_transport_uri = http://10.10.1.4:9000/api/
rest_enable_cors = true
trusted_proxies = 127.0.0.1/32, 10.10.1.1/32
web_listen_uri = http://10.10.1.4:9000/
elasticsearch_hosts = http://10.10.1.2:9200,http://10.10.1.3:9200,http://10.10.1.4:9200
elasticsearch_connect_timeout = 10s
elasticsearch_max_total_connections = 300
elasticsearch_max_total_connections_per_route = 300
elasticsearch_max_retries = 10
elasticsearch_discovery_enabled = true
elasticsearch_discovery_frequency = 45s
rotation_strategy = count
elasticsearch_max_docs_per_index = 20000000
elasticsearch_max_number_of_indices = 20
retention_strategy = close
elasticsearch_shards = 3
elasticsearch_replicas = 2
elasticsearch_index_prefix = graylog
allow_leading_wildcard_searches = true
allow_highlighting = true
elasticsearch_analyzer = standard
elasticsearch_index_optimization_jobs = 30
output_batch_size = 2000
output_flush_interval = 1
output_fault_count_threshold = 5
output_fault_penalty_seconds = 30
processbuffer_processors = 20
outputbuffer_processors = 10
processor_wait_strategy = blocking
ring_size = 262144
inputbuffer_ring_size = 262144
inputbuffer_processors = 2
inputbuffer_wait_strategy = blocking
message_journal_enabled = true
message_journal_dir = /var/lib/graylog-server/journal
message_journal_max_size = 5gb
lb_recognition_period_seconds = 0
stream_processing_timeout = 2000
stream_processing_max_faults = 3
alert_check_interval = 60
stale_master_timeout = 5000
mongodb_uri = mongodb://10.10.1.2,10.10.1.3,10.10.1.4/graylog?replicaSet=rs01
mongodb_max_connections = 1000
mongodb_threads_allowed_to_block_multiplier = 5
dashboard_widget_default_cache_time = 10s
proxied_requests_thread_pool_size = 32
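One thing that stands out in this config: on an 8-thread CPU, Graylog alone is configured with 32 buffer processor threads, before counting the Elasticsearch and MongoDB processes sharing the same cores. A rough count (a sketch based on the values above):

```python
# Graylog buffer processor threads from the config above.
processbuffer = 20   # processbuffer_processors
outputbuffer = 10    # outputbuffer_processors
inputbuffer = 2      # inputbuffer_processors
cpu_threads = 8      # i7-7700: 4 cores / 8 threads

total = processbuffer + outputbuffer + inputbuffer
print(total)                # 32 threads
print(total / cpu_threads)  # 4.0x oversubscription, plus ES and MongoDB
```

Heavy oversubscription with the `blocking` wait strategy means processor threads compete with Elasticsearch for CPU, which makes the per-rule stream timeouts more likely to fire.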

Elasticsearch config (elasticsearch.yml):

cluster.name: graylog
node.master: false # true on 2 nodes; false on this one
node.data: true # true on all 3 nodes
node.name: ${HOSTNAME}
path.data: /opt/elasticsearch/
path.logs: /var/log/graylog/elasticsearch/
network.host: 10.10.1.4
discovery.zen.ping.unicast.hosts: ["10.10.1.2", "10.10.1.3","10.10.1.4"]
discovery.zen.minimum_master_nodes: 2

and the Elasticsearch JVM heap:

-Xms31g
-Xmx31g
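A 31 GB Elasticsearch heap plus a 5 GB Graylog heap on a 64 GB box that also hosts MongoDB leaves relatively little for the filesystem cache that Lucene depends on; 31 GB is also right at the compressed-oops boundary (~32 GB). A rough memory budget (a sketch; MongoDB and OS usage are not itemized here):

```python
# Memory budget for one node, from the settings above.
ram_gb = 64
es_heap_gb = 31       # -Xms31g / -Xmx31g
graylog_heap_gb = 5   # GRAYLOG_SERVER_JAVA_OPTS -Xms5g -Xmx5g

remaining_gb = ram_gb - es_heap_gb - graylog_heap_gb
print(remaining_gb)  # 28 GB left for the OS, MongoDB, and the page cache
```

The usual Elasticsearch advice is to give the heap at most ~50% of RAM so the rest can serve as page cache; with three JVMs and MongoDB on one host, that ratio is hard to maintain.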

JVM settings

/etc/default/graylog-server

# Path to the java executable.
JAVA=/usr/bin/java

# Default Java options for heap and garbage collection.
GRAYLOG_SERVER_JAVA_OPTS="-Xms5g -Xmx5g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:-OmitStackTraceInFastThrow"

# Pass some extra args to graylog-server. (i.e. "-d" to enable debug mode)
GRAYLOG_SERVER_ARGS=""

# Program that will be used to wrap the graylog-server command. Useful to
# support programs like authbind.
GRAYLOG_COMMAND_WRAPPER=""

/etc/default/elasticsearch

################################
# Elasticsearch
################################

# Elasticsearch home directory
#ES_HOME=/usr/share/elasticsearch

# Elasticsearch Java path
#JAVA_HOME=

# Elasticsearch configuration directory
#CONF_DIR=/etc/elasticsearch

# Elasticsearch data directory
#DATA_DIR=/var/lib/elasticsearch

# Elasticsearch logs directory
#LOG_DIR=/var/log/elasticsearch

# Elasticsearch PID directory
#PID_DIR=/var/run/elasticsearch

# Additional Java OPTS
#ES_JAVA_OPTS=

# Configure restart on package upgrade (true, every other setting will lead to not restarting)
#RESTART_ON_UPGRADE=true

################################
# Elasticsearch service
################################

# SysV init.d
#
# When executing the init script, this user will be used to run the elasticsearch service.
# The default value is 'elasticsearch' and is declared in the init.d file.
# Note that this setting is only used by the init script. If changed, make sure that
# the configured user can read and write into the data, work, plugins and log directories.
# For systemd service, the user is usually configured in file /usr/lib/systemd/system/elasticsearch.service
#ES_USER=elasticsearch
#ES_GROUP=elasticsearch

# The number of seconds to wait before checking if Elasticsearch started successfully as a daemon process
ES_STARTUP_SLEEP_TIME=5

################################
# System properties
################################

# Specifies the maximum file descriptor number that can be opened by this process
# When using Systemd, this setting is ignored and the LimitNOFILE defined in
# /usr/lib/systemd/system/elasticsearch.service takes precedence
#MAX_OPEN_FILES=65536

# The maximum number of bytes of memory that may be locked into RAM
# Set to "unlimited" if you use the 'bootstrap.memory_lock: true' option
# in elasticsearch.yml.
# When using systemd, LimitMEMLOCK must be set in a unit file such as
# /etc/systemd/system/elasticsearch.service.d/override.conf.
#MAX_LOCKED_MEMORY=unlimited

# Maximum number of VMA (Virtual Memory Areas) a process can own
# When using Systemd, this setting is ignored and the 'vm.max_map_count'
# property is set at boot time in /usr/lib/sysctl.d/elasticsearch.conf
#MAX_MAP_COUNT=262144

Elasticsearch JVM options (jvm.options):

## JVM configuration

################################################################
## IMPORTANT: JVM heap size
################################################################
##
## You should always set the min and max JVM heap
## size to the same value. For example, to set
## the heap to 4 GB, set:
##
## -Xms4g
## -Xmx4g
##
## See https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html
## for more information
##
################################################################

# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space

-Xms31g
-Xmx31g

################################################################
## Expert settings
################################################################
##
## All settings below this section are considered
## expert settings. Don't tamper with them unless
## you understand what you are doing
##
################################################################

## GC configuration
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly

## optimizations

# pre-touch memory pages used by the JVM during initialization
-XX:+AlwaysPreTouch

## basic

# force the server VM (remove on 32-bit client JVMs)
-server

# explicitly set the stack size (reduce to 320k on 32-bit client JVMs)
-Xss1m

# set to headless, just in case
-Djava.awt.headless=true

# ensure UTF-8 encoding by default (e.g. filenames)
-Dfile.encoding=UTF-8

# use our provided JNA always versus the system one
-Djna.nosys=true

# use old-style file permissions on JDK9
-Djdk.io.permissionsUseCanonicalPath=true

# flags to configure Netty
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0

# log4j 2
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-Dlog4j.skipJansi=true

## heap dumps

# generate a heap dump when an allocation from the Java heap fails
# heap dumps are created in the working directory of the JVM
-XX:+HeapDumpOnOutOfMemoryError

# specify an alternative path for heap dumps
# ensure the directory exists and has sufficient space
#-XX:HeapDumpPath=${heap.dump.path}

## GC logging

#-XX:+PrintGCDetails
#-XX:+PrintGCTimeStamps
#-XX:+PrintGCDateStamps
#-XX:+PrintClassHistogram
#-XX:+PrintTenuringDistribution
#-XX:+PrintGCApplicationStoppedTime

# log GC status to a file with time stamps
# ensure the directory exists
#-Xloggc:${loggc}

# By default, the GC log file will not rotate.
# By uncommenting the lines below, the GC log file
# will be rotated every 128MB at most 32 times.
#-XX:+UseGCLogFileRotation
#-XX:NumberOfGCLogFiles=32
#-XX:GCLogFileSize=128M

# Elasticsearch 5.0.0 will throw an exception on unquoted field names in JSON.
# If documents were already indexed with unquoted fields in a previous version
# of Elasticsearch, some operations may throw errors.
#
# WARNING: This option will be removed in Elasticsearch 6.0.0 and is provided
# only for migration purposes.
#-Delasticsearch.json.allow_unquoted_field_names=true

The errors appear more and more often. Where should I start digging? Thanks

What are the JVM settings of Graylog on these machines?
→ http://docs.graylog.org/en/2.4/pages/configuration/file_location.html

On all 3 nodes we have the same /etc/default/graylog-server and /etc/default/elasticsearch settings as already posted above.

Do you have any metrics available? Maybe you have hit the wall regarding your sharding (https://www.elastic.co/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster).

As Graylog and Elasticsearch fight for the same system resources, you might be putting too much on the single machines. With some internal metrics you should be able to spot that.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.