I searched the forum and the web and found nothing.
For a reason I can't identify, Graylog won't connect to my Elasticsearch cluster, even though every ES node and index reports green.
Scenario: 3 Graylog + MongoDB servers (10.0.2.131-133) and 5 Elasticsearch servers (10.0.2.134-138).
10.0.2.131 is the Graylog master.
elasticsearch_hosts = http://10.0.2.134:9301,http://10.0.2.135:9301,http://10.0.2.136:9301,http://10.0.2.137:9301,http://10.0.2.138:9301
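Since the log below shows Graylog building a Jest (HTTP) client from these URIs, one check that might help is to request each URI over plain HTTP from 10.0.2.131. This is only a minimal sketch: `split_hosts` and `probe_es_hosts` are hypothetical helper names, and the 5-second timeout is an arbitrary choice.

```shell
#!/bin/sh
# Sketch: probe every URI from elasticsearch_hosts over plain HTTP.
# ES_HOSTS mirrors the value from the config line above; split_hosts and
# probe_es_hosts are hypothetical helpers, not part of Graylog.

ES_HOSTS="http://10.0.2.134:9301,http://10.0.2.135:9301,http://10.0.2.136:9301,http://10.0.2.137:9301,http://10.0.2.138:9301"

# Print one URI per line from a comma-separated list.
split_hosts() {
    printf '%s\n' "$1" | tr ',' '\n'
}

# Request the root endpoint of each URI and report the HTTP status code.
# curl reports 000 when it received no HTTP response at all.
probe_es_hosts() {
    split_hosts "$1" | while IFS= read -r uri; do
        code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$uri/")
        printf '%s -> HTTP %s\n' "$uri" "$code"
    done
}
```

Calling `probe_es_hosts "$ES_HOSTS"` prints one line per URI. The curl commands further down use port 9200, whereas elasticsearch_hosts uses 9301, so running the probe against both ports may be worth comparing.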
The Graylog server log on startup looks like this:
2017-08-17T18:44:34.748-03:00 INFO [CmdLineTool] Loaded plugin: Elastic Beats Input 2.3.0 [org.graylog.plugins.beats.BeatsInputPlugin]
2017-08-17T18:44:34.751-03:00 INFO [CmdLineTool] Loaded plugin: Collector 2.3.0 [org.graylog.plugins.collector.CollectorPlugin]
2017-08-17T18:44:34.752-03:00 INFO [CmdLineTool] Loaded plugin: Enterprise Integration Plugin 2.3.0 [org.graylog.plugins.enterprise_integration.EnterpriseIntegrationPlugin]
2017-08-17T18:44:34.753-03:00 INFO [CmdLineTool] Loaded plugin: Internal Logs plugin 1.0.0 [org.graylog.plugins.internallogs.InternalLogsInputPlugin]
2017-08-17T18:44:34.754-03:00 INFO [CmdLineTool] Loaded plugin: MapWidgetPlugin 2.3.0 [org.graylog.plugins.map.MapWidgetPlugin]
2017-08-17T18:44:34.765-03:00 INFO [CmdLineTool] Loaded plugin: Pipeline Processor Plugin 2.3.0 [org.graylog.plugins.pipelineprocessor.ProcessorPlugin]
2017-08-17T18:44:34.766-03:00 INFO [CmdLineTool] Loaded plugin: Threat Intelligence Plugin 0.10.0 [org.graylog.plugins.threatintel.ThreatIntelPlugin]
2017-08-17T18:44:34.767-03:00 INFO [CmdLineTool] Loaded plugin: Anonymous Usage Statistics 2.3.0 [org.graylog.plugins.usagestatistics.UsageStatsPlugin]
2017-08-17T18:44:35.071-03:00 INFO [CmdLineTool] Running with JVM arguments: -Xms1g -Xmx1g -XX:NewRatio=1 -XX:+ResizeTLAB -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:-OmitStackTraceInFastThrow -Dlog4j.configurationFile=file:///etc/graylog/server/log4j2.xml -Djava.library.path=/usr/share/graylog-server/lib/sigar -Dgraylog2.installation_source=rpm
2017-08-17T18:44:35.346-03:00 INFO [Version] HV000001: Hibernate Validator null
2017-08-17T18:44:38.502-03:00 INFO [InputBufferImpl] Message journal is enabled.
2017-08-17T18:44:38.534-03:00 INFO [NodeId] Node ID: 59a000f0-dba8-46f2-bd0a-c39457e8b82d
2017-08-17T18:44:38.817-03:00 INFO [LogManager] Loading logs.
2017-08-17T18:44:38.943-03:00 INFO [LogManager] Logs loading complete.
2017-08-17T18:44:38.944-03:00 INFO [KafkaJournal] Initialized Kafka based journal at /var/lib/graylog-server/journal
2017-08-17T18:44:38.967-03:00 INFO [InputBufferImpl] Initialized InputBufferImpl with ring size <65536> and wait strategy <BlockingWaitStrategy>, running 2 parallel message handlers.
2017-08-17T18:44:38.995-03:00 INFO [cluster] Cluster created with settings {hosts=[10.0.2.131:27017, 10.0.2.132:27017, 10.0.2.133:27017], mode=MULTIPLE, requiredClusterType=REPLICA_SET, serverSelectionTimeout='30000 ms', maxWaitQueueSize=5000, requiredReplicaSetName='rs01'}
2017-08-17T18:44:38.996-03:00 INFO [cluster] Adding discovered server 10.0.2.131:27017 to client view of cluster
2017-08-17T18:44:39.037-03:00 INFO [cluster] Adding discovered server 10.0.2.132:27017 to client view of cluster
2017-08-17T18:44:39.041-03:00 INFO [cluster] Adding discovered server 10.0.2.133:27017 to client view of cluster
2017-08-17T18:44:39.065-03:00 INFO [cluster] No server chosen by ReadPreferenceServerSelector{readPreference=primary} from cluster description ClusterDescription{type=REPLICA_SET, connectionMode=MULTIPLE, serverDescriptions=[ServerDescription{address=10.0.2.133:27017, type=UNKNOWN, state=CONNECTING}, ServerDescription{address=10.0.2.132:27017, type=UNKNOWN, state=CONNECTING}, ServerDescription{address=10.0.2.131:27017, type=UNKNOWN, state=CONNECTING}]}. Waiting for 30000 ms before timing out
2017-08-17T18:44:39.240-03:00 INFO [connection] Opened connection [connectionId{localValue:1, serverValue:121}] to 10.0.2.131:27017
2017-08-17T18:44:39.247-03:00 INFO [cluster] Monitor thread successfully connected to server with description ServerDescription{address=10.0.2.131:27017, type=REPLICA_SET_PRIMARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 2, 16]}, minWireVersion=0, maxWireVersion=4, maxDocumentSize=16777216, roundTripTimeNanos=1165647, setName='rs01', canonicalAddress=10.0.2.131:27017, hosts=[10.0.2.131:27017, 10.0.2.133:27017, 10.0.2.132:27017], passives=[], arbiters=[], primary='10.0.2.131:27017', tagSet=TagSet{[]}, electionId=7fffffff000000000000000c, setVersion=1, lastWriteDate=null, lastUpdateTimeNanos=2961744820071}
2017-08-17T18:44:39.252-03:00 INFO [cluster] Setting max election id to 7fffffff000000000000000c from replica set primary 10.0.2.131:27017
2017-08-17T18:44:39.253-03:00 INFO [cluster] Setting max set version to 1 from replica set primary 10.0.2.131:27017
2017-08-17T18:44:39.253-03:00 INFO [cluster] Discovered replica set primary 10.0.2.131:27017
2017-08-17T18:44:39.263-03:00 INFO [connection] Opened connection [connectionId{localValue:2, serverValue:43}] to 10.0.2.133:27017
2017-08-17T18:44:39.265-03:00 INFO [cluster] Monitor thread successfully connected to server with description ServerDescription{address=10.0.2.133:27017, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 2, 16]}, minWireVersion=0, maxWireVersion=4, maxDocumentSize=16777216, roundTripTimeNanos=865460, setName='rs01', canonicalAddress=10.0.2.133:27017, hosts=[10.0.2.131:27017, 10.0.2.133:27017, 10.0.2.132:27017], passives=[], arbiters=[], primary='10.0.2.131:27017', tagSet=TagSet{[]}, electionId=null, setVersion=1, lastWriteDate=null, lastUpdateTimeNanos=2961762750658}
2017-08-17T18:44:39.265-03:00 INFO [connection] Opened connection [connectionId{localValue:3, serverValue:289}] to 10.0.2.132:27017
2017-08-17T18:44:39.268-03:00 INFO [cluster] Monitor thread successfully connected to server with description ServerDescription{address=10.0.2.132:27017, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 2, 16]}, minWireVersion=0, maxWireVersion=4, maxDocumentSize=16777216, roundTripTimeNanos=2098142, setName='rs01', canonicalAddress=10.0.2.132:27017, hosts=[10.0.2.131:27017, 10.0.2.133:27017, 10.0.2.132:27017], passives=[], arbiters=[], primary='10.0.2.131:27017', tagSet=TagSet{[]}, electionId=null, setVersion=1, lastWriteDate=null, lastUpdateTimeNanos=2961766060376}
2017-08-17T18:44:39.291-03:00 INFO [connection] Opened connection [connectionId{localValue:4, serverValue:122}] to 10.0.2.131:27017
2017-08-17T18:44:39.692-03:00 INFO [AbstractJestClient] Setting server pool to a list of 5 servers: [http://10.0.2.134:9301,http://10.0.2.135:9301,http://10.0.2.136:9301,http://10.0.2.137:9301,http://10.0.2.138:9301]
2017-08-17T18:44:39.693-03:00 INFO [JestClientFactory] Using multi thread/connection supporting pooling connection manager
2017-08-17T18:44:39.783-03:00 INFO [JestClientFactory] Using custom ObjectMapper instance
2017-08-17T18:44:39.783-03:00 INFO [JestClientFactory] Node Discovery disabled...
2017-08-17T18:44:39.783-03:00 INFO [JestClientFactory] Idle connection reaping disabled...
2017-08-17T18:44:40.098-03:00 INFO [ProcessBuffer] Initialized ProcessBuffer with ring size <65536> and wait strategy <BlockingWaitStrategy>.
2017-08-17T18:44:42.103-03:00 INFO [RulesEngineProvider] No static rules file loaded.
2017-08-17T18:44:42.350-03:00 INFO [TorExitNodeLookupProvider] Refreshing internal table of known Tor exit nodes.
2017-08-17T18:44:42.383-03:00 INFO [connection] Opened connection [connectionId{localValue:5, serverValue:123}] to 10.0.2.131:27017
2017-08-17T18:44:42.410-03:00 INFO [connection] Opened connection [connectionId{localValue:6, serverValue:124}] to 10.0.2.131:27017
2017-08-17T18:44:44.654-03:00 INFO [SpamhausIpLookupProvider] Refreshing internal table of Spamhaus drop list IPs.
2017-08-17T18:44:45.133-03:00 INFO [AbuseChRansomLookupProvider] Refreshing internal table of Abuse.ch Ransomware tracker data.
2017-08-17T18:44:56.103-03:00 WARN [GeoIpResolverEngine] GeoIP database file does not exist: /etc/graylog/server/GeoLite2-City.mmdb
2017-08-17T18:44:56.117-03:00 INFO [OutputBuffer] Initialized OutputBuffer with ring size <65536> and wait strategy <BlockingWaitStrategy>.
2017-08-17T18:44:56.154-03:00 WARN [GeoIpResolverEngine] GeoIP database file does not exist: /etc/graylog/server/GeoLite2-City.mmdb
2017-08-17T18:44:56.188-03:00 WARN [GeoIpResolverEngine] GeoIP database file does not exist: /etc/graylog/server/GeoLite2-City.mmdb
2017-08-17T18:44:56.224-03:00 WARN [GeoIpResolverEngine] GeoIP database file does not exist: /etc/graylog/server/GeoLite2-City.mmdb
2017-08-17T18:44:56.259-03:00 WARN [GeoIpResolverEngine] GeoIP database file does not exist: /etc/graylog/server/GeoLite2-City.mmdb
2017-08-17T18:44:56.550-03:00 INFO [ServerBootstrap] Graylog server 2.3.0+81f8228 starting up
2017-08-17T18:44:56.552-03:00 INFO [ServerBootstrap] JRE: Oracle Corporation 1.8.0_141 on Linux 3.10.0-514.26.2.el7.x86_64
2017-08-17T18:44:56.552-03:00 INFO [ServerBootstrap] Deployment: rpm
2017-08-17T18:44:56.552-03:00 INFO [ServerBootstrap] OS: CentOS Linux 7 (Core) (centos)
2017-08-17T18:44:56.552-03:00 INFO [ServerBootstrap] Arch: amd64
2017-08-17T18:44:56.556-03:00 WARN [DeadEventLoggingListener] Received unhandled event of type <org.graylog2.plugin.lifecycles.Lifecycle> from event bus <AsyncEventBus{graylog-eventbus}>
2017-08-17T18:44:56.595-03:00 INFO [PeriodicalsService] Starting 26 periodicals ...
2017-08-17T18:44:56.596-03:00 INFO [Periodicals] Starting [org.graylog2.periodical.ThroughputCalculator] periodical in [0s], polling every [1s].
2017-08-17T18:44:56.607-03:00 INFO [Periodicals] Starting [org.graylog2.periodical.AlertScannerThread] periodical in [10s], polling every [60s].
2017-08-17T18:44:56.608-03:00 INFO [Periodicals] Starting [org.graylog2.periodical.BatchedElasticSearchOutputFlushThread] periodical in [0s], polling every [1s].
2017-08-17T18:44:56.610-03:00 INFO [Periodicals] Starting [org.graylog2.periodical.ClusterHealthCheckThread] periodical in [120s], polling every [20s].
2017-08-17T18:44:56.612-03:00 INFO [Periodicals] Starting [org.graylog2.periodical.ContentPackLoaderPeriodical] periodical, running forever.
2017-08-17T18:44:56.613-03:00 INFO [Periodicals] Starting [org.graylog2.periodical.GarbageCollectionWarningThread] periodical, running forever.
2017-08-17T18:44:56.627-03:00 INFO [Periodicals] Starting [org.graylog2.periodical.IndexerClusterCheckerThread] periodical in [0s], polling every [30s].
2017-08-17T18:44:56.629-03:00 INFO [Periodicals] Starting [org.graylog2.periodical.IndexRetentionThread] periodical in [0s], polling every [300s].
2017-08-17T18:44:56.630-03:00 INFO [Periodicals] Starting [org.graylog2.periodical.IndexRotationThread] periodical in [0s], polling every [10s].
2017-08-17T18:44:56.634-03:00 INFO [Periodicals] Starting [org.graylog2.periodical.NodePingThread] periodical in [0s], polling every [1s].
2017-08-17T18:44:56.641-03:00 INFO [Periodicals] Starting [org.graylog2.periodical.VersionCheckThread] periodical in [300s], polling every [1800s].
2017-08-17T18:44:56.647-03:00 INFO [Periodicals] Starting [org.graylog2.periodical.ThrottleStateUpdaterThread] periodical in [1s], polling every [1s].
2017-08-17T18:44:56.656-03:00 INFO [Periodicals] Starting [org.graylog2.events.ClusterEventPeriodical] periodical in [0s], polling every [1s].
2017-08-17T18:44:56.689-03:00 INFO [Periodicals] Starting [org.graylog2.events.ClusterEventCleanupPeriodical] periodical in [0s], polling every [86400s].
2017-08-17T18:44:56.689-03:00 INFO [Periodicals] Starting [org.graylog2.periodical.ClusterIdGeneratorPeriodical] periodical, running forever.
2017-08-17T18:44:56.690-03:00 INFO [Periodicals] Starting [org.graylog2.periodical.IndexRangesMigrationPeriodical] periodical, running forever.
2017-08-17T18:44:56.694-03:00 INFO [Periodicals] Starting [org.graylog2.periodical.IndexRangesCleanupPeriodical] periodical in [15s], polling every [3600s].
2017-08-17T18:44:56.708-03:00 INFO [PeriodicalsService] Not starting [org.graylog2.periodical.UserPermissionMigrationPeriodical] periodical. Not configured to run on this node.
2017-08-17T18:44:56.708-03:00 INFO [Periodicals] Starting [org.graylog2.periodical.AlarmCallbacksMigrationPeriodical] periodical, running forever.
2017-08-17T18:44:56.726-03:00 INFO [Periodicals] Starting [org.graylog2.periodical.ConfigurationManagementPeriodical] periodical, running forever.
2017-08-17T18:44:56.785-03:00 INFO [connection] Opened connection [connectionId{localValue:7, serverValue:125}] to 10.0.2.131:27017
2017-08-17T18:44:56.810-03:00 INFO [PeriodicalsService] Not starting [org.graylog2.periodical.LdapGroupMappingMigration] periodical. Not configured to run on this node.
2017-08-17T18:44:56.811-03:00 INFO [Periodicals] Starting [org.graylog2.periodical.IndexFailuresPeriodical] periodical, running forever.
2017-08-17T18:44:56.817-03:00 ERROR [Cluster] Couldn't read cluster health for indices [graylog_*] (n/a)
2017-08-17T18:44:56.818-03:00 INFO [IndexerClusterCheckerThread] Indexer not fully initialized yet. Skipping periodic cluster check.
2017-08-17T18:44:56.819-03:00 INFO [Periodicals] Starting [org.graylog.plugins.usagestatistics.UsageStatsNodePeriodical] periodical in [300s], polling every [21600s].
2017-08-17T18:44:56.819-03:00 INFO [Periodicals] Starting [org.graylog.plugins.usagestatistics.UsageStatsClusterPeriodical] periodical in [300s], polling every [21600s].
2017-08-17T18:44:56.835-03:00 INFO [IndexRetentionThread] Elasticsearch cluster not available, skipping index retention checks.
2017-08-17T18:44:56.873-03:00 INFO [Periodicals] Starting [org.graylog.plugins.pipelineprocessor.periodical.LegacyDefaultStreamMigration] periodical, running forever.
2017-08-17T18:44:56.890-03:00 INFO [Periodicals] Starting [org.graylog.plugins.collector.periodical.PurgeExpiredCollectorsThread] periodical in [0s], polling every [3600s].
2017-08-17T18:44:56.910-03:00 INFO [LegacyDefaultStreamMigration] Legacy default stream has no connections, no migration needed.
2017-08-17T18:44:57.090-03:00 INFO [V20161130141500_DefaultStreamRecalcIndexRanges] Cluster not connected yet, delaying migration until it is reachable.
2017-08-17T18:44:57.281-03:00 INFO [JerseyService] Enabling CORS for HTTP endpoint
2017-08-17T18:45:11.706-03:00 INFO [IndexRangesCleanupPeriodical] Skipping index range cleanup because the Elasticsearch cluster is unreachable or unhealthy
2017-08-17T18:45:11.721-03:00 INFO [NetworkListener] Started listener bound to [10.0.2.131:9000]
2017-08-17T18:45:11.723-03:00 INFO [HttpServer] [HttpServer] Started.
2017-08-17T18:45:11.723-03:00 INFO [JerseyService] Started REST API at <http://10.0.2.131:9000/api/>
2017-08-17T18:45:11.724-03:00 INFO [JerseyService] Started Web Interface at <http://10.0.2.131:9000/>
2017-08-17T18:45:11.725-03:00 INFO [ServiceManagerListener] Services are healthy
2017-08-17T18:45:11.726-03:00 INFO [InputSetupService] Triggering launching persisted inputs, node transitioned from Uninitialized [LB:DEAD] to Running [LB:ALIVE]
2017-08-17T18:45:11.726-03:00 INFO [ServerBootstrap] Services started, startup times in ms: {BufferSynchronizerService [RUNNING]=42, KafkaJournal [RUNNING]=43, OutputSetupService [RUNNING]=45, ConfigurationEtagService [RUNNING]=104, LookupTableService [RUNNING]=201, StreamCacheService [RUNNING]=246, JournalReader [RUNNING]=263, InputSetupService [RUNNING]=270, PeriodicalsService [RUNNING]=317, JerseyService [RUNNING]=15130}
2017-08-17T18:45:11.729-03:00 INFO [ServerBootstrap] Graylog server up and running.
2017-08-17T18:45:11.775-03:00 INFO [InputStateListener] Input [Beats/5932ee8fc0cd40062f427171] is now STARTING
2017-08-17T18:45:11.778-03:00 INFO [InputStateListener] Input [Internal Logs/5933517cc0cd400a8cafe2f6] is now STARTING
2017-08-17T18:45:11.813-03:00 INFO [KafkaJournal] Read offset 593939 before start of log at 661593, starting to read from the beginning of the journal.
2017-08-17T18:45:11.846-03:00 INFO [InputStateListener] Input [Internal Logs/5933517cc0cd400a8cafe2f6] is now RUNNING
2017-08-17T18:45:11.844-03:00 WARN [NettyTransport] receiveBufferSize (SO_RCVBUF) for input GELFTCPInput{title=gelf-input, type=org.graylog2.inputs.gelf.tcp.GELFTCPInput, nodeId=null} should be 1048576 but is 212992.
2017-08-17T18:45:11.844-03:00 WARN [NettyTransport] receiveBufferSize (SO_RCVBUF) for input BeatsInput{title=beats-input, type=org.graylog.plugins.beats.BeatsInput, nodeId=null} should be 1048576 but is 212992.
2017-08-17T18:45:11.854-03:00 INFO [InputStateListener] Input [GELF TCP/5983254d6bdd250acb2b00fd] is now STARTING
2017-08-17T18:45:11.878-03:00 INFO [InputStateListener] Input [Beats/5932ee8fc0cd40062f427171] is now RUNNING
2017-08-17T18:45:11.884-03:00 INFO [InputStateListener] Input [GELF TCP/5983254d6bdd250acb2b00fd] is now RUNNING
2017-08-17T18:45:11.952-03:00 WARN [ProcessBuffer] Unable to process event MessageEvent{raw=null, message=null, messages=null}, sequence 4
java.lang.NoSuchMethodError: org.apache.logging.log4j.core.impl.ThrowableProxy.getExtendedStackTraceAsString()Ljava/lang/String;
at org.graylog.plugins.internallogs.codec.SerializedLogEventCodec.processLogEvent(SerializedLogEventCodec.java:163) ~[?:?]
at org.graylog.plugins.internallogs.codec.SerializedLogEventCodec.decode(SerializedLogEventCodec.java:104) ~[?:?]
at org.graylog2.shared.buffers.processors.DecodingProcessor.processMessage(DecodingProcessor.java:146) ~[graylog.jar:?]
at org.graylog2.shared.buffers.processors.DecodingProcessor.onEvent(DecodingProcessor.java:87) ~[graylog.jar:?]
at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:74) ~[graylog.jar:?]
at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:42) ~[graylog.jar:?]
at com.lmax.disruptor.WorkProcessor.run(WorkProcessor.java:143) [graylog.jar:?]
at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66) [graylog.jar:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
After that, the same WARN (with the same stack trace) keeps repeating.
Elasticsearch checks run from 10.0.2.131:
curl -XGET "10.0.2.134:9200/_cluster/state?pretty" 2>/dev/null | grep "transport_address" | sort -n
"transport_address" : "10.0.2.134:9301",
"transport_address" : "10.0.2.135:9301",
"transport_address" : "10.0.2.136:9301",
"transport_address" : "10.0.2.137:9301",
"transport_address" : "10.0.2.138:9301",
curl '10.0.2.138:9200/_cat/indices?v'
health status index pri rep docs.count docs.deleted store.size pri.store.size
green open graylog_1 4 1 17979622 0 23.5gb 11.7gb
green open graylog_0 4 1 19086275 0 21.8gb 10.9gb
green open graylog_5 4 1 3233 0 6.3mb 3.1mb
green open graylog_4 4 1 8563362 0 11.8gb 5.9gb
green open graylog_3 4 1 16227972 0 21.8gb 10.9gb
green open graylog_2 4 1 6873772 0 8.6gb 4.3gb
curl -XGET '10.0.2.134:9200/_cluster/health?pretty'
{
"cluster_name" : "graylog",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 5,
"number_of_data_nodes" : 5,
"active_primary_shards" : 24,
"active_shards" : 48,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
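To repeat the health check above against each node without reading the JSON by eye, a small helper could pull out just the status field. This is a sketch under assumptions: `health_status` is a hypothetical name, and it relies on the pretty-printed, one-field-per-line layout shown above rather than being a real JSON parser.

```shell
#!/bin/sh
# Sketch: extract the "status" value from pretty-printed _cluster/health JSON
# read on stdin. health_status is a hypothetical helper; it assumes one field
# per line, as in the output above, and is not a general JSON parser.
health_status() {
    sed -n 's/^[[:space:]]*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}
```

Usage would be along the lines of `curl -s '10.0.2.134:9200/_cluster/health?pretty' | health_status`, which for the output shown above would print `green`.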