Heyo,
I just ran into the problem, that one of my Graylog Nodes shut down, because the storage system behind our virtualization platform (more exact the storage for the vm-host my Graylog was running on) became full and therefore the system was halted for some reason (I can’t get the server to start, I have to wait for the storage guys to solve their problem first, so I can’t look for problems atm.)
But my bigger concern is, that the second Graylog node also failed, after the other one stopped being reachable. So, atm, we are blind. Is this a known problem, that Graylog stops responing when another cluster member goes down unexpectedly? This is the first time this happened, every planned shutdown or service restart never caused any problems.
Has anyone experienced this problem? And if yes, any ideas why? And any idea on how to get Graylog01 running again?
Symptoms:
- Graylog02 stopped responding (Server seems to have been shut down by the hypervisor).
- Graylog01 stopped responding on the Web-Interface (the good old server not reachable screen of Graylog)
- Graylog01 stopped indexing into Elasticsearch, even into the still available elasticsearch nodes. The Node however still received the messages and wrote them into the Journal (Journal filled to ca. 250k messages, this is why I noticed the problem, I get alerted when a Journal fills up)
- I Tried restarting Graylog01 but it will not come up, it is stuck here:
2017-09-28T13:04:29.274+02:00 INFO [cluster] Monitor thread successfully connected to server with description ServerDescription{address=graylog-arbiter.example.local:27018, type=REPLICA_SET_ARBITER, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 4, 2]}, minWireVersion=0, maxWireVersion=5, maxDocumentSize=16777216, roundTripTimeNanos=9646035, setName='rewe-graylog-prod', canonicalAddress=graylog-arbiter.example.local:27018, hosts=[graylog01:27017, graylog02.example.local:27017], passives=[], arbiters=[graylog-arbiter.example.local:27018], primary='null', tagSet=TagSet{[]}, electionId=null, setVersion=5, lastWriteDate=Thu Jan 01 01:00:00 CET 1970, lastUpdateTimeNanos=18043098561357175}
2017-09-28T13:04:29.626+02:00 INFO [cluster] Exception in monitor thread while connecting to server graylog02.example.local:27017
com.mongodb.MongoSocketOpenException: Exception opening socket
at com.mongodb.connection.SocketStream.open(SocketStream.java:63) ~[graylog.jar:?]
at com.mongodb.connection.InternalStreamConnection.open(InternalStreamConnection.java:115) ~[graylog.jar:?]
at com.mongodb.connection.DefaultServerMonitor$ServerMonitorRunnable.run(DefaultServerMonitor.java:113) [graylog.jar:?]
at java.lang.Thread.run(Unknown Source) [?:1.8.0_121]
Caused by: java.net.NoRouteToHostException: No route to host (Host unreachable)
at java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:1.8.0_121]
at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source) ~[?:1.8.0_121]
at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source) ~[?:1.8.0_121]
at java.net.AbstractPlainSocketImpl.connect(Unknown Source) ~[?:1.8.0_121]
at java.net.SocksSocketImpl.connect(Unknown Source) ~[?:1.8.0_121]
at java.net.Socket.connect(Unknown Source) ~[?:1.8.0_121]
at com.mongodb.connection.SocketStreamHelper.initialize(SocketStreamHelper.java:57) ~[graylog.jar:?]
at com.mongodb.connection.SocketStream.open(SocketStream.java:58) ~[graylog.jar:?]
... 3 more
2017-09-28T13:04:59.047+02:00 ERROR [MongoConnectionProvider] Error connecting to MongoDB: Timed out after 30000 ms while waiting for a server that matches ReadPreferenceServerSelector{readPreference=primary}. Client view of cluster state is {type=REPLICA_SET, servers=[{address=graylog-arbiter.example.local:27018, type=REPLICA_SET_ARBITER, roundTripTime=1.2 ms, state=CONNECTED}, {address=graylog01:27017, type=REPLICA_SET_OTHER, roundTripTime=0.9 ms, state=CONNECTED}, {address=graylog02.example.local:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketOpenException: Exception opening socket}, caused by {java.net.NoRouteToHostException: No route to host (Host unreachable)}}]
2017-09-28T13:04:59.307+02:00 INFO [cluster] No server chosen by WritableServerSelector from cluster description ClusterDescription{type=REPLICA_SET, connectionMode=MULTIPLE, serverDescriptions=[ServerDescription{address=graylog-arbiter.example.local:27018, type=REPLICA_SET_ARBITER, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 4, 2]}, minWireVersion=0, maxWireVersion=5, maxDocumentSize=16777216, roundTripTimeNanos=1159190, setName='rewe-graylog-prod', canonicalAddress=graylog-arbiter.example.local:27018, hosts=[graylog01:27017, graylog02.example.local:27017], passives=[], arbiters=[graylog-arbiter.example.local:27018], primary='null', tagSet=TagSet{[]}, electionId=null, setVersion=5, lastWriteDate=Thu Jan 01 01:00:00 CET 1970, lastUpdateTimeNanos=18043128104881863}, ServerDescription{address=graylog01:27017, type=REPLICA_SET_OTHER, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 4, 2]}, minWireVersion=0, maxWireVersion=5, maxDocumentSize=16777216, roundTripTimeNanos=877379, setName='rewe-graylog-prod', canonicalAddress=graylog01:27017, hosts=[graylog01:27017, graylog02.example.local:27017], passives=[], arbiters=[graylog-arbiter.example.local:27018], primary='null', tagSet=TagSet{[]}, electionId=null, setVersion=5, lastWriteDate=Tue Jul 04 14:28:27 CEST 2017, lastUpdateTimeNanos=18043128098131404}, ServerDescription{address=graylog02.example.local:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketOpenException: Exception opening socket}, caused by {java.net.NoRouteToHostException: No route to host (Host unreachable)}}]}. Waiting for 30000 ms before timing out
2017-09-28T13:05:29.317+02:00 INFO [cluster] No server chosen by WritableServerSelector from cluster description ClusterDescription{type=REPLICA_SET, connectionMode=MULTIPLE, serverDescriptions=[ServerDescription{address=graylog-arbiter.example.local:27018, type=REPLICA_SET_ARBITER, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 4, 2]}, minWireVersion=0, maxWireVersion=5, maxDocumentSize=16777216, roundTripTimeNanos=1170321, setName='rewe-graylog-prod', canonicalAddress=graylog-arbiter.example.local:27018, hosts=[graylog01:27017, graylog02.example.local:27017], passives=[], arbiters=[graylog-arbiter.example.local:27018], primary='null', tagSet=TagSet{[]}, electionId=null, setVersion=5, lastWriteDate=Thu Jan 01 01:00:00 CET 1970, lastUpdateTimeNanos=18043158154655368}, ServerDescription{address=graylog01:27017, type=REPLICA_SET_OTHER, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 4, 2]}, minWireVersion=0, maxWireVersion=5, maxDocumentSize=16777216, roundTripTimeNanos=722206, setName='rewe-graylog-prod', canonicalAddress=graylog01:27017, hosts=[graylog01:27017, graylog02.example.local:27017], passives=[], arbiters=[graylog-arbiter.example.local:27018], primary='null', tagSet=TagSet{[]}, electionId=null, setVersion=5, lastWriteDate=Tue Jul 04 14:28:27 CEST 2017, lastUpdateTimeNanos=18043158114364905}, ServerDescription{address=graylog02.example.local:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketOpenException: Exception opening socket}, caused by {java.net.NoRouteToHostException: No route to host (Host unreachable)}}]}. Waiting for 30000 ms before timing out
Here are screenshots, when it happened (yes, I am monitoring my Graylog-Prod with my Graylog-Test environment. )
Logs: (This is repeated over and over before I restarted Graylog01)
2017-09-28T12:26:17.327+02:00 INFO [cluster] No server chosen by WritableServerSelector from cluster description ClusterDescription{type=REPLICA_SET, connectionMode=MULTIPLE, serverDescriptions=[ServerDescription{address=graylog-arbiter.example.local:27018, type=REPLICA_SET_ARBITER, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 4, 2]}, minWireVersion=0, maxWireVersion=5, maxDocumentSize=16777216, roundTripTimeNanos=830087, setName='rewe-graylog-prod', canonicalAddress=graylog-arbiter.example.local:27018, hosts=[graylog01:27017, graylog02.example.local:27017], passives=[], arbiters=[graylog-arbiter.example.local:27018], primary='null', tagSet=TagSet{[]}, electionId=null, setVersion=5, lastWriteDate=Thu Jan 01 01:00:00 CET 1970, lastUpdateTimeNanos=18040806167280731}, ServerDescription{address=graylog01:27017, type=REPLICA_SET_OTHER, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 4, 2]}, minWireVersion=0, maxWireVersion=5, maxDocumentSize=16777216, roundTripTimeNanos=335873, setName='rewe-graylog-prod', canonicalAddress=graylog01:27017, hosts=[graylog01:27017, graylog02.example.local:27017], passives=[], arbiters=[graylog-arbiter.example.local:27018], primary='null', tagSet=TagSet{[]}, electionId=null, setVersion=5, lastWriteDate=Tue Jul 04 14:28:27 CEST 2017, lastUpdateTimeNanos=18040806133671295}, ServerDescription{address=graylog02.example.local:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketOpenException: Exception opening socket}, caused by {java.net.NoRouteToHostException: No route to host (Host unreachable)}}]}. Waiting for 30000 ms before timing out
2017-09-28T12:26:20.011+02:00 INFO [RetryExec] I/O exception (java.net.NoRouteToHostException) caught when processing request to {}->http://graylog02.example.local:9200: No route to host (Host unreachable)
2017-09-28T12:26:20.011+02:00 INFO [RetryExec] I/O exception (java.net.NoRouteToHostException) caught when processing request to {}->http://graylog02.example.local:9200: No route to host (Host unreachable)
2017-09-28T12:26:20.011+02:00 INFO [RetryExec] Retrying request to {}->http://graylog02.example.local:9200
2017-09-28T12:26:20.011+02:00 INFO [RetryExec] Retrying request to {}->http://graylog02.example.local:9200
2017-09-28T12:26:22.784+02:00 ERROR [IndexRotationThread] Uncaught exception in periodical
com.google.inject.ProvisionException: Unable to provision, see the following errors:
1) Error injecting constructor, com.mongodb.MongoTimeoutException: Timed out after 30000 ms while waiting for a server that matches WritableServerSelector. Client view of cluster state is {type=REPLICA_SET, servers=[{address=graylog-arbiter.example.local:27018, type=REPLICA_SET_ARBITER, roundTripTime=1.8 ms, state=CONNECTED}, {address=graylog01:27017, type=REPLICA_SET_OTHER, roundTripTime=0.4 ms, state=CONNECTED}, {address=graylog02.example.local:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketOpenException: Exception opening socket}, caused by {java.net.NoRouteToHostException: No route to host (Host unreachable)}}]
at org.graylog2.system.activities.SystemMessageServiceImpl.<init>(SystemMessageServiceImpl.java:37)
while locating org.graylog2.system.activities.SystemMessageServiceImpl
while locating org.graylog2.system.activities.SystemMessageService
for the 1st parameter of org.graylog2.system.activities.SystemMessageActivityWriter.<init>(SystemMessageActivityWriter.java:41)
while locating org.graylog2.system.activities.SystemMessageActivityWriter
while locating org.graylog2.shared.system.activities.ActivityWriter
for the 1st parameter of org.graylog2.bindings.providers.SystemJobManagerProvider.<init>(SystemJobManagerProvider.java:30)
while locating org.graylog2.bindings.providers.SystemJobManagerProvider
while locating org.graylog2.system.jobs.SystemJobManager
for the 6th parameter of org.graylog2.indexer.MongoIndexSet.<init>(MongoIndexSet.java:93)
while locating org.graylog2.indexer.MongoIndexSet annotated with @com.google.inject.internal.UniqueAnnotations$Internal(value=1)
Caused by: com.mongodb.MongoTimeoutException: Timed out after 30000 ms while waiting for a server that matches WritableServerSelector. Client view of cluster state is {type=REPLICA_SET, servers=[{address=graylog-arbiter.example.local:27018, type=REPLICA_SET_ARBITER, roundTripTime=1.8 ms, state=CONNECTED}, {address=graylog01:27017, type=REPLICA_SET_OTHER, roundTripTime=0.4 ms, state=CONNECTED}, {address=graylog02.example.local:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketOpenException: Exception opening socket}, caused by {java.net.NoRouteToHostException: No route to host (Host unreachable)}}]
at com.mongodb.connection.BaseCluster.createTimeoutException(BaseCluster.java:377)
at com.mongodb.connection.BaseCluster.selectServer(BaseCluster.java:104)
at com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.<init>(ClusterBinding.java:75)
at com.mongodb.binding.ClusterBinding$ClusterBindingConnectionSource.<init>(ClusterBinding.java:71)
at com.mongodb.binding.ClusterBinding.getWriteConnectionSource(ClusterBinding.java:68)
at com.mongodb.operation.OperationHelper.withConnection(OperationHelper.java:411)
at com.mongodb.operation.CreateIndexesOperation.execute(CreateIndexesOperation.java:144)
at com.mongodb.operation.CreateIndexesOperation.execute(CreateIndexesOperation.java:71)
at com.mongodb.Mongo.execute(Mongo.java:845)
at com.mongodb.Mongo$2.execute(Mongo.java:828)
at com.mongodb.DBCollection.createIndex(DBCollection.java:1618)
at com.mongodb.DBCollection.createIndex(DBCollection.java:1603)
at org.graylog2.system.activities.SystemMessageServiceImpl.<init>(SystemMessageServiceImpl.java:39)
at org.graylog2.system.activities.SystemMessageServiceImpl$$FastClassByGuice$$c6891cf2.newInstance(<generated>)
at com.google.inject.internal.DefaultConstructionProxyFactory$FastClassProxy.newInstance(DefaultConstructionProxyFactory.java:89)
at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:111)
at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:90)
at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:268)
at com.google.inject.internal.FactoryProxy.get(FactoryProxy.java:56)
at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38)
at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62)
at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:110)
at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:90)
at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:268)
at com.google.inject.internal.FactoryProxy.get(FactoryProxy.java:56)
at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38)
at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62)
at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:110)
at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:90)
at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:268)
at com.google.inject.internal.BoundProviderFactory.get(BoundProviderFactory.java:61)
at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38)
at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62)
at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:110)
at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:90)
at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:268)
at com.google.inject.internal.InjectorImpl$2$1.call(InjectorImpl.java:1019)
at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1085)
at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:1015)
at com.google.inject.assistedinject.FactoryProvider2.invoke(FactoryProvider2.java:776)
at com.sun.proxy.$Proxy110.create(Unknown Source)
at org.graylog2.indexer.MongoIndexSetRegistry.findAllMongoIndexSets(MongoIndexSetRegistry.java:99)
at org.graylog2.indexer.MongoIndexSetRegistry.getAll(MongoIndexSetRegistry.java:107)
at org.graylog2.indexer.MongoIndexSetRegistry.iterator(MongoIndexSetRegistry.java:222)
at java.lang.Iterable.forEach(Unknown Source)
at org.graylog2.periodical.IndexRotationThread.doRun(IndexRotationThread.java:73)
at org.graylog2.plugin.periodical.Periodical.run(Periodical.java:77)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Unknown Source)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
2) Error injecting constructor, com.mongodb.MongoTimeoutException: Timed out after 30000 ms while waiting for a server that matches WritableServerSelector. Client view of cluster state is {type=REPLICA_SET, servers=[{address=graylog-arbiter.example.local:27018, type=REPLICA_SET_ARBITER, roundTripTime=0.7 ms, state=CONNECTED}, {address=graylog01:27017, type=REPLICA_SET_OTHER, roundTripTime=0.4 ms, state=CONNECTED}, {address=graylog02.example.local:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketOpenException: Exception opening socket}, caused by {java.net.NoRouteToHostException: No route to host (Host unreachable)}}]
at org.graylog2.system.activities.SystemMessageServiceImpl.<init>(SystemMessageServiceImpl.java:37)
while locating org.graylog2.system.activities.SystemMessageServiceImpl
while locating org.graylog2.system.activities.SystemMessageService
for the 1st parameter of org.graylog2.system.activities.SystemMessageActivityWriter.<init>(SystemMessageActivityWriter.java:41)
while locating org.graylog2.system.activities.SystemMessageActivityWriter
while locating org.graylog2.shared.system.activities.ActivityWriter
for the 8th parameter of org.graylog2.indexer.MongoIndexSet.<init>(MongoIndexSet.java:93)
while locating org.graylog2.indexer.MongoIndexSet annotated with @com.google.inject.internal.UniqueAnnotations$Internal(value=1)
Caused by: com.mongodb.MongoTimeoutException: Timed out after 30000 ms while waiting for a server that matches WritableServerSelector. Client view of cluster state is {type=REPLICA_SET, servers=[{address=graylog-arbiter.example.local:27018, type=REPLICA_SET_ARBITER, roundTripTime=0.7 ms, state=CONNECTED}, {address=graylog01:27017, type=REPLICA_SET_OTHER, roundTripTime=0.4 ms, state=CONNECTED}, {address=graylog02.example.local:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketOpenException: Exception opening socket}, caused by {java.net.NoRouteToHostException: No route to host (Host unreachable)}}]
2017-09-28T12:26:23.017+02:00 INFO [RetryExec] I/O exception (java.net.NoRouteToHostException) caught when processing request to {}->http://graylog02.example.local:9200: No route to host (Host unreachable)
2017-09-28T12:26:23.017+02:00 INFO [RetryExec] I/O exception (java.net.NoRouteToHostException) caught when processing request to {}->http://graylog02.example.local:9200: No route to host (Host unreachable)
2017-09-28T12:26:23.017+02:00 INFO [RetryExec] Retrying request to {}->http://graylog02.example.local:9200
2017-09-28T12:26:23.017+02:00 INFO [RetryExec] Retrying request to {}->http://graylog02.example.local:9200
2017-09-28T12:26:25.205+02:00 WARN [Errors] The following warnings have been detected: WARNING: Unknown HK2 failure detected:
MultiException stack 1 of 1
com.google.inject.ProvisionException: Unable to provision, see the following errors:
1) Error injecting constructor, com.mongodb.MongoTimeoutException: Timed out after 30000 ms while waiting for a server that matches WritableServerSelector. Client view of cluster state is {type=REPLICA_SET, servers=[{address=graylog-arbiter.example.local:27018, type=REPLICA_SET_ARBITER, roundTripTime=0.8 ms, state=CONNECTED}, {address=graylog01:27017, type=REPLICA_SET_OTHER, roundTripTime=0.4 ms, state=CONNECTED}, {address=graylog02.example.local:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketOpenException: Exception opening socket}, caused by {java.net.NoRouteToHostException: No route to host (Host unreachable)}}]
at org.graylog2.users.UserServiceImpl.<init>(UserServiceImpl.java:67)
while locating org.graylog2.users.UserServiceImpl
while locating org.graylog2.shared.users.UserService
[....]
I appreciate any help
Greetings - Phil
PS: Yes I know, having Graylog and Elasticsearch on the same host is not best practice, but this was not my decision ^^