We installed Graylog 2.3 and Elasticsearch 5.5.2 on an Ubuntu 16.04 machine, and I am getting the notifications below in Graylog:
Deflector exists as an index and is not an alias. (triggered 20 hours ago)
The deflector is meant to be an alias but exists as an index. Multiple failures of infrastructure can lead to this. Your messages are still indexed but searches and all maintenance tasks will fail or produce incorrect results. It is strongly recommended that you act as soon as possible.
Uncommitted messages deleted from journal (triggered 21 hours ago)
Some messages were deleted from the Graylog journal before they could be written to Elasticsearch. Please verify that your Elasticsearch cluster is healthy and fast enough. You may also want to review your Graylog journal settings and set a higher limit. (Node: 291ee918-b16c-ca1752)
Journal utilization is too high (triggered a day ago)
Journal utilization is too high and may go over the limit soon. Please verify that your Elasticsearch cluster is healthy and fast enough. You may also want to review your Graylog journal settings and set a higher limit. (Node: 291ee918-b16c-ca1752)
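For reference, two of these conditions can be checked from the command line. A minimal sketch, assuming Elasticsearch on localhost:9200, the default index prefix graylog_, and admin credentials for the Graylog API (all three are assumptions about this setup):

# Is the deflector an alias (expected) or a concrete index (this notification)?
curl -s 'http://localhost:9200/_cat/aliases/graylog_deflector?v'
curl -s 'http://localhost:9200/_cat/indices/graylog_deflector?v'
# If the second call lists an index, the deflector really was created as an index.
# Current journal utilization, as reported by the Graylog node itself:
curl -s -u admin:yourpassword 'http://127.0.0.1:9000/api/system/journal'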
There are no error or warning messages in either the Graylog or the Elasticsearch logs.
But you should then investigate why those messages are triggered.
I wasn't sure how to investigate these messages, but after removing them from the notifications I found this warning in the server.log:
2017-11-15T08:37:15.619Z WARN [ProxiedResource] Unable to call http://graylogserver.southeastasia.cloudapp.azure.com:9000/api/system/metrics/multiple on node <291ee918-b16c-4321-b6f8-7a88a3ca1752>
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method) ~[?:1.8.0_144]
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ~[?:1.8.0_144]
at java.net.SocketInputStream.read(SocketInputStream.java:171) ~[?:1.8.0_144]
at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[?:1.8.0_144]
at okio.Okio$2.read(Okio.java:139) ~[graylog.jar:?]
at okio.AsyncTimeout$2.read(AsyncTimeout.java:237) ~[graylog.jar:?]
at okio.RealBufferedSource.indexOf(RealBufferedSource.java:345) ~[graylog.jar:?]
at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:217) ~[graylog.jar:?]
at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:211) ~[graylog.jar:?]
at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189) ~[graylog.jar:?]
at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:75) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[graylog.jar:?]
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) ~[graylog.jar:?]
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) ~[graylog.jar:?]
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[graylog.jar:?]
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) ~[graylog.jar:?]
at org.graylog2.rest.RemoteInterfaceProvider.lambda$get$0(RemoteInterfaceProvider.java:59) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[graylog.jar:?]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) ~[graylog.jar:?]
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185) ~[graylog.jar:?]
at okhttp3.RealCall.execute(RealCall.java:69) ~[graylog.jar:?]
at retrofit2.OkHttpCall.execute(OkHttpCall.java:180) ~[graylog.jar:?]
at org.graylog2.shared.rest.resources.ProxiedResource.lambda$getForAllNodes$0(ProxiedResource.java:76) ~[graylog.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
2017-11-15T08:37:57.150Z WARN [NodePingThread] Did not find meta info of this node. Re-registering.
2017-11-15T08:41:28.765Z WARN [DefaultFilterChain] GRIZZLY0013: Exception during FilterChain execution
java.lang.OutOfMemoryError: Java heap space
2017-11-15T08:41:28.766Z ERROR [SelectorRunner] doSelect exception
java.lang.OutOfMemoryError: Java heap space
at sun.nio.ch.Net.localInetAddress(Native Method) ~[?:1.8.0_144]
at sun.nio.ch.Net.localAddress(Net.java:479) ~[?:1.8.0_144]
at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:133) ~[?:1.8.0_144]
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:266) ~[?:1.8.0_144]
at org.glassfish.grizzly.nio.transport.TCPNIOServerConnection.doAccept(TCPNIOServerConnection.java:169) ~[graylog.jar:?]
at org.glassfish.grizzly.nio.transport.TCPNIOServerConnection.onAccept(TCPNIOServerConnection.java:233) ~[graylog.jar:?]
at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:508) ~[graylog.jar:?]
at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:112) ~[graylog.jar:?]
at org.glassfish.grizzly.strategies.SameThreadIOStrategy.executeIoEvent(SameThreadIOStrategy.java:103) ~[graylog.jar:?]
at org.glassfish.grizzly.strategies.AbstractIOStrategy.executeIoEvent(AbstractIOStrategy.java:89) ~[graylog.jar:?]
at org.glassfish.grizzly.nio.SelectorRunner.iterateKeyEvents(SelectorRunner.java:415) ~[graylog.jar:?]
at org.glassfish.grizzly.nio.SelectorRunner.iterateKeys(SelectorRunner.java:384) ~[graylog.jar:?]
at org.glassfish.grizzly.nio.SelectorRunner.doSelect(SelectorRunner.java:348) [graylog.jar:?]
at org.glassfish.grizzly.nio.SelectorRunner.run(SelectorRunner.java:279) [graylog.jar:?]
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:593) [graylog.jar:?]
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:573) [graylog.jar:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
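The OutOfMemoryError above is the most telling line: a Graylog server that has exhausted its JVM heap will stall, which would also explain the read timeouts and the journal filling up. A minimal sketch of raising the heap, assuming a .deb install where the JVM options live in /etc/default/graylog-server (the 2g value is an assumption; size it for your machine):

# /etc/default/graylog-server (path assumed for an Ubuntu .deb install)
GRAYLOG_SERVER_JAVA_OPTS="-Xms2g -Xmx2g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:+UseConcMarkSweepGC"
# restart to apply
sudo systemctl restart graylog-server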
You need to remove those messages by clicking the x in the top right.
I removed those messages from the notifications and tried again to send some data to Graylog, but the error below is showing in filebeat:
2017-11-15T09:16:33Z INFO Harvester started for file: /etc/graylog/graylog.csv
2017-11-15T09:16:52Z INFO Non-zero metrics in the last 30s: filebeat.harvester.running=1 filebeat.harvester.started=1 filebeat.harvester.open_files=1 libbeat.logstash.call_count.PublishEvents=1 libbeat.logstash.publish.write_bytes=3149 libbeat.publisher.published_events=1383
2017-11-15T09:17:03Z ERR Failed to publish events caused by: read tcp 12.0.0.5:52274->42.234.120.00:5044: i/o timeout
2017-11-15T09:17:03Z INFO Error publishing events (retrying): read tcp 12.0.0.5:52274->42.234.120.00:5044: i/o timeout
2017-11-15T09:17:22Z INFO Non-zero metrics in the last 30s: publish.events=10002 registar.states.current=1 libbeat.logstash.published_and_acked_events=6747 libbeat.logstash.published_but_not_acked_events=1383 libbeat.publisher.published_events=5364 libbeat.logstash.call_count.PublishEvents=5 libbeat.logstash.publish.read_errors=1 libbeat.logstash.publish.read_bytes=4062 libbeat.logstash.publish.write_bytes=2684556 registrar.states.update=10002 registrar.writes=5
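The i/o timeout means filebeat did not get an acknowledgement from the Beats input at 42.234.120.00:5044 in time, which is consistent with a stalled Graylog server. If the server side checks out, filebeat can also be made more patient; a sketch for filebeat.yml (the timeout and bulk_max_size values are assumptions, not required defaults):

output.logstash:
  hosts: ["42.234.120.00:5044"]
  # wait longer for ACKs before declaring an i/o timeout (default is 30s)
  timeout: 90
  # smaller batches get acknowledged faster by a struggling server
  bulk_max_size: 1024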
You mix everything together and then ask a question out of that mix…
@jan
My intention was not to cause any confusion; I was trying to send some data to Graylog through filebeat to check whether I would get the same error messages again.
Are the error messages returning after you removed them (by clicking the x)?
No, the error messages have not come back since I removed them.
I don't know why these three notifications were thrown; if you could share the root cause or any documentation for reference, it would be very helpful.
Sorry that we missed that in our documentation, but thank you for pointing it out - I'll add a section about it to the FAQ ASAP.
But let me explain first. The settings for your journal live in the Graylog server.conf.
Taking the defaults as an example: Graylog starts dropping messages from the journal once they are older than 12 hours, or once the journal grows above 5 GB.
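For reference, the relevant part of server.conf with the shipped defaults (a sketch; raise the max age/size if Elasticsearch cannot keep up and you can spare the disk):

# journal settings in /etc/graylog/server/server.conf
message_journal_enabled = true
message_journal_dir = /var/lib/graylog-server/journal
# entries older than this are dropped from the journal:
message_journal_max_age = 12h
# and so is everything beyond this size:
message_journal_max_size = 5gb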
What you can do:
first, check the journal on your Graylog nodes in the Graylog UI (System > Nodes)
check if you still have a journal that is full
raise the resources of your Elasticsearch cluster
analyze where the bottlenecks are (a couple of quick checks are sketched below this list)
adjust the Elasticsearch connection settings in Graylog
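For the bottleneck analysis, a couple of quick checks, assuming Elasticsearch is reachable on localhost:9200 (the host is an assumption):

# cluster health: "status" should be green (yellow is normal on a single node)
curl -s 'http://localhost:9200/_cluster/health?pretty'
# rejections in the bulk/index thread pools point at an overloaded cluster:
curl -s 'http://localhost:9200/_cat/thread_pool?v'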
Thanks @jan for sharing this information; now I have a clear path forward. I will check the journal on my Graylog node and try troubleshooting the errors.