Graylog - Uncommited messages deleted from journal & utilization is too high


(Ganeshbabu Ramamoorthy) #1

Hi All,

We installed Graylog 2.3 & Elasticsearch 5.5.2 on an Ubuntu 16.04 machine, and I am getting the notifications below in Graylog:

Deflector exists as an index and is not an alias. (triggered 20 hours ago)
The deflector is meant to be an alias but exists as an index. Multiple failures of infrastructure can lead to this. Your messages are still indexed but searches and all maintenance tasks will fail or produce incorrect results. It is strongly recommend that you act as soon as possible.
×
 Uncommited messages deleted from journal (triggered 21 hours ago)
Some messages were deleted from the Graylog journal before they could be written to Elasticsearch. Please verify that your Elasticsearch cluster is healthy and fast enough. You may also want to review your Graylog journal settings and set a higher limit. (Node: 291ee918-b16c-ca1752)
×
 Journal utilization is too high (triggered a day ago)
Journal utilization is too high and may go over the limit soon. Please verify that your Elasticsearch cluster is healthy and fast enough. You may also want to review your Graylog journal settings and set a higher limit. (Node: 291ee918-b16c-ca1752)

There are no error or warning messages in either the Graylog or Elasticsearch logs.

Elasticsearch cluster health

root@Graylog:/etc/graylog# curl -XGET http://Graylog:9200/_cluster/health?pretty=true
{
  "cluster_name" : "*Graylog*",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 36,
  "active_shards" : 36,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

Please advise how to resolve this.

Regards,
Ganeshbabu R


(Jan Doberstein) #2

You need to remove those messages by clicking the × in the top right.

But you should then investigate why those messages were triggered.


(Ganeshbabu Ramamoorthy) #3

@jan

But you should then investigate why those messages are triggered.

I wasn’t sure how to investigate these messages, but after removing them from the notifications I found this warning in the server.log:

2017-11-15T08:37:15.619Z WARN  [ProxiedResource] Unable to call http://graylogserver.southeastasia.cloudapp.azure.com:9000/api/system/metrics/multiple on node <291ee918-b16c-4321-b6f8-7a88a3ca1752>
java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method) ~[?:1.8.0_144]
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ~[?:1.8.0_144]
        at java.net.SocketInputStream.read(SocketInputStream.java:171) ~[?:1.8.0_144]
        at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[?:1.8.0_144]
        at okio.Okio$2.read(Okio.java:139) ~[graylog.jar:?]
        at okio.AsyncTimeout$2.read(AsyncTimeout.java:237) ~[graylog.jar:?]
        at okio.RealBufferedSource.indexOf(RealBufferedSource.java:345) ~[graylog.jar:?]
        at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:217) ~[graylog.jar:?]
        at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:211) ~[graylog.jar:?]
        at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189) ~[graylog.jar:?]
        at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:75) ~[graylog.jar:?]
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[graylog.jar:?]
        at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45) ~[graylog.jar:?]
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[graylog.jar:?]
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) ~[graylog.jar:?]
        at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93) ~[graylog.jar:?]
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[graylog.jar:?]
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) ~[graylog.jar:?]
        at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) ~[graylog.jar:?]
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[graylog.jar:?]
        at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120) ~[graylog.jar:?]
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[graylog.jar:?]
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) ~[graylog.jar:?]
        at org.graylog2.rest.RemoteInterfaceProvider.lambda$get$0(RemoteInterfaceProvider.java:59) ~[graylog.jar:?]
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[graylog.jar:?]
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) ~[graylog.jar:?]
        at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185) ~[graylog.jar:?]
        at okhttp3.RealCall.execute(RealCall.java:69) ~[graylog.jar:?]
        at retrofit2.OkHttpCall.execute(OkHttpCall.java:180) ~[graylog.jar:?]
        at org.graylog2.shared.rest.resources.ProxiedResource.lambda$getForAllNodes$0(ProxiedResource.java:76) ~[graylog.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_144]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_144]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]
2017-11-15T08:37:57.150Z WARN  [NodePingThread] Did not find meta info of this node. Re-registering.
2017-11-15T08:41:28.765Z WARN  [DefaultFilterChain] GRIZZLY0013: Exception during FilterChain execution
java.lang.OutOfMemoryError: Java heap space
2017-11-15T08:41:28.766Z ERROR [SelectorRunner] doSelect exception
java.lang.OutOfMemoryError: Java heap space
        at sun.nio.ch.Net.localInetAddress(Native Method) ~[?:1.8.0_144]
        at sun.nio.ch.Net.localAddress(Net.java:479) ~[?:1.8.0_144]
        at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:133) ~[?:1.8.0_144]
        at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:266) ~[?:1.8.0_144]
        at org.glassfish.grizzly.nio.transport.TCPNIOServerConnection.doAccept(TCPNIOServerConnection.java:169) ~[graylog.jar:?]
        at org.glassfish.grizzly.nio.transport.TCPNIOServerConnection.onAccept(TCPNIOServerConnection.java:233) ~[graylog.jar:?]
        at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:508) ~[graylog.jar:?]
        at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:112) ~[graylog.jar:?]
        at org.glassfish.grizzly.strategies.SameThreadIOStrategy.executeIoEvent(SameThreadIOStrategy.java:103) ~[graylog.jar:?]
        at org.glassfish.grizzly.strategies.AbstractIOStrategy.executeIoEvent(AbstractIOStrategy.java:89) ~[graylog.jar:?]
        at org.glassfish.grizzly.nio.SelectorRunner.iterateKeyEvents(SelectorRunner.java:415) ~[graylog.jar:?]
        at org.glassfish.grizzly.nio.SelectorRunner.iterateKeys(SelectorRunner.java:384) ~[graylog.jar:?]
        at org.glassfish.grizzly.nio.SelectorRunner.doSelect(SelectorRunner.java:348) [graylog.jar:?]
        at org.glassfish.grizzly.nio.SelectorRunner.run(SelectorRunner.java:279) [graylog.jar:?]
        at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:593) [graylog.jar:?]
        at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:573) [graylog.jar:?]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_144]

you need to remove those message by click on the x in the top right.

I removed those messages from the notifications and tried sending some data to Graylog again, but Filebeat is showing the error below:

2017-11-15T09:16:33Z INFO Harvester started for file: /etc/graylog/graylog.csv
2017-11-15T09:16:52Z INFO Non-zero metrics in the last 30s: filebeat.harvester.running=1 filebeat.harvester.started=1 filebeat.harvester.open_files=1 libbeat.logstash.call_count.PublishEvents=1 libbeat.logstash.publish.write_bytes=3149 libbeat.publisher.published_events=1383
2017-11-15T09:17:03Z ERR Failed to publish events caused by: read tcp 12.0.0.5:52274->42.234.120.00:5044: i/o timeout
2017-11-15T09:17:03Z INFO Error publishing events (retrying): read tcp 12.0.0.5:52274->42.234.120.00:5044: i/o timeout
2017-11-15T09:17:22Z INFO Non-zero metrics in the last 30s: publish.events=10002 registar.states.current=1 libbeat.logstash.published_and_acked_events=6747 libbeat.logstash.published_but_not_acked_events=1383 libbeat.publisher.published_events=5364 libbeat.logstash.call_count.PublishEvents=5 libbeat.logstash.publish.read_errors=1 libbeat.logstash.publish.read_bytes=4062 libbeat.logstash.publish.write_bytes=2684556 registrar.states.update=10002 registrar.writes=5

Let me know if you need any further information.

Thanks,
Ganeshbabu R


(Jan Doberstein) #4

@Ganeshbabu

you mix everything together and then ask a question out of that mix …


1. Error messages:

Are the error messages returning after you had removed them (by click on the x) ?


2. Collector Sidecar Delivery of Logs:

From the small log snippet and the limited information provided, I would suggest the following:

 12.0.0.5:52274->42.234.120.00:5044
  • Check the connection between the server where Collector-Sidecar/Filebeat is running and Graylog.
  • Check that you have a Beats input running on Graylog on port 5044.
  • Check that the server where Collector-Sidecar/Filebeat is running can connect to port 5044 of the Graylog server.
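The last two checks can be done from the Filebeat host with a quick TCP probe. This is a minimal sketch assuming bash (for its `/dev/tcp` pseudo-device); the host is a placeholder you should replace with your Graylog server:

```shell
#!/usr/bin/env bash
# Probe TCP reachability of the Graylog Beats input from the Filebeat host.
# GRAYLOG_HOST is a placeholder -- substitute your own server address.
check_port() {
  # bash's /dev/tcp pseudo-device: succeeds only if the TCP connect succeeds
  timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

GRAYLOG_HOST=127.0.0.1   # placeholder
BEATS_PORT=5044

if check_port "$GRAYLOG_HOST" "$BEATS_PORT"; then
  echo "Beats input on ${GRAYLOG_HOST}:${BEATS_PORT} is reachable"
else
  echo "Cannot connect to ${GRAYLOG_HOST}:${BEATS_PORT} - check the input and any firewall rules"
fi
```

If the probe fails from the Filebeat host but succeeds on the Graylog server itself, the problem is usually a firewall or (on Azure, as here) a network security group rule rather than Graylog.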

(Ganeshbabu Ramamoorthy) #5

you mix everything together and then ask a question out of that mix …

@jan
My intention was not to cause any confusion; I was trying to send some data to Graylog through Filebeat to check whether I would get the same error messages or not.

Are the error messages returning after you had removed them (by click on the x) ?

No, the error messages have not returned since I removed them.

Deflector exists as an index and is not an alias. (triggered 20 hours ago)
The deflector is meant to be an alias but exists as an index. Multiple failures of infrastructure can lead to this. Your messages are still indexed but searches and all maintenance tasks will fail or produce incorrect results. It is strongly recommend that you act as soon as possible.
×
 Uncommited messages deleted from journal (triggered 21 hours ago)
Some messages were deleted from the Graylog journal before they could be written to Elasticsearch. Please verify that your Elasticsearch cluster is healthy and fast enough. You may also want to review your Graylog journal settings and set a higher limit. (Node: 291ee918-b16c-ca1752)
×
 Journal utilization is too high (triggered a day ago)
Journal utilization is too high and may go over the limit soon. Please verify that your Elasticsearch cluster is healthy and fast enough. You may also want to review your Graylog journal settings and set a higher limit. (Node: 291ee918-b16c-ca1752)

I don’t know why these error messages were thrown; if you could share the root cause or any documentation for reference, it would be very helpful.

Regards,
Ganeshbabu R


(Jan Doberstein) #6

Sorry that we missed that in our documentation, but thank you for pointing it out - I’ll add a section about it to the FAQ ASAP.

But let me explain first. The journal settings live in the Graylog server.conf.

Taking the defaults as an example: Graylog starts dropping messages from the journal once they are older than 12 hours, or once the journal grows above 5 GB.
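For reference, the relevant settings in server.conf look roughly like this (values shown are the Graylog 2.x defaults; the path may differ on your install):

```
# Journal settings in /etc/graylog/server/server.conf (2.x defaults)
message_journal_enabled = true
message_journal_dir = /var/lib/graylog-server/journal
message_journal_max_age = 12h
message_journal_max_size = 5gb
```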

What you can do:

  • First check the journal on your Graylog nodes in the Graylog UI (System > Nodes)
    • check whether the journal is still full
  • Raise the resources of your Elasticsearch cluster
    • analyze where the bottlenecks are
    • adjust the Elasticsearch connection settings in Graylog
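You can also check the journal state without the UI by querying the node's REST API. This is a sketch: the credentials and host are placeholders, and the `/system/journal` endpoint is the one exposed by the Graylog 2.x API:

```shell
# Query the node's journal status via the REST API; falls back to a
# message if the API is unreachable. admin:password and 127.0.0.1:9000
# are placeholders -- substitute your node's address and credentials.
out=$(curl -s --max-time 5 -u admin:password \
  "http://127.0.0.1:9000/api/system/journal?pretty=true" \
  || echo "Graylog API not reachable")
echo "$out"
```

The response reports the number of unflushed (uncommitted) entries and the journal size, which tells you whether the journal is still backed up behind Elasticsearch.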

(Ganeshbabu Ramamoorthy) #7

Thanks @jan for sharing this information; now I have a clear path forward. I will check the journal on my Graylog node and try troubleshooting the errors.

I will post the update soon.

Regards,
Ganeshbabu R


(system) #8

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.