Incoming is good but outgoing is very slow


(Tafsir) #1

Hi @jan

I can see on some of my data nodes that the incoming message rate is good but the outgoing rate is very slow. Because of this, a backlog is building up on my data node.

Could you please help me understand why this is happening?


(Jan Doberstein) #2

My guess is that you need more resources on Elasticsearch to keep up with the ingested messages.


(Tafsir) #3

Which type of resources, @jan?


(Jan Doberstein) #4

Just more computing power for the Elasticsearch cluster - it is not fast enough to process and store the messages. Without any additional information this can only be a guess.


#5

Check your Elasticsearch server(s) for bottlenecks.
First the disk, then the CPU.
Usually it either can't write out the data or can't process it fast enough.
You can also check the memory usage - not just the OS memory, but the Elasticsearch heap usage too.
Or maybe your network can't transfer the data to the ES servers fast enough.
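To make the heap check above concrete, here is a minimal sketch of computing per-node heap usage from an Elasticsearch `GET _nodes/stats/jvm` response (fetched separately, e.g. with curl). The field names follow the `_nodes/stats` API (`jvm.mem.heap_used_in_bytes` / `heap_max_in_bytes`); the node name and byte values in the sample payload are made up for illustration.

```python
def heap_usage_percent(stats: dict) -> dict:
    """Return {node_name: heap usage %} from a _nodes/stats/jvm payload."""
    result = {}
    for node in stats["nodes"].values():
        mem = node["jvm"]["mem"]
        pct = 100.0 * mem["heap_used_in_bytes"] / mem["heap_max_in_bytes"]
        result[node["name"]] = round(pct, 1)
    return result

# Example with a trimmed-down payload (values are hypothetical):
sample = {
    "nodes": {
        "abc123": {
            "name": "es-node-1",
            "jvm": {"mem": {"heap_used_in_bytes": 24_500_000_000,
                            "heap_max_in_bytes": 30_000_000_000}},
        }
    }
}
print(heap_usage_percent(sample))
```

Sustained heap usage above roughly 80% is generally treated as a warning sign; see the Elastic documentation for the current recommendations.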


(Markuchi) #6

If you are sure that your servers have sufficient resources, we found that the following settings helped ensure the output could keep up:

ring_size
outputbuffer_processors
processbuffer_processors
output_batch_size

Make sure you read the documentation on these before adjusting them.
Setting the output batch size too large seems to cause some instability. Doubling the default was fine for us; going much higher gave much more throughput but caused instability with Graylog.

Also, we increased our journal size to handle cases where there was a prolonged influx of data or ES had to go down for a while.
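For reference, these settings live in Graylog's `server.conf`. The values below are only an illustrative sketch, not recommendations; the defaults shown in the comments are roughly those of Graylog 2.x, so check the documentation for your version before changing anything.

```
# Graylog server.conf -- illustrative values only, tune for your hardware.
# (Graylog 2.x defaults are approximately: ring_size = 65536,
#  processbuffer_processors = 5, outputbuffer_processors = 3,
#  output_batch_size = 500 -- verify against your version's docs.)
ring_size = 65536
processbuffer_processors = 5
outputbuffer_processors = 3
# Doubling the default batch size is a conservative first step;
# much larger values gave more throughput but caused instability.
output_batch_size = 1000
```

A Graylog restart is required for these to take effect.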


(Tafsir) #7

OK, will check, @macko003.

Heap usage is 22.8 GB - is this sufficient to handle the messages?

Thanks


(Tafsir) #8

Hi @markuchi,

Where can I find these settings?


(Tafsir) #9

@markuchi @macko003 @jan

The attached screenshot shows the backlog on my data node. The outgoing message rate is very slow.

When I looked at the server log, I found this:

2018-09-20T19:00:37.503Z ERROR [SystemJobManager] Unhandled error while running SystemJob <1414ccf0-bcff-11e8-a932-1244b83420c4> [org.graylog2.indexer.indices.jobs.OptimizeIndexJob]
org.graylog2.indexer.ElasticsearchException: Couldn't force merge index graylog_550
	at org.graylog2.indexer.cluster.jest.JestUtils.execute(JestUtils.java:51) ~[graylog.jar:?]
	at org.graylog2.indexer.indices.Indices.optimizeIndex(Indices.java:629) ~[graylog.jar:?]
	at org.graylog2.indexer.indices.jobs.OptimizeIndexJob.execute(OptimizeIndexJob.java:71) ~[graylog.jar:?]
	at org.graylog2.system.jobs.SystemJobManager$1.run(SystemJobManager.java:89) [graylog.jar:?]
	at com.codahale.metrics.InstrumentedScheduledExecutorService$InstrumentedRunnable.run(InstrumentedScheduledExecutorService.java:235) [graylog.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_141]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_141]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_141]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_141]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_141]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_141]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
Caused by: java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method) ~[?:1.8.0_141]
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ~[?:1.8.0_141]
	at java.net.SocketInputStream.read(SocketInputStream.java:171) ~[?:1.8.0_141]
	at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[?:1.8.0_141]
	at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137) ~[graylog.jar:?]
	at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153) ~[graylog.jar:?]
	at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282) ~[graylog.jar:?]
	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138) ~[graylog.jar:?]
	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56) ~[graylog.jar:?]
	at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259) ~[graylog.jar:?]
	at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163) ~[graylog.jar:?]
	at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165) ~[graylog.jar:?]
	at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273) ~[graylog.jar:?]
	at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125) ~[graylog.jar:?]
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) ~[graylog.jar:?]
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) ~[graylog.jar:?]
	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111) ~[graylog.jar:?]
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) ~[graylog.jar:?]
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) ~[graylog.jar:?]
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108) ~[graylog.jar:?]
	at io.searchbox.client.http.JestHttpClient.executeRequest(JestHttpClient.java:151) ~[graylog.jar:?]
	at io.searchbox.client.http.JestHttpClient.execute(JestHttpClient.java:77) ~[graylog.jar:?]
	at org.graylog2.indexer.cluster.jest.JestUtils.execute(JestUtils.java:46) ~[graylog.jar:?]
	... 11 more
2018-09-20T19:19:18.844Z WARN  [StreamRouterEngine] Unexpected error during stream matching
java.util.concurrent.TimeoutException: null
	at java.util.concurrent.FutureTask.get(FutureTask.java:205) ~[?:1.8.0_141]
	at com.google.common.util.concurrent.SimpleTimeLimiter.callWithTimeout(SimpleTimeLimiter.java:147) ~[graylog.jar:?]
	at org.graylog2.streams.StreamRouterEngine$Rule.matchWithTimeOut(StreamRouterEngine.java:303) [graylog.jar:?]
	at org.graylog2.streams.StreamRouterEngine$Rule.access$000(StreamRouterEngine.java:264) [graylog.jar:?]
	at org.graylog2.streams.StreamRouterEngine.match(StreamRouterEngine.java:191) [graylog.jar:?]
	at org.graylog2.streams.StreamRouter.route(StreamRouter.java:99) [graylog.jar:?]
	at org.graylog2.filters.StreamMatcherFilter.filter(StreamMatcherFilter.java:42) [graylog.jar:?]
	at org.graylog2.messageprocessors.MessageFilterChainProcessor.process(MessageFilterChainProcessor.java:100) [graylog.jar:?]
	at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.handleMessage(ProcessBufferProcessor.java:114) [graylog.jar:?]
	at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.dispatchMessage(ProcessBufferProcessor.java:100) [graylog.jar:?]
	at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:77) [graylog.jar:?]
	at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:42) [graylog.jar:?]
	at com.lmax.disruptor.WorkProcessor.run(WorkProcessor.java:143) [graylog.jar:?]
	at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66) [graylog.jar:?]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
2018-09-20T19:19:18.845Z WARN  [StreamRouterEngine] Unexpected error during stream matching
java.util.concurrent.TimeoutException: null
	at java.util.concurrent.FutureTask.get(FutureTask.java:205) ~[?:1.8.0_141]
	at com.google.common.util.concurrent.SimpleTimeLimiter.callWithTimeout(SimpleTimeLimiter.java:147) ~[graylog.jar:?]
	at org.graylog2.streams.StreamRouterEngine$Rule.matchWithTimeOut(StreamRouterEngine.java:303) [graylog.jar:?]
	at org.graylog2.streams.StreamRouterEngine$Rule.access$000(StreamRouterEngine.java:264) [graylog.jar:?]
	at org.graylog2.streams.StreamRouterEngine.match(StreamRouterEngine.java:191) [graylog.jar:?]
	at org.graylog2.streams.StreamRouter.route(StreamRouter.java:99) [graylog.jar:?]
	at org.graylog2.filters.StreamMatcherFilter.filter(StreamMatcherFilter.java:42) [graylog.jar:?]
	at org.graylog2.messageprocessors.MessageFilterChainProcessor.process(MessageFilterChainProcessor.java:100) [graylog.jar:?]
	at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.handleMessage(ProcessBufferProcessor.java:114) [graylog.jar:?]
	at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.dispatchMessage(ProcessBufferProcessor.java:100) [graylog.jar:?]
	at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:77) [graylog.jar:?]
	at org.graylog2.shared.buffers.processors.ProcessBufferProcessor.onEvent(ProcessBufferProcessor.java:42) [graylog.jar:?]
	at com.lmax.disruptor.WorkProcessor.run(WorkProcessor.java:143) [graylog.jar:?]
	at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66) [graylog.jar:?]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]


#10

@Tafsir_Alam
Check the heap usage in %; over 80% could be a problem, but check the Elastic recommendations.
The logs show stream processing timeouts, so check or temporarily disable your stream rules. When I used regexp-based stream rules, Graylog disabled the stream because it couldn't handle it in time.

What do you see when you click on the node name under System / Nodes?
Which buffer is full on your server?

If you try to play with the parameters @markuchi suggested, double-check the documentation, and only change them if you are sure what you are doing.
You will find these parameters in Graylog's server.conf.
If you increase output_batch_size, you need to increase the Elasticsearch bulk_max_body_size too.
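To sketch how these two settings pair up: larger `output_batch_size` values make Graylog send larger bulk requests, which must fit under Elasticsearch's HTTP request body limit. Assuming the "bulk_max_body_size" mentioned above refers to Elasticsearch's `http.max_content_length` setting (default 100mb), the two configs would look roughly like this; the batch size shown is purely illustrative.

```
# Graylog server.conf (illustrative value)
output_batch_size = 2000

# elasticsearch.yml -- the HTTP body limit that large bulk
# requests must fit under (the default is 100mb):
http.max_content_length: 100mb
```

Both services need a restart for these to apply.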


(Zero) #11

What is your infrastructure? Are you using dedicated Kafka? To me it is strange that one of your nodes is processing 6,000 messages while the two other nodes are doing nothing - that's not how it should look.

I had much the same situation with 3 Graylog nodes: one node processed 90% of the messages and the two others did almost nothing. The solution was quite simple: I had 3 Kafkas, and in Graylog => System => Inputs I had set 3 threads for an input, but that was a big mistake. For 3 Kafkas (3 replicas, 3 partitions) and 3 Graylogs I had to set 1 thread per input, and it works like a charm - every node does around 33% of the work :slight_smile:


#12

I installed a load balancer as a frontend, so it hands almost the same number of messages to each of my servers.


(Zero) #13

Try setting Processor Threads = 1 for every input in SYSTEM => INPUTS, More actions => Edit Input.
Also, restart the Graylog nodes afterwards.


(Tafsir) #14

@amitshar04 ^^ I hope the messages above will help us solve the backlog problem.


(system) #15

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.