Graylog not processing data

flapegue · March 27, 2017, 1:12pm

When I look at System / Nodes / Details, I can see that the number of unprocessed messages keeps increasing and never decreases, even when I stop all inputs.

I increased the heap memory for Graylog and Elasticsearch, but that did not solve the problem.

elasticsearch 2.4.4
graylog-2.2-repository 1-5
graylog-server 2.2.2-1
mongodb-clients 1:2.4.10-5
mongodb-server 1:2.4.10-5

Physical host with Intel® Xeon® CPU E5-1603 0 @ 2.80GHz, 8GB RAM, 1TB hard disk

/etc/default/graylog-server
JAVA=/usr/bin/java
GRAYLOG_SERVER_JAVA_OPTS="-Xms4g -Xmx4g -XX:NewRatio=1 -server -XX:+ResizeTLAB -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:+CMSClassUnloadingEnabled -XX:+UseParNewGC -XX:-OmitStackTraceInFastThrow"
GRAYLOG_SERVER_ARGS=""
GRAYLOG_COMMAND_WRAPPER=""

/etc/default/elasticsearch
ES_HEAP_SIZE=4g
ES_STARTUP_SLEEP_TIME=5

/etc/graylog/server/server.conf
is_master = true
node_id_file = /etc/graylog/server/node-id
password_secret = XXXX
root_password_sha2 = XXXX
root_timezone = Europe/Paris
plugin_dir = /usr/share/graylog-server/plugin
rest_listen_uri = http://X.X.X.X:9000/api/
rest_transport_uri = http://X.X.X.X:9000/api/
web_listen_uri = http://X.X.X.X:9000/
rotation_strategy = count
elasticsearch_max_docs_per_index = 20000000
elasticsearch_max_number_of_indices = 20
retention_strategy = delete
elasticsearch_shards = 4
elasticsearch_replicas = 0
elasticsearch_index_prefix = graylog
allow_leading_wildcard_searches = false
allow_highlighting = false
elasticsearch_cluster_name = graylog
elasticsearch_analyzer = standard
output_batch_size = 500
output_flush_interval = 1
output_fault_count_threshold = 5
output_fault_penalty_seconds = 30
processbuffer_processors = 20
outputbuffer_processors = 15
processor_wait_strategy = blocking
ring_size = 65536
inputbuffer_ring_size = 65536
inputbuffer_processors = 2
inputbuffer_wait_strategy = blocking
message_journal_enabled = true
message_journal_dir = /var/lib/graylog-server/journal
lb_recognition_period_seconds = 3
mongodb_uri = mongodb://localhost/graylog
mongodb_max_connections = 1000
mongodb_threads_allowed_to_block_multiplier = 5
content_packs_dir = /usr/share/graylog-server/contentpacks
content_packs_auto_load = grok-patterns.json
proxied_requests_thread_pool_size = 32

/etc/elasticsearch/elasticsearch.yml
cluster.name: graylog
discovery.zen.minimum_master_nodes: 1

# curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "graylog",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 40,
  "active_shards" : 40,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
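
A minimal sketch, in Python, of how the health output above could be read programmatically. The function name `summarize_health` is hypothetical, and the sample values are a subset copied from the output above. One detail worth noticing is that `number_of_nodes` is 2 while `number_of_data_nodes` is 1; with Graylog 2.2 this is commonly because the Graylog server itself joins the Elasticsearch cluster as a non-data node, but that reading is an assumption and is not confirmed anywhere in this thread.

```python
import json

# Subset of the cluster health output shown above.
HEALTH_JSON = """
{
  "cluster_name": "graylog",
  "status": "green",
  "number_of_nodes": 2,
  "number_of_data_nodes": 1,
  "active_primary_shards": 40,
  "active_shards": 40,
  "unassigned_shards": 0,
  "number_of_pending_tasks": 0
}
"""


def summarize_health(health: dict) -> dict:
    """Derive a few quick facts from a _cluster/health response."""
    return {
        # Green status and nothing unassigned: every shard is allocated.
        "healthy": health["status"] == "green" and health["unassigned_shards"] == 0,
        # Nodes that joined the cluster but hold no data.
        "non_data_nodes": health["number_of_nodes"] - health["number_of_data_nodes"],
        # Active shards that are not primaries, i.e. replicas.
        "replica_shards": health["active_shards"] - health["active_primary_shards"],
        "pending_tasks": health["number_of_pending_tasks"],
    }


summary = summarize_health(json.loads(HEALTH_JSON))
print(summary)
```

A green status with no pending tasks only says that the cluster state is consistent. It does not by itself show how fast Elasticsearch is indexing.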

jan · March 27, 2017, 3:07pm

You should return outputbuffer_processors and processbuffer_processors to their defaults and set output_batch_size to 100.

Read the first paragraph of the posting and try running defaults before changing the settings randomly in your server.conf.
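
To make the advice about the processor settings concrete, here is a minimal Python sketch that parses `key = value` lines in the style of server.conf and sums the three buffer processor settings from the first post. The helper names are hypothetical, and `CPU_CORES = 4` is an assumed value for illustration only (check the real count with `nproc`). Comparing the total against the core count is a common rule of thumb, not something stated in this thread.

```python
def parse_conf(text: str) -> dict:
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values such as URIs stay intact.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


PROCESSOR_KEYS = (
    "processbuffer_processors",
    "outputbuffer_processors",
    "inputbuffer_processors",
)


def total_buffer_processors(settings: dict) -> int:
    """Sum the buffer processor counts that are present in the settings."""
    return sum(int(settings[key]) for key in PROCESSOR_KEYS if key in settings)


# Values copied from the server.conf in the first post.
ORIGINAL_CONF = """
processbuffer_processors = 20
outputbuffer_processors = 15
inputbuffer_processors = 2
"""

CPU_CORES = 4  # assumption for illustration; not taken from the thread

total = total_buffer_processors(parse_conf(ORIGINAL_CONF))
print(f"{total} buffer processors configured for {CPU_CORES} assumed cores")
```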

flapegue · March 27, 2017, 3:27pm

I did this, but nothing changed:
output_batch_size = 100
output_flush_interval = 1
output_fault_count_threshold = 5
output_fault_penalty_seconds = 30
processbuffer_processors = 5
outputbuffer_processors = 3

PS: I work with only one node.

flapegue · March 27, 2017, 7:18pm

In fact, I realized that some messages were processed, but most of them are deleted because the journal rotates and processing is very, very slow.
How can I speed up the calculation?

jochen · March 28, 2017, 8:33am

Which calculation do you mean?

jan · March 28, 2017, 8:49am

@flapegue

With your current settings:

output_batch_size = 100
output_flush_interval = 1
outputbuffer_processors = 3

You are sending 3*100 messages per second from Graylog to Elasticsearch. For reference, here is the text from the above-mentioned blog post:

Next, in your Graylog server.conf, you can increase output_batch_size and adjust outputbuffer_processors to allow larger batches to be sent over fewer processors. The settings for output batch size and output buffer processors will vary depending on your environment. For our environment, we set output_batch_size to 5000 and outputbuffer_processors to 3 with a 31 GB heap memory for the Elasticsearch node.

So why not set your output_batch_size to 2000 and let the journal drain out?
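
The arithmetic in this post can be sketched in Python as follows. This is only the simplified model used above (every output processor sends one full batch per flush interval); real throughput can differ, since a batch may also be flushed before it is full once the flush interval elapses. The function names and the backlog figure are hypothetical.

```python
from typing import Optional


def estimated_output_rate(batch_size: int, processors: int, flush_interval_s: float) -> float:
    """Messages per second if each processor sends one full batch per interval."""
    return batch_size * processors / flush_interval_s


def drain_time_s(backlog: int, output_rate: float, input_rate: float = 0.0) -> Optional[float]:
    """Seconds needed to empty the journal, or None if it can never drain."""
    net_rate = output_rate - input_rate
    if net_rate <= 0:
        return None  # messages arrive at least as fast as they are written
    return backlog / net_rate


# Settings discussed in the thread: 3 output processors, 1 s flush interval.
for batch_size in (100, 2000, 5000):
    rate = estimated_output_rate(batch_size, processors=3, flush_interval_s=1)
    print(f"output_batch_size={batch_size}: about {rate:.0f} msg/s")

BACKLOG = 1_000_000  # hypothetical journal size, for illustration only
print(drain_time_s(BACKLOG, estimated_output_rate(2000, 3, 1)))
```

Under this model, the journal only shrinks when the estimated output rate is higher than the rate at which new messages arrive.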

flapegue · March 28, 2017, 12:21pm

I have tested several combinations, with output_batch_size = 100 and output_batch_size = 600, to see the impact, but nothing changed.
I just tried output_batch_size = 2000 and there is no change; the number is still increasing.

flapegue · March 28, 2017, 12:54pm

Last month I was on version 2.1 and I did not have this problem.
Could it come from this update?

flapegue · March 28, 2017, 1:03pm

All settings have been adjusted to suit my configuration.

jan · March 28, 2017, 1:28pm

Dear @flapegue,

Further investigation is very time-consuming and would require digging into your log files. Working remotely, we would only be able to look into that if you provide log files from Elasticsearch and Graylog.

Additionally, you would need to give Graylog some time to process your journal; this can take up to a few hours.

Everything that follows now is just guesswork.

You should look into your Graylog log file and read what might be the issue.

flapegue · March 28, 2017, 3:58pm

Here is the graylog-server log:
https://raw.githubusercontent.com/flapegue/export/master/graylog-server.log
and here is the Elasticsearch log:
https://raw.githubusercontent.com/flapegue/export/master/elasticsearch.log

Thanks.

jan · March 29, 2017, 9:57am

I am not sure what the issue is within your setup, as all the information given from your end (so far) looks normal.

For me, further investigation is not possible. That said, if you need help, consider getting professional support.

flapegue · March 29, 2017, 3:56pm

Thank you for the help you have already given me.

I will not be able to go for paid support … but would erasing the Elasticsearch indices restore the service?
I use an LVM logical volume to store my indices, a volume that I have extended several times. Could the problem come from there?

jtkarvo · March 30, 2017, 9:24am

Hi,

I use your example of an output batch size of 5000 and 3 processors, and the Graylog nodes are sometimes limited to 15k messages per second.

Is there a guide somewhere on what batch sizes are recommended relative to the Elasticsearch cluster settings? I.e., can I just double the 5000 to 10000 and be safe, or is there some other parameter that I should consider?
