IP address change - strange behavior

I have a specific problem that I can't fully explain yet; maybe someone here has another idea.

the environment:
Graylog Open 4.3.12 on Ubuntu 18.04 on VMware (12 cores / 16 GB memory)
Elasticsearch 7.10.2
MongoDB 4.0.28

Index set: 28 indices, 352,207,630 documents, 139.6 GiB / rotation period: P1D / max 28 indices / ~10,000 messages per minute / ~100 clients
300 GB disk space, ~115 GB available

Global Syslog UDP input with default settings on port 1514 (iptables NAT from 514 to 1514, as described in article 9061, sketched below), with the Grok extractors removed for testing
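
For reference, the redirect is the usual NAT rule meant here; a minimal sketch, assuming UDP-only syslog and that the rule is made persistent separately:

sudo iptables -t nat -A PREROUTING -p udp --dport 514 -j REDIRECT --to-ports 1514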

the situation:
I change the IP of the system (migration to another subnet) via netplan and reboot it; in parallel, the IP is changed on the clients (a netplan sketch follows below).
All services are running and Graylog shows no errors.
As soon as the input is running, only ~50 messages per minute arrive from a few systems; the rest of the messages go into the buffer, which is not processed either (normally the buffer is never full).
As soon as the IP is changed back (and on the clients too), everything works again.
If the clients are instead pointed at another lightweight syslog server (not Graylog), the messages arrive normally.
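
The netplan change itself is nothing special; roughly like this, with a hypothetical interface name and placeholder addresses rather than my real values:

# /etc/netplan/01-netcfg.yaml
network:
  version: 2
  renderer: networkd
  ethernets:
    ens160:                       # hypothetical interface name
      addresses: [192.0.2.50/24]  # placeholder for the new subnet
      gateway4: 192.0.2.1
      nameservers:
        addresses: [192.0.2.1]

applied with 'sudo netplan apply' and then a reboot.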

the graylog-server.conf:
is_master = true
node_id_file = /etc/graylog/server/node-id
password_secret = XXX
root_password_sha2 = XXX
root_timezone = CET
bin_dir = /usr/share/graylog-server/bin
data_dir = /var/lib/graylog-server
plugin_dir = /usr/share/graylog-server/plugin
http_bind_address = 0.0.0.0:9000
rotation_strategy = count
elasticsearch_max_docs_per_index = 20000000
elasticsearch_max_number_of_indices = 20
retention_strategy = delete
elasticsearch_shards = 4
elasticsearch_replicas = 0
elasticsearch_index_prefix = graylog
allow_leading_wildcard_searches = false
allow_highlighting = false
elasticsearch_analyzer = standard
output_batch_size = 500
output_flush_interval = 1
output_fault_count_threshold = 5
output_fault_penalty_seconds = 30
processbuffer_processors = 10
outputbuffer_processors = 7
processor_wait_strategy = blocking
ring_size = 65536
inputbuffer_ring_size = 65536
inputbuffer_processors = 2
inputbuffer_wait_strategy = blocking
message_journal_enabled = true
message_journal_dir = /var/lib/graylog-server/journal
lb_recognition_period_seconds = 3
mongodb_uri = mongodb://localhost/graylog
mongodb_max_connections = 1000
mongodb_threads_allowed_to_block_multiplier = 5
transport_email_enabled = true
transport_email_hostname = XXX
transport_email_port = XX
transport_email_use_auth = false
transport_email_from_email = XXX
proxied_requests_thread_pool_size = 32
graylog_command_wrapper = "authbind"

There are no related errors in /var/log/graylog-server/server.log and /var/log/elasticsearch/gc.log

Hello @kkhn

I have pretty much the same setup as yourself:
CentOS 7, 12 CPUs, 12 GB memory, 500 GB drive. I ingest around 30-40 GB of logs a day.

From what I know, buffers filling up comes down to two things: resources and incorrect configuration, either in pipelines or in extractors.

As for resources, the first thing I noticed was your settings for the buffer processors:

processbuffer_processors = 10
outputbuffer_processors = 7
processor_wait_strategy = blocking
ring_size = 65536
inputbuffer_ring_size = 65536
inputbuffer_processors = 2

Keep in mind these are threads; the rule of thumb is that, for best performance, their total should match the number of CPU cores in the server.

For example:

processbuffer_processors = 10
outputbuffer_processors = 7
inputbuffer_processors = 2

This tells me you would need 19 CPU cores, but you actually only have 12. So basically you're using all the CPUs for Graylog and not leaving any for your operating system. This may work for a time, but you will end up having issues later.

processbuffer_processors is your heavy hitter; it should get the largest share of the processors.

Changing the IP address by itself should not be the reason your buffers fill up.

How about /var/log/elasticsearch/graylog.log? (A quick grep, sketched below, should show anything relevant.)
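
One way to check it, assuming the default log location from your setup:

grep -iE "error|warn|exception" /var/log/elasticsearch/graylog.log | tail -n 50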

You can try to increase your batch size:

output_batch_size = 5000

And if you have enough disk space, increase your journal size. That will give you a better buffer than the default 5 GB in case something goes wrong.

message_journal_max_size = 12gb

Then adjust your buffers to something like this:

processbuffer_processors = 6
outputbuffer_processors = 3
inputbuffer_processors = 2

After restarting the Graylog service, watch the Nodes section (or poll the journal via the API, as sketched below).
Hope that helps.
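
If you prefer the API over the UI, the journal state of a node can be polled directly; a sketch, assuming the standard /api/system/journal endpoint, the API listening locally on port 9000, and admin credentials or a token:

curl -u admin:PASSWORD http://127.0.0.1:9000/api/system/journal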


Hello @gsmith,

thank you for the quick reply.
The process buffer settings were left over from a test with more vCPUs; I have adjusted them so that the total matches the actual cores again.

After adjusting message_journal_max_size and output_batch_size, the buffer no longer fills up after the IP address change.

Thanks a lot!

Hey @kkhn

Glad things worked out for you. If you could mark this as resolved for future searches, that would be great :+1:

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.