Graylog Open 6.1.1 crashes

Hi All,

We have been running Graylog Open for some time now and it has been stable. We usually apply the monthly updates a week or so after release. We have just updated from 6.0.7 to 6.1.1 and hit some issues.

Initially Graylog works fine, but after a random amount of time searches hang and return no results. When the results stop working, the following error appears in server.log:
WARN [ProxiedResource] Failed to call API on node , cause: Failed to connect to REDACTED/127.0.0.1:9000 (duration: 2 ms)

If I restart the VM (or sometimes just the Graylog service), everything starts working again.

When it's not working, curling the URL on the Graylog VM itself gives "connection refused" (although the web interface is still reachable externally). When it's working, curl returns the expected results.

Sometimes, if I restart the Graylog service, the curl results are fine but I can't access the web page externally.
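The local curl check essentially asks whether anything accepts a TCP connection on the bound address and port. A minimal Python sketch of the same probe, assuming you pass in the host and port from http_bind_address (the function name port_open is hypothetical):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and tries each returned address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and name-resolution failures
        return False
```

For example, port_open("127.0.0.1", 9000) on the VM mirrors the address in the WARN line from server.log, while running it from another machine against the FQDN mirrors the external check.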

When I ran the update, it prompted me to replace server.conf and I said no.

Any thoughts?

Ubuntu 22.04.5
Graylog 6.1.1
MongoDB 6.0.19
OpenSearch 2.13.0

Hello @dave2318,

This sounds like there might be some resource contention occurring. Do you have any monitoring of the cluster to observe resource utilisation?

Could you post your server.conf, redacting anything we shouldn't see? You can print just the active (non-comment) settings with:

grep "^[^#;]" server.conf
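The grep keeps only lines whose first character is neither # nor ;, which drops comments and blank lines. If grep isn't handy, a rough Python equivalent (the helper name active_settings is hypothetical) is:

```python
import re

# Same pattern as the grep: the line must start with a character that is
# neither '#' nor ';'. Blank lines have no first character, so they are dropped too.
ACTIVE_LINE = re.compile(r"^[^#;]")

def active_settings(text: str) -> list[str]:
    """Return the non-comment, non-blank lines of a server.conf-style file."""
    return [line for line in text.splitlines() if ACTIVE_LINE.match(line)]
```

Usage: print("\n".join(active_settings(open("server.conf").read()))).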

Hi, thanks for the quick response. Redacted server.conf below:

is_leader = true
node_id_file = /etc/graylog/server/node-id
password_secret = TheSecret
root_password_sha2 = theSecret
bin_dir = /usr/share/graylog-server/bin
data_dir = /var/lib/graylog-server
plugin_dir = /usr/share/graylog-server/plugin
http_bind_address = FQDN:9000
http_enable_tls = true
http_tls_cert_file = CertPath
http_tls_key_file = CertPath
stream_aware_field_types=false
elasticsearch_hosts = http://127.0.0.1:9200,http://user:password@node2:19200
disabled_retention_strategies = none
allow_leading_wildcard_searches = true
allow_highlighting = false
output_batch_size = 500
output_flush_interval = 1
output_fault_count_threshold = 5
output_fault_penalty_seconds = 30
processbuffer_processors = 5
outputbuffer_processors = 3
processor_wait_strategy = blocking
ring_size = 65536
inputbuffer_ring_size = 65536
inputbuffer_processors = 2
inputbuffer_wait_strategy = blocking
message_journal_enabled = true
message_journal_dir = /var/lib/graylog-server/journal
lb_recognition_period_seconds = 3
mongodb_uri = mongodb://localhost/graylog
mongodb_max_connections = 1000
system_event_excluded_types = SIDECAR_STATUS_UNKNOWN

Resources aren't overly stressed; CPU and memory are generally below 50%.

I'm not going to hold my breath, but I think it's working…

I added the full FQDN to /etc/hosts, pointing to the server's actual IP address. I haven't had to do this before, because the FQDN is resolved by DNS, so it just worked. It seems a hosts entry has become more of a requirement with 6.1, as it doesn't seem to like DNS as much.

Nice work!

Was there an entry in the hosts file mapping the FQDN to 127.0.0.1? The server seemed to be failing because it was calling the API on 127.0.0.1, an address it got by looking up the FQDN it's bound to.
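To check on the server itself whether the bound FQDN (http_bind_address = FQDN:9000 in the config above) resolves to a loopback address, a minimal Python sketch could look like this. It assumes the bind address has the host:port form shown in the config; the function names are hypothetical, and it uses the operating system's resolver, so /etc/hosts entries are taken into account.

```python
import ipaddress
import socket

def split_bind_address(value: str) -> tuple[str, int]:
    """Split an http_bind_address value like 'graylog.example.com:9000' into (host, port)."""
    host, _, port = value.rpartition(":")
    # Strip brackets from IPv6 literals such as '[::1]:9000'
    return host.strip("[]"), int(port)

def resolves_to_loopback(host: str, port: int) -> list[str]:
    """Return the loopback addresses, if any, that the system resolver gives for host."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # sockaddr[0] is the address string; drop any IPv6 zone suffix like '%eth0'
    addrs = {info[4][0].split("%")[0] for info in infos}
    return sorted(a for a in addrs if ipaddress.ip_address(a).is_loopback)
```

A non-empty result from resolves_to_loopback(*split_bind_address("FQDN:9000")), with the real FQDN substituted, would point at the same 127.0.0.1 lookup described above.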

There was no entry for the full FQDN in the hosts file; the only 127.0.0.1 entries were localhost and the host's actual FQDN (which isn't the FQDN it's accessed through).
I did initially try adding "127.0.0.1 FQDN" to the hosts file, but then I couldn't access the web interface. Changing it to "<server IP> FQDN" got it working fine.
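The working fix was a hosts entry of the form "<server IP> FQDN" rather than "127.0.0.1 FQDN". A small Python sketch (the helper hosts_entries is hypothetical) lists every address a name is mapped to in an /etc/hosts-style file, so a stray loopback mapping is easy to spot:

```python
def hosts_entries(text: str, name: str) -> list[str]:
    """Return every address in hosts-file text that lists `name` (case-insensitive)."""
    matches = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        fields = line.split()
        # Field 0 is the address; the remaining fields are the hostname and aliases
        if len(fields) >= 2 and name.lower() in (f.lower() for f in fields[1:]):
            matches.append(fields[0])
    return matches
```

For example, hosts_entries(open("/etc/hosts").read(), "graylog.example.com"), where graylog.example.com is a placeholder for your own FQDN, should return only the server's real IP.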

It's now been nearly 3 hours and it's still working, so it does look fixed.


A day later and it’s still working. I’m confident now that it’s fixed.

