Inter-node API Timeout in Graylog 6.2.0 (Multi-Region Cluster)

I’m running a new Graylog 6.2.0 cluster with nodes distributed across multiple geographical regions. When I log in to a node in one region, I’m unable to view performance metrics (e.g., Memory/Heap, Buffers, Journal) for nodes in other regions. However, if I log in to a node in the same geographic location, all metrics display correctly.

In the logs, I’m consistently seeing inter-node API timeouts like this:

2025-04-29T18:30:10.070Z WARN  [ProxiedResource] Failed to call API on node <1ce8335f-e3a9-4d66-b1eb-4bdcdedf827b>, cause: timeout (duration: 1002 ms)
2025-04-29T18:30:10.070Z WARN  [ProxiedResource] Failed to call API on node <09ac9ab1-ec2c-4e54-88a0-1c74f0291dfe>, cause: timeout (duration: 1001 ms)
2025-04-29T18:30:10.070Z WARN  [ProxiedResource] Failed to call API on node <a1e3a52c-7d66-4514-afe0-3e4d7afb5e68>, cause: timeout (duration: 1001 ms)
2025-04-29T18:30:10.070Z WARN  [ProxiedResource] Failed to call API on node <1db2c9da-8c1c-4f24-9b6e-b85f49fadd94>, cause: timeout (duration: 1002 ms)
2025-04-29T18:31:10.071Z WARN  [ProxiedResource] Failed to call API on node <c81809a0-b020-489f-892c-15211dd73696>, cause: timeout (duration: 1001 ms)
2025-04-29T18:31:10.071Z WARN  [ProxiedResource] Failed to call API on node <1ce8335f-e3a9-4d66-b1eb-4bdcdedf827b>, cause: timeout (duration: 1001 ms)
2025-04-29T18:31:10.071Z WARN  [ProxiedResource] Failed to call API on node <1db2c9da-8c1c-4f24-9b6e-b85f49fadd94>, cause: timeout (duration: 1001 ms)
2025-04-29T18:31:10.071Z WARN  [ProxiedResource] Failed to call API on node <09ac9ab1-ec2c-4e54-88a0-1c74f0291dfe>, cause: timeout (duration: 1001 ms)
2025-04-29T18:31:10.071Z WARN  [ProxiedResource] Failed to call API on node <a1e3a52c-7d66-4514-afe0-3e4d7afb5e68>, cause: timeout (duration: 1001 ms)
2025-04-29T18:31:10.071Z WARN  [ProxiedResource] Failed to call API on node <7569ace1-6ea9-4f67-8531-da9c0919ee3d>, cause: timeout (duration: 1002 ms)
2025-04-29T18:31:58.277Z WARN  [ProxiedResource] Failed to call API on node <a1e3a52c-7d66-4514-afe0-3e4d7afb5e68>, cause: timeout (duration: 1001 ms)
2025-04-29T18:31:58.277Z WARN  [ProxiedResource] Failed to call API on node <c81809a0-b020-489f-892c-15211dd73696>, cause: timeout (duration: 1001 ms)
2025-04-29T18:31:58.277Z WARN  [ProxiedResource] Failed to call API on node <1ce8335f-e3a9-4d66-b1eb-4bdcdedf827b>, cause: timeout (duration: 1001 ms)
2025-04-29T18:31:58.277Z WARN  [ProxiedResource] Failed to call API on node <1db2c9da-8c1c-4f24-9b6e-b85f49fadd94>, cause: timeout (duration: 1001 ms)
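
To check whether every failure really sits at the ~1 second mark and which nodes are affected, the warnings can be tallied with a small script. This is only a rough sketch: the regular expression is written against the exact line format shown above, and the function name `summarize` is made up for illustration, not part of Graylog.

```python
import re
from collections import Counter

# Matches the ProxiedResource warning format pasted above (assumed to be stable).
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+WARN\s+\[ProxiedResource\] Failed to call API on node "
    r"<(?P<node>[0-9a-f-]+)>, cause: (?P<cause>\w+) \(duration: (?P<ms>\d+) ms\)"
)


def summarize(lines):
    """Return (failures per node ID, list of reported durations in ms)."""
    per_node = Counter()
    durations = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip unrelated log lines
        per_node[match["node"]] += 1
        durations.append(int(match["ms"]))
    return per_node, durations


def report(path):
    """Print a per-node failure count for a log file (the path is up to the caller)."""
    with open(path, encoding="utf-8") as handle:
        per_node, durations = summarize(handle)
    for node, count in per_node.most_common():
        print(f"{node}  {count} timeouts")
    if durations:
        print(f"duration range: {min(durations)}-{max(durations)} ms")
```

Calling `report()` with the path to the Graylog server log prints one line per node ID, which can then be matched against the nodes in the other regions.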

I’ve tried uncommenting and setting proxied_requests_default_call_timeout = 5s in server.conf, but it has no apparent effect: the timeouts still occur at about 1 second (1001 to 1002 ms in the log lines above). I’ve also reviewed the rest of the config file but couldn’t find any other setting that looks related to this timeout.

Everything else in the cluster appears to be functioning properly. This issue is specifically with viewing node performance stats across regions.

Environment Details:

  • OS: Ubuntu 24.04
  • Graylog Version: 6.2.0
  • Number of Graylog nodes: 9 (will be scaling to 20+)
  • OpenSearch Version: 2.15.0
  • Number of OpenSearch data nodes: 32

Is there a new or alternative setting in Graylog 6.2.0 to increase the inter-node API call timeout? Any help would be appreciated.

Has anyone run into this issue or have any insights?

I’ve noticed that if you visit a node’s metrics page (e.g., https://10.X.X.X:9000/system/metrics/node/<NODE-ID>), it fails to load for any node located in a different geographic region from the node you’re currently logged in to. This doesn’t appear to be a firewall or DNS issue; I’ve verified both are working correctly.

It’s also worth noting that this behavior was not present in Graylog 6.1.11. In that version, the cluster page displayed journal and JVM heap metrics for all nodes, which relied on successful metric queries across the cluster. It seems that something changed in Graylog 6.2 that affects this functionality.

If anyone knows of any relevant settings in server.conf that might influence this behavior, I’d really appreciate the help.

Thanks!

I’ve discovered that if I use the API browser, I’m still able to query metrics for any node in the cluster. It looks as if the standard GUI is either executing the queries differently from the API browser, or there is some type of bug.
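
One way to test this is to time a direct request to each remote node from the node I’m logged in to, using a client timeout well above 1 second. If the direct call succeeds but takes longer than about 1000 ms, that would fit the timeouts in the log. The sketch below is rough and rests on assumptions: the helper names are made up, the 1000 ms limit is only inferred from the log durations, and the default path `/api/system/lbstatus` is assumed to be reachable without credentials (swap in whatever endpoint you prefer). With HTTPS and a private CA, the default certificate check may fail unless the CA is trusted on the machine running it.

```python
import time
import urllib.request


def time_call(fetch):
    """Run fetch() and return (ok, elapsed_ms); any exception counts as a failure."""
    start = time.monotonic()
    try:
        fetch()
        ok = True
    except Exception:
        ok = False
    return ok, (time.monotonic() - start) * 1000.0


def exceeds_proxy_timeout(elapsed_ms, limit_ms=1000):
    """True if a call took at least as long as the ~1 s limit seen in the logs."""
    return elapsed_ms >= limit_ms


def probe(base_url, path="/api/system/lbstatus", timeout_s=10.0):
    """Time one GET against a node; base_url is e.g. https://10.X.X.X:9000."""
    url = base_url.rstrip("/") + path
    return time_call(lambda: urllib.request.urlopen(url, timeout=timeout_s).read())


def probe_all(base_urls):
    """Print a latency line per node address supplied by the caller."""
    for base_url in base_urls:
        ok, elapsed_ms = probe(base_url)
        flag = "over 1 s" if exceeds_proxy_timeout(elapsed_ms) else "under 1 s"
        print(f"{base_url}: ok={ok} {elapsed_ms:.0f} ms ({flag})")
```

Running `probe_all()` from a node in each region against the addresses of the other nodes should show whether cross-region calls are simply slower than 1 second or are failing for some other reason.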

Does anyone have any ideas?

Below is a screenshot of a working API browser querying metrics from a node in a separate geographic region. I’m unable to see this info from the standard GUI.

From API browser: [screenshot]

From standard GUI (spins on loading… forever): [screenshot]