Journal metrics unavailable | Failed to call API on node

I have installed a clustered Graylog server. When I navigate to System/Nodes, I encounter the error "Journal metrics unavailable", specifically on the host that is connected via VPN (IPSec).
OS: AlmaLinux 9.3
OpenSearch 2.11.1
MongoDB 6.0.12
Graylog 5.2.3

2024-01-19T14:12:53.099+05:00 WARN  [ProxiedResource] Failed to call API on node <58faa302-ac63-48b7-b17e-a383b269aeeb>, cause: timeout (duration: 1002 ms)
2024-01-19T14:13:53.177+05:00 WARN  [ProxiedResource] Failed to call API on node <58faa302-ac63-48b7-b17e-a383b269aeeb>, cause: timeout (duration: 1003 ms)
2024-01-19T14:14:02.013+05:00 WARN  [ProxiedResource] Failed to call API on node <58faa302-ac63-48b7-b17e-a383b269aeeb>, cause: timeout (duration: 1003 ms)
2024-01-19T14:14:03.938+05:00 WARN  [ProxiedResource] Failed to call API on node <58faa302-ac63-48b7-b17e-a383b269aeeb>, cause: timeout (duration: 1002 ms)
2024-01-19T14:14:05.942+05:00 WARN  [ProxiedResource] Failed to call API on node <58faa302-ac63-48b7-b17e-a383b269aeeb>, cause: timeout (duration: 1003 ms)
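
If I understand these warnings correctly, the leader's proxied REST call to that node is timing out after about one second, which is why the journal metrics cannot be shown. A rough way to time the same endpoint directly from the leader (the password here is a placeholder):

curl -u admin:yourpassword -s -o /dev/null -w 'http=%{http_code} total=%{time_total}s\n' 'http://graylog01ten:9000/api/system/journal'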

server.conf

is_leader = true
node_id_file = /etc/graylog/server/node-id
password_secret = AZOVpaYVd7buL03eroKKukIIW5a0w2Xe3VNoWs6qT5Ct3y3NIET0JLLxMhi491dLYHpUFeC2pC9uuyOBsGtzjY0QUN8VAMBy
root_username = admin
root_password_sha2 = 0bfcbb6b871d48cdc7ad695a103fde5356fefefd41adf44e861e9fc3788c554e
bin_dir = /usr/share/graylog-server/bin
data_dir = /var/lib/graylog-server
plugin_dir = /usr/share/graylog-server/plugin
http_bind_address = 192.168.60.81:9000
stream_aware_field_types=false
elasticsearch_hosts = https://admin:FASFsfa0514@opensearch01:9200,https://admin:FASFsfa0514@opensearch02:9200,https://admin:FASFsfa0514@opensearch03:9200
disabled_retention_strategies = none
allow_leading_wildcard_searches = false
allow_highlighting = false
output_batch_size = 500
output_flush_interval = 1
output_fault_count_threshold = 5
output_fault_penalty_seconds = 30
processbuffer_processors = 5
outputbuffer_processors = 3
processor_wait_strategy = blocking
ring_size = 65536
inputbuffer_ring_size = 65536
inputbuffer_processors = 2
inputbuffer_wait_strategy = blocking
message_journal_enabled = true
message_journal_dir = /var/lib/graylog-server/journal
lb_recognition_period_seconds = 3
mongodb_uri = mongodb://graylog:Y06VxwNTaJ1FKLW@192.168.60.81:27017,192.168.62.115:27017,192.168.25.81:27017/graylog?replicaSet=rs01
mongodb_max_connections = 1000
integrations_scripts_dir = /usr/share/graylog-server/scripts
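
For reference: other nodes reach each Graylog node through the URI that node publishes to the cluster (http_publish_uri, which is derived from http_bind_address when not set). If a node's bind address is not reachable from the remote site, an explicitly reachable publish URI can help; a minimal sketch for the node behind the VPN, assuming its hostname resolves from the other sites:

http_bind_address = 0.0.0.0:9000
http_publish_uri = http://graylog01ten:9000/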

Hey @Yerkin

Troubleshooting:

Start a Mongo shell:

mongosh

Then check the replica set status with the query below.

rs.status()

You should see your cluster status there.

Perhaps try is_leader = true on node 1 and is_leader = false on the other two.
Double-check your configurations; it seems like there might be a connection issue with "graylog01ten".

Double-check your heap on that node (a quick way to check is shown after the cURL example below).

Try using cURL against that node from your leader node:

curl -i 'http://graylog01ten:9000/api/?pretty=true'
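
For the heap check, on an RPM-based install the JVM heap settings usually live in /etc/sysconfig/graylog-server, so something like this (paths assume the standard package layout):

grep -E 'Xms|Xmx' /etc/sysconfig/graylog-server
free -h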

The replica set status is normal:

[root@graylog01 ~]# mongosh
Current Mongosh Log ID: 65ab59d7bf76711b1e33e8ca
Connecting to:          mongodb://127.0.0.1:27017/?directConnection=true&serverSelectionTimeoutMS=2000&appName=mongosh+2.1.1
Using MongoDB:          6.0.12
Using Mongosh:          2.1.1

For mongosh info see: https://docs.mongodb.com/mongodb-shell/

------
   The server generated these startup warnings when booting
   2024-01-18T16:37:56.853+05:00: Access control is not enabled for the database. Read and write access to data and configuration is unrestricted
   2024-01-18T16:37:56.853+05:00: /sys/kernel/mm/transparent_hugepage/enabled is 'always'. We suggest setting it to 'never'
------

rs01 [direct: primary] test> rs.status()
{
  set: 'rs01',
  date: ISODate('2024-01-20T05:27:54.955Z'),
  myState: 1,
  term: Long('1'),
  syncSourceHost: '',
  syncSourceId: -1,
  heartbeatIntervalMillis: Long('2000'),
  majorityVoteCount: 2,
  writeMajorityCount: 2,
  votingMembersCount: 3,
  writableVotingMembersCount: 3,
  optimes: {
    lastCommittedOpTime: { ts: Timestamp({ t: 1705728474, i: 12 }), t: Long('1') },
    lastCommittedWallTime: ISODate('2024-01-20T05:27:54.902Z'),
    readConcernMajorityOpTime: { ts: Timestamp({ t: 1705728474, i: 12 }), t: Long('1') },
    appliedOpTime: { ts: Timestamp({ t: 1705728474, i: 12 }), t: Long('1') },
    durableOpTime: { ts: Timestamp({ t: 1705728474, i: 12 }), t: Long('1') },
    lastAppliedWallTime: ISODate('2024-01-20T05:27:54.902Z'),
    lastDurableWallTime: ISODate('2024-01-20T05:27:54.902Z')
  },
  lastStableRecoveryTimestamp: Timestamp({ t: 1705728437, i: 13 }),
  electionCandidateMetrics: {
    lastElectionReason: 'electionTimeout',
    lastElectionDate: ISODate('2024-01-18T11:39:01.244Z'),
    electionTerm: Long('1'),
    lastCommittedOpTimeAtElection: { ts: Timestamp({ t: 1705577940, i: 1 }), t: Long('-1') },
    lastSeenOpTimeAtElection: { ts: Timestamp({ t: 1705577940, i: 1 }), t: Long('-1') },
    numVotesNeeded: 1,
    priorityAtElection: 1,
    electionTimeoutMillis: Long('10000'),
    newTermStartDate: ISODate('2024-01-18T11:39:01.540Z'),
    wMajorityWriteAvailabilityDate: ISODate('2024-01-18T11:39:01.721Z')
  },
  members: [
    {
      _id: 0,
      name: '192.168.60.81:27017',
      health: 1,
      state: 1,
      stateStr: 'PRIMARY',
      uptime: 150600,
      optime: { ts: Timestamp({ t: 1705728474, i: 12 }), t: Long('1') },
      optimeDate: ISODate('2024-01-20T05:27:54.000Z'),
      lastAppliedWallTime: ISODate('2024-01-20T05:27:54.902Z'),
      lastDurableWallTime: ISODate('2024-01-20T05:27:54.902Z'),
      syncSourceHost: '',
      syncSourceId: -1,
      infoMessage: '',
      electionTime: Timestamp({ t: 1705577941, i: 1 }),
      electionDate: ISODate('2024-01-18T11:39:01.000Z'),
      configVersion: 14,
      configTerm: 1,
      self: true,
      lastHeartbeatMessage: ''
    },
    {
      _id: 3,
      name: '192.168.62.115:27017',
      health: 1,
      state: 2,
      stateStr: 'SECONDARY',
      uptime: 132865,
      optime: { ts: Timestamp({ t: 1705728474, i: 4 }), t: Long('1') },
      optimeDurable: { ts: Timestamp({ t: 1705728474, i: 4 }), t: Long('1') },
      optimeDate: ISODate('2024-01-20T05:27:54.000Z'),
      optimeDurableDate: ISODate('2024-01-20T05:27:54.000Z'),
      lastAppliedWallTime: ISODate('2024-01-20T05:27:54.902Z'),
      lastDurableWallTime: ISODate('2024-01-20T05:27:54.902Z'),
      lastHeartbeat: ISODate('2024-01-20T05:27:54.259Z'),
      lastHeartbeatRecv: ISODate('2024-01-20T05:27:54.238Z'),
      pingMs: Long('1'),
      lastHeartbeatMessage: '',
      syncSourceHost: '192.168.60.81:27017',
      syncSourceId: 0,
      infoMessage: '',
      configVersion: 14,
      configTerm: 1
    },
    {
      _id: 4,
      name: '192.168.25.81:27017',
      health: 1,
      state: 2,
      stateStr: 'SECONDARY',
      uptime: 91091,
      optime: { ts: Timestamp({ t: 1705728474, i: 8 }), t: Long('1') },
      optimeDurable: { ts: Timestamp({ t: 1705728474, i: 8 }), t: Long('1') },
      optimeDate: ISODate('2024-01-20T05:27:54.000Z'),
      optimeDurableDate: ISODate('2024-01-20T05:27:54.000Z'),
      lastAppliedWallTime: ISODate('2024-01-20T05:27:54.832Z'),
      lastDurableWallTime: ISODate('2024-01-20T05:27:54.393Z'),
      lastHeartbeat: ISODate('2024-01-20T05:27:54.885Z'),
      lastHeartbeatRecv: ISODate('2024-01-20T05:27:54.307Z'),
      pingMs: Long('84'),
      lastHeartbeatMessage: '',
      syncSourceHost: '192.168.60.81:27017',
      syncSourceId: 0,
      infoMessage: '',
      configVersion: 14,
      configTerm: 1
    }
  ],
  ok: 1,
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1705728474, i: 12 }),
    signature: {
      hash: Binary.createFromBase64('AAAAAAAAAAAAAAAAAAAAAAAAAAA=', 0),
      keyId: Long('0')
    }
  },
  operationTime: Timestamp({ t: 1705728474, i: 12 })
}
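
As a side note, the transparent_hugepage startup warning above can be addressed as MongoDB suggests; a minimal sketch (effective until reboot, so it still needs to be made persistent, e.g. via a tuned profile or a small systemd unit):

echo never > /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/enabled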

I have configured OpenSearch with TLS:

cluster.name: graylog
node.name: opensearch01kar
path.data: /data/opensearch/db
path.logs: /data/opensearch/log
network.host: 0.0.0.0
discovery.seed_hosts: ["192.168.25.82", "192.168.60.80", "192.168.70.80"]
cluster.initial_cluster_manager_nodes: ["192.168.60.80"]

root@opensearch01kar:~# curl -XGET https://opensearch01kar.sicim.eu:9200/_cluster/health -u "admin:FASFsfa0514"
{"cluster_name":"graylog","status":"green","timed_out":false,"number_of_nodes":3,"number_of_data_nodes":3,"discovered_master":true,"discovered_cluster_manager":true,"active_primary_shards":9,"active_shards":18,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":100.0}
[root@graylog01kar ~]# curl -i 'http://graylog01ten:9000/api/?pretty=true'
HTTP/1.1 200 OK
X-Graylog-Node-ID: 58faa302-ac63-48b7-b17e-a383b269aeeb
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-Runtime-Microseconds: 85834
Content-Type: application/json
Content-Length: 253

{
  "cluster_id" : "484b3b54-6392-4e97-8643-246225bd34d1",
  "node_id" : "58faa302-ac63-48b7-b17e-a383b269aeeb",
  "version" : "5.2.3+9aee303",
  "tagline" : "Manage your logs in the dark and have lasers going and make it look like you're from space!"

I see you have nodes in different networks:

name: '192.168.60.81:27017',  stateStr: 'PRIMARY'
name: '192.168.62.115:27017', stateStr: 'SECONDARY'
name: '192.168.25.81:27017',  stateStr: 'SECONDARY'

I'm assuming you have a proxy in front?

EDIT: if you are using FQDNs, then for good measure make sure they are listed in the /etc/hosts file.

On node graylog01ten, are all services running?

cat /etc/hosts
192.168.60.81   graylog01kar
192.168.62.115  graylog03kar
192.168.25.81   graylog01ten
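
Name resolution can also be double-checked on each node, so every host resolves the others the same way:

getent hosts graylog01kar graylog03kar graylog01ten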

192.168.60.81 (graylog01kar) and 192.168.62.115 (graylog03kar) are on different VLANs, but they are in the same location. They can see each other; it is specifically 192.168.25.81 (graylog01ten) that they cannot see.
192.168.25.81 is in another location, connected via site-to-site VPN (IPSec); there are no connectivity issues, and all metrics are visible.
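
That said, rs.status() above shows pingMs: 84 for 192.168.25.81 while the local members sit at about 1 ms, so the round trip over the IPSec tunnel already eats a noticeable part of the roughly one-second API timeout; packet loss or MTU/fragmentation on the tunnel could push individual calls over it. A few quick checks from graylog01kar (the second ping probes whether ~1400-byte packets cross the tunnel unfragmented):

ping -c 5 192.168.25.81
ping -c 5 -M do -s 1400 192.168.25.81
curl -s -o /dev/null -w 'total=%{time_total}s\n' 'http://graylog01ten:9000/api/'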

Which services do you mean, and how can I check them?
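
On AlmaLinux with the packaged installs, the usual check would be the systemd units (assuming the standard unit names graylog-server and mongod):

systemctl status graylog-server
systemctl status mongod
journalctl -u graylog-server --since '1 hour ago'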

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.