**1. Describe your incident:**
My setup: 1 Graylog headnode + 1 datanode. shards: 1 replicas: 0. I’m trying to add another datanode and retire the old one. Is it possible to do this with the Graylog web GUI, without going into API stuff?
If I add another datanode and remove the old one, would I lose data? Or does Graylog make sure all the data from the old datanode is migrated to the new one before actually removing it?
**2. Describe your environment:**
OS Information: Linux
Package Version: 6.1.8
Service logs, configurations, and environment variables: shards:1 replicas: 0
**3. What steps have you already taken to try and solve the problem?**
I searched the forum and googled. Documentation on the datanode and its remove action is minimal.
This operation will require you to interface with the OpenSearch API. You can do this by pulling a client cert from the Graylog UI under cluster config.
Assuming the second Graylog Data Node has been added to the cluster, you could stop shards from being allocated to the older node with the request below.
PUT _cluster/settings
{
"transient": {
"cluster.routing.allocation.exclude._ip": "OLD_DATA_NODE_IP"
}
}
Once that is done you should be able to see the shards drain off of the old node down to zero.
GET _cat/shards?v
Assuming each index consists of a single shard with no replicas, this should be a safe operation. Once the node is drained of shards, it can be shut down.
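For anyone who wants to script the drain instead of running the two requests by hand, here is a minimal Python sketch of the same steps using only the standard library. It assumes the client cert from the Graylog UI is available as PEM files; the function names, file paths, and base URL are hypothetical and not part of Graylog or OpenSearch. It counts shards by the `ip` column of `_cat/shards?format=json`, matching the `_ip` exclusion used above.

```python
import json
import ssl
import time
import urllib.request


def exclusion_body(old_ip):
    """Settings body asking the cluster to move shards off old_ip."""
    return {"transient": {"cluster.routing.allocation.exclude._ip": old_ip}}


def shards_left_on(shards, old_ip):
    """Count rows from `_cat/shards?format=json` still located on old_ip."""
    return sum(1 for s in shards if s.get("ip") == old_ip)


def call(base_url, method, path, ctx, body=None):
    """Send one JSON request to the OpenSearch API and decode the reply."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        base_url + path, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, context=ctx) as resp:
        return json.loads(resp.read())


def drain(base_url, old_ip, cert_file, key_file, ca_file, poll_seconds=10):
    """Exclude old_ip from allocation, then wait until it holds no shards."""
    # Assumption: cert, key and CA are PEM files exported from the Graylog UI.
    ctx = ssl.create_default_context(cafile=ca_file)
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    call(base_url, "PUT", "/_cluster/settings", ctx, exclusion_body(old_ip))
    while True:
        shards = call(base_url, "GET", "/_cat/shards?format=json", ctx)
        left = shards_left_on(shards, old_ip)
        print(f"{left} shard(s) still on {old_ip}")
        if left == 0:
            return
        time.sleep(poll_seconds)
```

Nothing runs until `drain(...)` is called, e.g. `drain("https://NEW_DATA_NODE_IP:9200", "OLD_DATA_NODE_IP", "client.crt", "client.key", "ca.crt")`, where the port and file names are placeholders to adapt to your setup.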
@Wine_Merchant what’s your guess regarding removing a datanode: does Graylog do it gracefully, or would it lead to data loss in my case? I guess my question is about what goes on under the hood when I click ‘Remove’.
Managed to move all shards to the new datanode via the API, using a new client cert with the all_access role. Thanks @Wine_Merchant. I’m still not able to retire the old node though. There are two complications:
It turns out Graylog has some internal indices whose shards are set to non-zero replicas. In any case, these replicas can be discarded since they have primaries.
Before removing the old datanode, I can’t make sure Graylog is in a good state, because if the node is disabled, Graylog throws ‘cluster_manager_not_discovered_exception’. It looks like Graylog is working OK, but the new datanode can’t take over the cluster manager role, due to a quorum issue I think.
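To illustrate the suspected quorum issue, here is a small Python sketch of a plain majority rule. This is a simplified model and an assumption on my part: OpenSearch manages its voting configuration internally and may not count every node as a voter, so treat it as intuition rather than the exact election logic. The function names are made up for the example.

```python
def votes_needed(voting_nodes):
    """Smallest strict majority of the voting nodes."""
    return voting_nodes // 2 + 1


def can_elect(voting_nodes, nodes_up):
    """True if the nodes still running can form a majority."""
    return nodes_up >= votes_needed(voting_nodes)


# Two voters, one stopped: the survivor alone is not a majority.
print(can_elect(2, 1))
# Three voters, one stopped: the remaining two are a majority.
print(can_elect(3, 2))
```

Under this model a two-node cluster cannot lose either node and still elect a cluster manager, while a three-node cluster can lose one.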
Now that I think about this again, shouldn’t the Graylog web UI automatically handle routing when I click “Remove” and, if that’s not possible, show an error to the user?
Makes sense. So if quorum is not an issue, i.e. if I have 3 datanodes, Graylog will automatically handle the allocation of shards if I click Remove on a datanode?
I ran a quick test: I spun up a 3-node cluster and created an index with 3 shards, which were placed evenly across all three nodes. Upon removal of node1, the cluster remained green and node1’s shard is now on node3.
It appears that after adding node1 back to the cluster, the shards require a manual rebalance to move them back.
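One way to do that manual move is the cluster reroute API (`POST _cluster/reroute`) with a `move` command. Below is a minimal Python sketch that only builds the request body; the index name `test_index` and the shard number are hypothetical stand-ins for whatever the test index actually was, and the helper names are made up for the example.

```python
import json


def move_command(index, shard, from_node, to_node):
    """One `move` command for the body of POST _cluster/reroute."""
    return {
        "move": {
            "index": index,
            "shard": shard,
            "from_node": from_node,
            "to_node": to_node,
        }
    }


def reroute_body(moves):
    """Wrap (index, shard, from_node, to_node) tuples into a reroute body."""
    return {"commands": [move_command(*m) for m in moves]}


# Hypothetical example: send shard 0 of test_index from node3 back to node1.
print(json.dumps(reroute_body([("test_index", 0, "node3", "node1")]), indent=2))
```

The printed JSON can be sent with the same client cert used for the other API calls. Depending on the cluster's rebalancing settings, OpenSearch may also move shards back on its own over time, so check `GET _cat/shards?v` first.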