Graylog-datanode migration error

Hello,

I have Graylog 7.0.0-10, OS Ubuntu 24.04, Graylog-datanode 7.0.0-10, Opensearch 2.19.3 - the same as in the Graylog-datanode dist folder.

I do the migration according to the instructions:

When I run the migration, it shows this error during the message processing step:

Error There was an error fetching a resource: Internal Server Error. Additional information: Failed to trigger datanode request. Code: 401, message: Unauthorized

datanode.log:

2025-11-17T06:57:49.770-08:00 INFO [OpensearchProcessImpl] at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:796) [netty-transport-4.1.121.Final.jar:4.1.121.Final]
2025-11-17T06:57:49.770-08:00 INFO [OpensearchProcessImpl] at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:697) [netty-transport-4.1.121.Final.jar:4.1.121.Final]
2025-11-17T06:57:49.770-08:00 INFO [OpensearchProcessImpl] at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:660) [netty-transport-4.1.121.Final.jar:4.1.121.Final]
2025-11-17T06:57:49.770-08:00 INFO [OpensearchProcessImpl] at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) [netty-transport-4.1.121.Final.jar:4.1.121.Final]
2025-11-17T06:57:49.770-08:00 INFO [OpensearchProcessImpl] at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:998) [netty-common-4.1.121.Final.jar:4.1.121.Final]
2025-11-17T06:57:49.770-08:00 INFO [OpensearchProcessImpl] at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.121.Final.jar:4.1.121.Final]
2025-11-17T06:57:49.770-08:00 INFO [OpensearchProcessImpl] at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
2025-11-17T06:57:49.770-08:00 INFO [OpensearchProcessImpl] Caused by: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 474554202f5f636c75737465722f6865616c74683f6d61737465725f74696d656f75743d363073266c6576656c3d636c75737465722674696d656f75743d363073266c6f63616c3d7472756520485454502f312e310d0a436f6e74656e742d4c656e6774683a20300d0a486f73743a203132372e302e302e313a393230300d0a436f6e6e656374696f6e3a204b6565702d416c6976650d0a557365722d4167656e743a204170616368652d487474704173796e63436c69656e742f342e312e3520284a6176612f32312e302e38290d0a0d0a
2025-11-17T06:57:49.770-08:00 INFO [OpensearchProcessImpl] at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1353) ~[netty-handler-4.1.121.Final.jar:4.1.121.Final]
2025-11-17T06:57:49.770-08:00 INFO [OpensearchProcessImpl] at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1428) ~[netty-handler-4.1.121.Final.jar:4.1.121.Final]
2025-11-17T06:57:49.770-08:00 INFO [OpensearchProcessImpl] at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:530) ~[netty-codec-4.1.121.Final.jar:4.1.121.Final]
2025-11-17T06:57:49.770-08:00 INFO [OpensearchProcessImpl] at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:469) ~[netty-codec-4.1.121.Final.jar:4.1.121.Final]

Any ideas?

Hi @Rexxer ,

The error message io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record suggests that your setup has trouble deciding whether it should communicate with the (internal) OpenSearch via HTTP or HTTPS.
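
One way to confirm this, as a minimal sketch: the hex string that follows "not an SSL/TLS record:" in datanode.log is the raw data the TLS listener rejected, and decoding it shows what the client actually sent. The Python below only decodes the payload copied from the log above (split into chunks for readability); the function name is made up for illustration.

```python
# Hex payload copied from the NotSslRecordException line in datanode.log.
# It is split into chunks only for readability; the bytes are unchanged.
HEX_PAYLOAD = (
    "474554202f5f636c75737465722f6865616c7468"
    "3f6d61737465725f74696d656f75743d363073"
    "266c6576656c3d636c7573746572"
    "2674696d656f75743d363073"
    "266c6f63616c3d74727565"
    "20485454502f312e310d0a"
    "436f6e74656e742d4c656e6774683a20300d0a"
    "486f73743a203132372e302e302e313a393230300d0a"
    "436f6e6e656374696f6e3a204b6565702d416c6976650d0a"
    "557365722d4167656e743a20"
    "4170616368652d487474704173796e63436c69656e742f342e312e35"
    "20284a6176612f32312e302e3829"
    "0d0a0d0a"
)


def decode_rejected_record(hex_payload: str) -> str:
    """Turn the hex dump from the exception message back into text."""
    return bytes.fromhex(hex_payload).decode("ascii", errors="replace")


request = decode_rejected_record(HEX_PAYLOAD)
request_line = request.splitlines()[0]

# Print the decoded request so the method, path, and headers are visible.
print(request)
```

Here the payload decodes to a plain-text HTTP request for /_cluster/health with the header Host: 127.0.0.1:9200, i.e. a client speaking unencrypted HTTP to a port that expects TLS.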

Have you provisioned certificates during the migration? If so, with a self-signed CA or your own uploaded CA? Is there any chance that you have disabled security somewhere in the config?

Which step of the migration is it? Can you post a screenshot?

Thanks!

I used a self-signed CA. It’s the step after a journal_max_size warning.

The graylog-server can’t connect to OpenSearch now: neither the old one nor the new one (datanode).

I tried enabling and disabling security, with no success: the new OpenSearch config is the same after every restart, because graylog-datanode recreates it each time.

I also tried changing the OpenSearch connection from http to https, and got a different error.

You are right that the OpenSearch configuration is regenerated during every startup. If you want to change anything there, you have to do it either through the datanode.conf options or, if there is no other way, through Data Node Configuration Overrides.

Now, let’s figure out why your datanode is trying to connect to an unencrypted OpenSearch. Could you post your datanode.conf and graylog.conf here (with sensitive information redacted)?

What’s the content of your datanodes collection in mongodb?

I’ve restored the VM from a snapshot and started again but got the same behavior.

server.log:

2025-11-18T03:59:32.641-08:00 INFO [CaKeystore] Signing certificate for node f58605ee-d61b-4277-9fce-ac5629dd5889, subject: CN=Graylog-1-64
2025-11-18T04:00:40.204-08:00 INFO [MigrationActionsImpl] Attempting to pause processing on all nodes…
2025-11-18T04:00:40.208-08:00 INFO [ClusterProcessingControl] Attempting to call ‘pause processing’ on node [b876b429-ea69-4002-be43-5bcfc461e695].
2025-11-18T04:00:40.213-08:00 INFO [SystemProcessingResource] Paused message processing - triggered by REST call.
2025-11-18T04:00:40.214-08:00 INFO [ClusterProcessingControl] Successfully called ‘pause processing’ on node [b876b429-ea69-4002-be43-5bcfc461e695].
2025-11-18T04:00:40.214-08:00 INFO [MigrationActionsImpl] Done pausing processing on all nodes.
2025-11-18T04:00:40.214-08:00 INFO [MigrationActionsImpl] Waiting for output buffer to drain on all nodes…
2025-11-18T04:00:40.216-08:00 INFO [ClusterProcessingControl] Attempting to call ‘fetching output rate metric value’ on node [b876b429-ea69-4002-be43-5bcfc461e695].
2025-11-18T04:00:40.221-08:00 INFO [ClusterProcessingControl] Successfully called ‘fetching output rate metric value’ on node [b876b429-ea69-4002-be43-5bcfc461e695].
2025-11-18T04:00:40.222-08:00 INFO [ClusterProcessingControl] Output rate has not yet reached zero on nodes [[b876b429-ea69-4002-be43-5bcfc461e695]].
2025-11-18T04:00:42.224-08:00 INFO [ClusterProcessingControl] Attempting to call ‘fetching output rate metric value’ on node [b876b429-ea69-4002-be43-5bcfc461e695].
2025-11-18T04:00:42.227-08:00 INFO [ClusterProcessingControl] Successfully called ‘fetching output rate metric value’ on node [b876b429-ea69-4002-be43-5bcfc461e695].
2025-11-18T04:00:42.227-08:00 INFO [ClusterProcessingControl] Output buffer is now empty on all nodes.
2025-11-18T04:00:42.227-08:00 INFO [ClusterProcessingControl] Checking again for empty output buffers (attempt #2).
2025-11-18T04:00:42.228-08:00 INFO [MigrationActionsImpl] Done waiting for output buffer to drain on all nodes.
2025-11-18T04:00:42.338-08:00 ERROR [AnyExceptionClassMapper] Unhandled exception in REST resource
java.lang.IllegalStateException: Failed to trigger datanode request. Code: 401, message: Unauthorized
at org.graylog2.rest.resources.datanodes.DatanodeRestApiProxy.lambda$remoteInterface$9(DatanodeRestApiProxy.java:172)
at java.base/java.util.stream.Collectors.lambda$uniqKeysMapAccumulator$1(Unknown Source)
at java.base/java.util.stream.ReduceOps$3ReducingSink.accept(Unknown Source)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(Unknown Source)
at java.base/java.util.HashMap$ValueSpliterator.forEachRemaining(Unknown Source)
at java.base/java.util.stream.AbstractPipeline.copyInto(Unknown Source)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(Unknown Source)
at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(Unknown Source)
at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(Unknown Source)
at java.base/java.util.stream.AbstractTask.compute(Unknown Source)
at java.base/java.util.concurrent.CountedCompleter.exec(Unknown Source)
at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
at java.base/java.util.concurrent.ForkJoinTask.invoke(Unknown Source)
at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateParallel(Unknown Source)
at java.base/java.util.stream.AbstractPipeline.evaluate(Unknown Source)
at java.base/java.util.stream.ReferencePipeline.collect(Unknown Source)
at org.graylog2.rest.resources.datanodes.DatanodeRestApiProxy.remoteInterface(DatanodeRestApiProxy.java:157)
at org.graylog.plugins.views.storage.migration.state.actions.MigrationActionsImpl.isOldClusterStopped(MigrationActionsImpl.java:124)
at com.github.oxo42.stateless4j.triggers.TriggerBehaviour.isGuardConditionMet(TriggerBehaviour.java:31)
at com.github.oxo42.stateless4j.StateRepresentation.getPermittedTriggers(StateRepresentation.java:179)
at com.github.oxo42.stateless4j.StateMachine.getPermittedTriggers(StateMachine.java:128)
at org.graylog.plugins.views.storage.migration.state.machine.MigrationStateMachineImpl.nextSteps(MigrationStateMachineImpl.java:79)
at org.graylog.plugins.views.storage.migration.state.machine.MigrationStateMachineImpl.trigger(MigrationStateMachineImpl.java:59)
at org.graylog.plugins.views.storage.migration.state.rest.MigrationStateResource.trigger(MigrationStateResource.java:67)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(Unknown Source)
at java.base/java.lang.reflect.Method.invoke(Unknown Source)
at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:146)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:189)
at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:176)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:93)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:478)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:400)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:81)
at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:274)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:266)
at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:253)
at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:696)

server.conf:

is_leader = true
node_id_file = /etc/graylog/server/node-id
password_secret = edited
root_password_sha2 = edited
bin_dir = /usr/share/graylog-server/bin
data_dir = /var/lib/graylog-server
plugin_dir = /usr/share/graylog-server/plugin
http_bind_address = 0.0.0.0:9000
stream_aware_field_types=false
elasticsearch_hosts = http://127.0.0.1:9200
disabled_retention_strategies = none,close
allow_leading_wildcard_searches = false
allow_highlighting = false
field_value_suggestion_mode = on
output_batch_size = 500
output_flush_interval = 1
output_fault_count_threshold = 5
output_fault_penalty_seconds = 30
processor_wait_strategy = blocking
ring_size = 65536
inputbuffer_ring_size = 65536
inputbuffer_wait_strategy = blocking
message_journal_enabled = true
message_journal_dir = /var/lib/graylog-server/journal
message_journal_max_age = 12h
message_journal_max_size = 10gb
lb_recognition_period_seconds = 3
mongodb_uri = mongodb://localhost/graylog
mongodb_max_connections = 1000
enabled_tls_protocols = TLSv1.3
integrations_scripts_dir = /usr/share/graylog-server/scripts

datanode.conf:

node_id_file = /etc/graylog/datanode/node-id
config_location = /etc/graylog/datanode
password_secret = edited
root_password_sha2 = edited
mongodb_uri = mongodb://localhost/graylog
opensearch_http_port = 9220
opensearch_transport_port = 9330
opensearch_location = /usr/share/graylog-datanode/dist
opensearch_config_location = /var/lib/graylog-datanode/opensearch/config
opensearch_logs_location = /var/log/graylog-datanode/opensearch
opensearch_data_location = /var/lib/opensearch

I’ve changed the ports this time.

opensearch_data_location is where the old OpenSearch data is located.

mongodb:

graylog> db.datanodes.find({})
[
{
_id: ObjectId('691c5e2ada31e14cb19967e5'),
node_id: 'f58605ee-d61b-4277-9fce-ac5629dd5889',
datanode_status: 'PREPARED',
hostname: 'Graylog-1-64',
is_leader: false,
last_seen: Timestamp({ t: 1763468397, i: 1 }),
transport_address: 'https://Graylog-1-64:9220',
cluster_address: 'Graylog-1-64:9330',
configuration_warnings: [],
datanode_version: '7.0.0+a788678',
opensearch_roles: [ 'cluster_manager', 'data', 'ingest', 'remote_cluster_client' ],
rest_api_address: 'https://Graylog-1-64:8999',
action_queue: null,
cert_valid_until: ISODate('2035-11-16T11:59:32.000Z')
}
]

graylog> db.nodes.find({})
[
{
_id: ObjectId('645ba9883daeaa1430ee2981'),
hostname: 'Graylog-1-64',
last_seen: Timestamp({ t: 1763468306, i: 2 }),
transport_address: 'http://192.168.1.64:9000/api/',
type: 'SERVER',
is_leader: true,
node_id: 'b876b429-ea69-4002-be43-5bcfc461e695'
}
]

Thank you! This helps, the error is now different and I’ve seen it before. Now I know what’s happening. There is a bug which has been fixed and will be part of the 7.0.1 release scheduled for this week. Meanwhile, you can use a workaround.

In your graylog.conf, please set

indexer_use_jwt_authentication=true

This forces your Graylog server to always add the JWT auth header to every datanode request, which fixes the Unauthorized error you are getting now.

After you have adapted the configuration, please restart your Graylog server and try again.
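
To double-check that the setting is really in the file the server reads, a small parser can help. This is only a sketch: the helper name and the sample text are illustrative, it handles plain key = value lines and # comments rather than the full Java properties syntax, and the path in the note below is an assumption based on the node_id_file location shown earlier in this thread.

```python
from typing import Optional


def read_setting(conf_text: str, key: str) -> Optional[str]:
    """Return the last value assigned to `key` in key=value text, or None."""
    value = None
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, raw_value = line.partition("=")
        if name.strip() == key:
            value = raw_value.strip()  # keep the last match found
    return value


# Illustrative sample, not a complete configuration file.
sample = """
# workaround until the fixed release is installed
is_leader = true
indexer_use_jwt_authentication=true
"""

print(read_setting(sample, "indexer_use_jwt_authentication"))
```

Against a real installation, the text would come from the server configuration file instead of the sample, for example open("/etc/graylog/server/server.conf").read(); adjust the path to your setup.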

It has finished successfully this time.

Thank you very much.

BTW: I completed the previous migration on another host without this setting, by resetting the migration several times, switching between the OpenSearch instances, and fixing errors along the way.

Happy to help! Thank you for letting me know!

This setting is normally autodetected, based on the presence of datanodes. So if you start your server and datanode in the correct order and the migration is already in an almost-finished state, the server enables JWT auth and suddenly everything works. That’s why it worked for you in that situation. The bugfix I mentioned enforces JWT auth for all datanode communication, replacing this problematic autodetection.

Best regards,
Tomas