Graylog 6.1: DataNode certificate provisioning renders the system dead

1. Describe your incident:
When I try to provision certificates for server/DataNode communication, the system dies.

2. Describe your environment:

  • OS Information:
    PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
    NAME="Debian GNU/Linux"
    VERSION_ID="12"
    VERSION="12 (bookworm)"
    VERSION_CODENAME=bookworm
    ID=debian

  • Package Version:
    opensearch/stable,now 2.15.0 amd64 [installed,upgradable to: 2.19.0]

    mongodb-database-tools/bookworm/mongodb-org/7.0,now 100.11.0 amd64 [installed,automatic]
    mongodb-mongosh/bookworm/mongodb-org/7.0,now 2.3.9 amd64 [installed,automatic]
    mongodb-org-database-tools-extra/bookworm/mongodb-org/7.0,now 7.0.16 amd64 [installed,automatic]
    mongodb-org-database/bookworm/mongodb-org/7.0,now 7.0.16 amd64 [installed,automatic]
    mongodb-org-mongos/bookworm/mongodb-org/7.0,now 7.0.16 amd64 [installed,automatic]
    mongodb-org-server/bookworm/mongodb-org/7.0,now 7.0.16 amd64 [installed,automatic]
    mongodb-org-shell/bookworm/mongodb-org/7.0,now 7.0.16 amd64 [installed,automatic]
    mongodb-org-tools/bookworm/mongodb-org/7.0,now 7.0.16 amd64 [installed,automatic]
    mongodb-org/bookworm/mongodb-org/7.0,now 7.0.16 amd64 [installed]

    graylog-6.1-repository/stable,now 1-1 all [installed]
    graylog-datanode/stable,now 6.1.6-1 amd64 [installed]
    graylog-server/stable,now 6.1.6-1 amd64 [installed]

  • Service logs, configurations, and environment variables:
    2025-02-17T16:18:58.900+02:00 INFO [OpensearchProcessImpl] [2025-02-17T16:18:58,899][INFO ][o.o.n.Node ] [158.129.51.56] closed
    2025-02-17T16:18:58.942+02:00 WARN [OpensearchProcessImpl] Opensearch process failed
    org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1)
    at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:355) ~[commons-exec-1.4.0.jar:1.4.0]
    at org.apache.commons.exec.DefaultExecutor.lambda$execute$0(DefaultExecutor.java:269) ~[commons-exec-1.4.0.jar:1.4.0]
    at java.base/java.lang.Thread.run(Unknown Source) [?:?]
    2025-02-17T16:18:58.942+02:00 INFO [OpensearchCommandLineProcess] Process 3553 still alive, waiting for termination. Retry #1
    2025-02-17T16:18:58.943+02:00 INFO [OpensearchCommandLineProcess] Process 3553 successfully terminated.
    2025-02-17T16:18:58.943+02:00 INFO [ClusterNodeStateTracer] Updating cluster node a000cc4d-9e45-4d4b-9a7a-289624a5866a from UNAVAILABLE to UNAVAILABLE (reason: PROCESS_TERMINATED)
    2025-02-17T16:18:58.945+02:00 WARN [OpensearchWatchdog] Process watchdog terminated after too many restart attempts

The configuration follows your documentation.

● greylog
State: running
Units: 250 loaded (incl. loaded aliases)
Jobs: 0 queued
Failed: 0 units
Since: Mon 2025-02-17 16:07:31 EET; 1h 23min ago
systemd: 252.33-1~deb12u1
CGroup: /
│ ├─graylog-datanode.service
│ │ └─613 /usr/share/graylog-datanode/jvm/bin/java -Dlog4j.configurationFile=file:///etc/graylog/datanode/log4j2.xml -Xms1g -Xmx1g -XX:+UseG1GC -XX:-OmitStackTraceInFastThrow -XX:+UnlockExperimentalVMOptions -Djdk.tls.acknowledgeCloseNotify=true -jar /usr/share/graylog-datanode/graylog-datanode.jar datanode -f /etc/graylog/datanode/datanode.conf -np
│ ├─graylog-server.service
│ │ ├─614 /bin/sh /usr/share/graylog-server/bin/graylog-server
│ │ └─626 /usr/share/graylog-server/jvm/bin/java -Xms1g -Xmx1g -server -XX:+UseG1GC -XX:-OmitStackTraceInFastThrow -Djdk.tls.acknowledgeCloseNotify=true -Dlog4j2.formatMsgNoLookups=true -jar -Dlog4j.configurationFile=file:///etc/graylog/server/log4j2.xml -Dgraylog2.installation_source=deb /usr/share/graylog-server/graylog.jar server -f /etc/graylog/server/server.conf -np
│ ├─mongod.service
│ │ └─615 /usr/bin/mongod --config /etc/mongod.conf
│ ├─opensearch.service
│ │ └─616 /usr/share/opensearch/jdk/bin/java -Xshare:auto -Dopensearch.networkaddress.cache.ttl=60 -Dopensearch.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -XX:+ShowCodeDetailsInExceptionMessages -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.numDirectArenas=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Djava.security.manager=allow -Djava.locale.providers=SPI,COMPAT -Xms1g -Xmx1g -XX:+UseG1GC -XX:G1ReservePercent=25 -XX:InitiatingHeapOccupancyPercent=30 -Djava.io.tmpdir=/tmp/opensearch-11449338101486022245 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/opensearch -XX:ErrorFile=/var/log/opensearch/hs_err_pid%p.log "-Xlog:gc*,gc+age=trace,safepoint:file=/var/log/opensearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m" -Djava.security.manager=allow --add-modules=jdk.incubator.vector -Djava.util.concurrent.ForkJoinPool.common.threadFactory=org.opensearch.secure_sm.SecuredForkJoinWorkerThreadFactory -Dclk.tck=100 -Djdk.attach.allowAttachSelf=true -Djava.security.policy=file:///etc/opensearch/opensearch-performance-analyzer/opensearch_security.policy --add-opens=jdk.attach/sun.tools.attach=ALL-UNNAMED -XX:MaxDirectMemorySize=536870912 -Dopensearch.path.home=/usr/share/opensearch -Dopensearch.path.conf=/etc/opensearch -Dopensearch.distribution.type=deb -Dopensearch.bundled_jdk=true -cp "/usr/share/opensearch/lib/*" org.opensearch.bootstrap.OpenSearch -p /var/run/opensearch/opensearch.pid --quiet

3. What steps have you already taken to try and solve the problem?
Your documentation does not describe any troubleshooting steps for this kind of failure.

4. How can the community help?
Is there at least one person on this planet who knows how to install this software and make it work?

Hey @vadim.max,

Could you clarify what you mean by providing certificates? Are you using your own CA/intermediate CA instead of a self-signed one when running through the preflight setup?

Self-signed. And this release is a paperweight:

graylog-datanode/stable,now 6.1.7-1 amd64 [installed]
graylog-server/stable,now 6.1.7-1 amd64 [installed]

It destroys itself when I attempt to provision certificates. Perhaps they don’t even test the most basic functionality before releasing.

I cannot even paste or attach the log (it is too big, or an unsupported file type).

It stays like this forever (buttons grayed out, the process never ends).

Hi, do you have an additional OpenSearch package installed on the same machine that is supposed to host the DataNode? If so, please remove OpenSearch and start over.

Hi,

No, I do not. That might be left over from multiple install/remove operations. But I have done DOZENS of installations from scratch on an empty Debian system, and the result is always the same: no go.

In the end, I have given up on the DataNode, as it is totally nonfunctional.

The vendors either NEVER test their products before release or have made the DataNode intentionally broken in order to force people to purchase Enterprise support (that is exactly the response I got when I asked them for support).

Kind regards, Vadim.

I followed the official Debian install guide step by step on a fresh Debian 12 system, and it worked as expected, including the preflight and certificate provisioning. Is this what you are trying to achieve? Or at what point do your steps differ from the docs?

Watch this: https://youtu.be/JnpLpYCcRSs?si=Ft5bu4yAmVn7fH83.

I watched the video, but not all the logs I would have liked to see were visible (they scrolled too fast for the frame rate). Even though you insist that this is not the case: the only situation in which I have seen this exact behaviour is when other software (in this case, probably an OpenSearch instance you installed yourself and that is already running) occupies the ports the DataNode needs (9200 and 9300). So, before installing the DataNode, please check (using netstat, for example) that no other software uses these ports.
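A minimal sketch of that check, assuming a Linux host: instead of calling netstat it reads the kernel's socket tables in /proc/net/tcp and /proc/net/tcp6 directly, so it also works where netstat is not installed. The function name port_listening is hypothetical. Run it before installing or starting the DataNode; once the DataNode is up, its own OpenSearch will legitimately occupy these ports.

```shell
#!/bin/sh
# Sketch: is anything already LISTENING on the ports the DataNode's OpenSearch needs?
# Reads the kernel socket tables directly (Linux only), so netstat is not required.
# Interactive alternatives: `sudo ss -ltnp` or `sudo netstat -tlnp`.

# port_listening PORT (hypothetical helper): exit status 0 if some TCP socket
# listens on PORT, 1 otherwise.
port_listening() {
  hex=$(printf '%04X' "$1")   # the tables store ports as upper-case hex: 9200 -> 23F0
  for table in /proc/net/tcp /proc/net/tcp6; do
    [ -r "$table" ] || continue
    # Column 2 is local_address ("ADDR:PORT" in hex), column 4 is the state; 0A means LISTEN.
    if awk -v p=":$hex" 'NR > 1 && $4 == "0A" && substr($2, length($2) - 4) == p { found = 1 }
                         END { exit !found }' "$table"; then
      return 0
    fi
  done
  return 1
}

for port in 9200 9300; do
  if port_listening "$port"; then
    echo "port $port is in use: find the owner with 'sudo ss -ltnp' before starting graylog-datanode"
  else
    echo "port $port is free"
  fi
done
```

If a port shows as in use, `sudo ss -ltnp` names the owning process; with a separately installed OpenSearch running, that will most likely be its java process.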

Hey @vadim.max,
If you give us the full stack trace shown in the screenshot (taken from your YouTube video), we can tell you what’s wrong with your setup.

Best regards,
Tomas
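Since the full log was too large to attach, a filtered excerpt is usually enough to show the stack trace. The sketch below assumes the DataNode writes its log to /var/log/graylog-datanode/datanode.log (an assumption about the Debian package layout, overridable via the first argument) and uses a hypothetical helper, excerpt_failures. The match patterns are illustrative; "Opensearch process failed" is taken from the log posted at the top of this thread.

```shell
#!/bin/sh
# Sketch: cut a postable excerpt out of a DataNode log that is too large to attach.
# ASSUMPTION: the default path below matches the Debian package layout; pass a
# different file as the first argument if your log lives elsewhere.

LOG=${1:-/var/log/graylog-datanode/datanode.log}

# excerpt_failures FILE (hypothetical helper): print every line mentioning a failure
# or a Java exception, with its line number, plus the 15 lines after it (usually the
# stack trace); stop after 200 lines in total.
excerpt_failures() {
  grep -n -E -A 15 'Exception|Opensearch process failed|Address already in use' "$1" | head -n 200
}

if [ -r "$LOG" ]; then
  excerpt_failures "$LOG" > datanode-excerpt.txt
  echo "wrote $(wc -l < datanode-excerpt.txt) lines to datanode-excerpt.txt"
else
  echo "cannot read $LOG (try sudo, or check: journalctl -u graylog-datanode)" >&2
fi
```

The resulting datanode-excerpt.txt is small enough to paste into a post.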

I am sorry, maybe I don’t understand something. Why does the default installation create a port conflict? Is it intended by design? And how do I correct it?

The default installation does not create a port conflict. We suspect that you have something already running that creates the port conflict. It would be great if we could get to the bottom of this: if it’s only a misunderstanding and not a technical problem at all, we could improve our docs.

What worked for me: Debian 12, tasksel with the defaults (GNOME and another package that I don’t remember), and then installing only MongoDB, graylog-server and graylog-datanode per our docs/install instructions.
Do not install any opensearch package.

I have done the installation some 50 times. Every time, without a single exception, the initial setup procedure (where you create/import certificates to secure server-DataNode communication) fails.

This is my installation command list:

 0. cd ~
 1. sudo timedatectl set-timezone Europe/Vilnius
 2. sudo apt update
 3. sudo apt upgrade
 4. sudo apt-get -y install gnupg curl pwgen
 5. wget -qO - https://www.mongodb.org/static/pgp/server-7.0.asc | sudo gpg --dearmor -o /usr/share/keyrings/mongodb-server-7.0.gpg
 6. echo "deb [signed-by=/usr/share/keyrings/mongodb-server-7.0.gpg] https://repo.mongodb.org/apt/debian bookworm/mongodb-org/7.0 main" | sudo tee /etc/apt/sources.list.d/mongodb-org-7.0.list
 7. sudo apt update
 8. sudo apt install -y mongodb-org
 9. sudo systemctl daemon-reload
10. sudo systemctl enable mongod
11. sudo systemctl start mongod
12. sudo apt-mark hold mongodb-org
13. sudo sed -i '$avm.max_map_count = 262144' /etc/sysctl.conf
14. wget -qO - https://artifacts.opensearch.org/publickeys/opensearch.pgp | sudo gpg --dearmor -o /usr/share/keyrings/opensearch-keyring.gpg
15. echo "deb [signed-by=/usr/share/keyrings/opensearch-keyring.gpg] https://artifacts.opensearch.org/releases/bundle/opensearch/2.x/apt stable main" | sudo tee /etc/apt/sources.list.d/opensearch-2.x.list
16. sudo apt update
17. sudo env OPENSEARCH_INITIAL_ADMIN_PASSWORD=<password> apt install -y opensearch=2.15.*
18. sudo sed -i '/cluster.name:/c\cluster.name: graylog' /etc/opensearch/opensearch.yml
19. sudo sed -i '/node.name:/c\node.name: ${HOSTNAME}' /etc/opensearch/opensearch.yml
20. sudo sed -i '/path.data:/c\path.data: /var/lib/opensearch' /etc/opensearch/opensearch.yml
21. sudo sed -i '/path.logs:/c\path.logs: /var/log/opensearch' /etc/opensearch/opensearch.yml
22. sudo sed -i '/discovery.seed_hosts:/i\discovery.type: single-node' /etc/opensearch/opensearch.yml
23. sudo sed -i '/network.host:/c\network.host: 0.0.0.0' /etc/opensearch/opensearch.yml
24. sudo sed -i '/action.destructive_requires_name:/a\action.auto_create_index: false\nplugins.security.disabled: true' /etc/opensearch/opensearch.yml
25. sudo systemctl daemon-reload
26. sudo systemctl enable opensearch
27. sudo systemctl start opensearch
28. sudo apt-mark hold opensearch
29. wget -qO - https://packages.graylog2.org/repo/packages/graylog-6.1-repository_latest.gpg.key | sudo apt-key add -
30. wget https://packages.graylog2.org/repo/packages/graylog-6.1-repository_latest.deb
31. sudo dpkg -i graylog-6.1-repository_latest.deb
32. sudo apt update
33. sudo apt install -y graylog-datanode
34. sudo sed -i '/password_secret =/c\password_secret = '"$(pwgen -N 1 -s 96)" /etc/graylog/datanode/datanode.conf
35. sudo systemctl daemon-reload
36. sudo systemctl enable graylog-datanode
37. sudo systemctl start graylog-datanode
38. sudo apt install -y graylog-server
39. sudo sed -i '/password_secret =/c\password_secret = '"$(sed -n 's/.*password_secret = \(.*\)/\1/p' /etc/graylog/datanode/datanode.conf)" /etc/graylog/server/server.conf
40. sudo sed -i '/root_password_sha2 =/c\root_password_sha2 = '"$(echo <admin password> | sha256sum | cut -d ' ' -f1)" /etc/graylog/server/server.conf
41. sudo sed -i '/root_timezone =/c\root_timezone = Europe/Vilnius' /etc/graylog/server/server.conf
42. sudo sed -i '/http_bind_address =/c\http_bind_address = '"$(ip a | grep global | awk '{print $2}' | cut -d'/' -f1)"':9000' /etc/graylog/server/server.conf
43. sudo systemctl daemon-reload
44. sudo systemctl enable graylog-server
45. sudo systemctl start graylog-server
46. tail -f /var/log/graylog-server/server.log

What is wrong here?

I have done this sequence so many times that at some point command #29 started to return a zero-length key…

You are indeed installing OpenSearch in step 17, something that we have mentioned several times in this conversation and explicitly advised against.

When you install an additional OpenSearch, it starts automatically and claims ports 9200 and 9300 for itself.

Our DataNode runs an OpenSearch instance for you on the very same ports. That conflict is what kills your DataNode, and you would see it in the stack trace we asked you for.

You won’t see any error during installation, because this is a runtime problem.

Please stop OpenSearch. Even better, remove it completely. You don’t need your own OpenSearch with the DataNode; it is already included there.
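For the install list posted above, removing OpenSearch means undoing steps 14-28. A sketch of that cleanup, assuming the opensearch package came from the apt repository and was put on hold with apt-mark as in step 28; the helper names opensearch_installed and remove_standalone_opensearch are hypothetical, and it must run as root. The final restart of graylog-datanode is there because its watchdog gave up after too many restart attempts.

```shell
#!/bin/sh
# Sketch: undo steps 14-28 (the standalone OpenSearch) so that only the OpenSearch
# embedded in graylog-datanode binds ports 9200/9300.
# ASSUMPTION: Debian host, opensearch installed from its apt repo and held with apt-mark.
# Run as root.

# opensearch_installed (hypothetical helper): exit status 0 if the package is installed.
opensearch_installed() {
  dpkg-query -W -f='${Status}' opensearch 2>/dev/null | grep -q 'install ok installed'
}

remove_standalone_opensearch() {
  if ! opensearch_installed; then
    echo "opensearch package not installed, nothing to remove"
    return 0
  fi
  systemctl disable --now opensearch                  # stop it now and keep it off at boot
  apt-mark unhold opensearch                          # step 28 put it on hold
  apt-get purge -y opensearch                         # purge, unlike remove, also deletes its conffiles
  rm -f /etc/apt/sources.list.d/opensearch-2.x.list   # repository added in step 15
  rm -f /usr/share/keyrings/opensearch-keyring.gpg    # signing key added in step 14
  systemctl restart graylog-datanode                  # let the DataNode claim 9200/9300 itself
}

remove_standalone_opensearch
```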

Or, if you insist on installing it for whatever reason, change the ports on one side, either in OpenSearch or in the DataNode.

For the DataNode, the configuration properties in datanode.conf are opensearch_http_port and opensearch_transport_port. Set them to something other than 9200 and 9300.
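A sketch of that second option, editing datanode.conf in place in the same sed-based style as the install list. The property names are the ones given above; 9201 and 9301 are arbitrary example values, set_conf_key is a hypothetical helper, and the file path can be overridden through the CONF variable. Whether anything else needs adjusting after moving the ports is not covered in this thread.

```shell
#!/bin/sh
# Sketch: move the DataNode's embedded OpenSearch off the default ports instead of
# removing the standalone OpenSearch. 9201/9301 are example values only.

CONF=${CONF:-/etc/graylog/datanode/datanode.conf}

# set_conf_key FILE KEY VALUE (hypothetical helper): rewrite an existing (possibly
# commented-out) "KEY = ..." line as "KEY = VALUE", or append it if KEY is absent.
set_conf_key() {
  if grep -Eq "^[#[:space:]]*$2[[:space:]]*=" "$1"; then
    sed -i -E "s|^[#[:space:]]*$2[[:space:]]*=.*|$2 = $3|" "$1"
  else
    printf '%s = %s\n' "$2" "$3" >> "$1"
  fi
}

if [ -w "$CONF" ]; then
  set_conf_key "$CONF" opensearch_http_port 9201
  set_conf_key "$CONF" opensearch_transport_port 9301
  echo "updated $CONF; apply with: systemctl restart graylog-datanode"
else
  echo "cannot write $CONF (run as root)" >&2
fi
```

Replacing the line rather than blindly appending keeps datanode.conf free of duplicate keys if the script is run twice.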

Best regards,
Tomas

Steps 14-28: just don’t do them, please.

Do I understand correctly that the DataNode is absolutely not advisable on a computer that runs the server, and is necessary only for distributed systems?

To be honest, I already have Graylog running. I just disabled the DataNode service.

  1. The DataNode is a recent development and the replacement for OpenSearch in a Graylog installation. So wherever you installed OpenSearch in the past, you can now choose to install the DataNode instead. (I’d like to defer to the product pages for details.)
  2. A production installation should always place the Graylog node(s), DataNode/OpenSearch node(s) and MongoDB node(s) on separate machines/VMs. Maybe your understanding of what the DataNode is is not correct?

Thank you very much. Now I understand what is going on.