HTTPS is going to destroy my whole project :(

Hello folks,

I have been stumbling through this forum for weeks but just cannot figure out what exactly my problem is! Before I describe it, here is my setup:

Single vServer, Ubuntu 20.04, hosted on my ESXi cluster (version 6.7).
vServer: 8GB RAM, 4 vCPUs, 150GB HDD, 1 NIC
Graylog: Graylog server 4.0.2, Deployment: deb
JRE: Private Build 1.8.0_275 on Linux 5.4.0-65-generic

Let’s assume that its IP address is 10.100.100.100 and the hostname is bgraylog.

I created a certificate for this host with our Microsoft CA and put it in the locations I set in the Graylog config file. I also followed the JVM keystore step from the Graylog guide on installing an HTTPS certificate.
When I open the Graylog HTTPS site, sometimes I just get an error, and sometimes I can log on and click through a few pages before the error appears. I even managed to get LDAP binding working via the web interface.

The error:
We are experiencing problems connecting to the Graylog server running on https://serveraddress:9000/api/. Please verify that the server is healthy and working correctly.

You will be automatically redirected to the previous page once we can connect to the server.

In the web interface my connection is marked as “secure”, and I also verified that the hostnames and IP addresses are listed as “alternative names” in the certificate.

I installed Graylog by following the Ubuntu installation guide from the Graylog docs. I am not using any proxy.

I set the following options in the server.conf:

http_bind_address = 10.100.100.100:9000
rest_enable_tls = true
rest_enable_cors = true
rest_tls_cert_file = /etc/graylog/server/certificates/bgraylog.cer
rest_tls_key_file = /etc/graylog/server/certificates/bgraylog_unsec.key
http_enable_cors = true
http_enable_tls = true
http_tls_cert_file = /etc/graylog/server/certificates/bgraylog.cer
http_tls_key_file = /etc/graylog/server/certificates/bgraylog_unsec.key
elasticsearch_hosts = http://10.100.100.100:9200

netstat -tulpn shows ports 9000, 9200 and 9300 listening on 10.100.100.100.
The graylog-server log file shows that the server is up and running, with all services started. The only WARN I get is something like:
GeoIP database file does not exist: /etc/graylog/server/GeoLite2-City.mmdb
but I don’t think this plays a big role in this error.

Now my question: what am I missing? Is there a way to get some info on “where” the server is stumbling and crashing? Since nothing is logged for this error, I don’t know where to look. :frowning:

Help is highly appreciated!!
PS: please let me know if I forgot any information! <3
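
For the question of “where” it is failing, one thing that might help is probing the API from the server itself and looking at which kind of error comes back. Below is a minimal Python sketch, not a definitive tool. The names probe and classify, the /api/system/lbstatus path and the CA file path in the comment are assumptions for illustration; substitute whatever fits your setup.

```python
import socket
import ssl
import urllib.error
import urllib.request


def classify(exc):
    """Map a connection exception to a rough hint about where it failed."""
    if isinstance(exc, urllib.error.HTTPError):
        # TCP and TLS worked; the API answered with an error status.
        return "http-error"
    # urllib wraps low-level errors in URLError; unwrap to the real cause.
    if isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, BaseException):
        exc = exc.reason
    if isinstance(exc, ssl.SSLCertVerificationError):
        return "certificate-not-trusted"
    if isinstance(exc, ssl.SSLError):
        return "tls-handshake-failed"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout"
    if isinstance(exc, ConnectionRefusedError):
        return "connection-refused"
    return "other: " + type(exc).__name__


def probe(url, cafile=None, timeout=5):
    """Request `url` once and return 'ok' or a classification of the failure."""
    try:
        context = ssl.create_default_context(cafile=cafile)
        with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
            resp.read()
        return "ok"
    except Exception as exc:  # broad on purpose: this is only a diagnostic
        return classify(exc)


# Hypothetical usage; host, port, path and CA file are assumptions:
# print(probe("https://bgraylog:9000/api/system/lbstatus",
#             cafile="/path/to/your-ca.pem"))
```

A "timeout" result on the server itself would point away from the firewall and toward the Graylog process, while "certificate-not-trusted" would point at the CA chain rather than at Graylog.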

@twakki
Hello,
I was looking at your server.conf.

By chance, are you using the new configuration file for Graylog 4.0?

If possible, can you show the whole server.conf file?

Throwing my hat in here. @gsmith is right: the config you provided doesn’t look like it comes from a recent version of Graylog. My HTTP-related config bits look like this:

http_bind_address = 192.168.1.2:9000
http_enable_cors = true
http_enable_tls=true
http_tls_cert_file=/etc/graylog/ssl/fullchain.pem
http_tls_key_file=/etc/graylog/ssl/privkey.pem
http_publish_uri=https://logs00.example.com:9000/

Those were enough to run Graylog with TLS.
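
To spot leftover old-style settings quickly, something like the following Python sketch could be run against the text of server.conf. It assumes that keys starting with rest_ belong to the older configuration layout and that the http_* names replace them; the function names parse_conf and legacy_keys are made up for this example.

```python
def parse_conf(text):
    """Parse 'key = value' lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def legacy_keys(settings):
    """Return the keys that use the older rest_ prefix (assumed to be legacy)."""
    return sorted(key for key in settings if key.startswith("rest_"))


# Hypothetical usage:
# with open("/etc/graylog/server/server.conf") as handle:
#     print(legacy_keys(parse_conf(handle.read())))
```

An empty list would suggest the file only uses the newer names; anything it prints is worth comparing against the default 4.0 configuration file.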

Hello,

I hope I manage to post it correctly. :smiley:
I hope it’s okay that I put this into a paste link. The conf was too long for this post. :slight_smile:

Pass: eJUdcNAgx3
Conf: Pastebin.com - Locked Paste

I changed my config to the configuration statements you posted.
I get the web interface and can log in, but pages load endlessly. So I pressed F5 and ended up with my error.

There is a firewall between my client and the server. I checked that my client can reach port 9000 via hostname and via IP address. No problem there.
But the server should be able to reach its own API because everything is on one machine, shouldn’t it?

best regards!

/EDIT: your certificate and key files have a .pem extension. Do I need that? Mine are .cer and .key. Thanks.
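
On the extension question: as far as I understand it, the file name matters less than what is inside the file. My reading of the Graylog HTTPS documentation is that it expects a PEM-encoded X.509 certificate and a PEM-encoded PKCS#8 private key, so a quick look at the header lines can tell you a lot. Here is a small Python sketch of that check; the function names and the wording of the descriptions are my own.

```python
import re

# Matches PEM header lines such as "-----BEGIN CERTIFICATE-----".
PEM_HEADER = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----")

DESCRIPTIONS = {
    "CERTIFICATE": "X.509 certificate (PEM)",
    "PRIVATE KEY": "PKCS#8 private key, unencrypted (PEM)",
    "ENCRYPTED PRIVATE KEY": "PKCS#8 private key, encrypted (PEM)",
    "RSA PRIVATE KEY": "PKCS#1 RSA private key (PEM)",
}


def describe(text):
    """Return a description for every PEM block found in `text`."""
    return [DESCRIPTIONS.get(label, "unknown PEM block: " + label)
            for label in PEM_HEADER.findall(text)]


def describe_file(path):
    """Read a file leniently (it may be binary DER) and describe its PEM blocks."""
    with open(path, "rb") as handle:
        text = handle.read().decode("ascii", errors="replace")
    return describe(text)


# Hypothetical usage with the paths from the first post:
# print(describe_file("/etc/graylog/server/certificates/bgraylog.cer"))
# print(describe_file("/etc/graylog/server/certificates/bgraylog_unsec.key"))
```

An empty list means no PEM header was found, which often indicates a binary (DER) export. A PKCS#1 result for the key would suggest converting it to PKCS#8 before pointing Graylog at it.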

Here’s a long thread of me trying to figure out SSL/TLS. I am not successful… yet. If it helps you, let me know.

Hey, and thank you for your reply. I checked this, but the permissions on all my certificate files (.cer and .key) are:
-rw-r--r-- 1 root root

So I think this should be fine. :slight_smile: I will look into your thread anyway.

kind regards

I believe what was suggested is that they should NOT be owned by root, but by the graylog user:

“On my OS the user is called graylog .
Execute ps aux | grep graylog and you will see the user.
Execute ls -l on the private key file to display the unix rights.”
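
The same check as the ls -l step can be sketched in Python, which is handy if you want to look at several files at once. This is only an illustration; owner_and_mode is a made-up helper name and the paths in the comment are the ones from the first post.

```python
import os
import pwd
import stat


def owner_and_mode(path):
    """Return (owner name, mode string such as '-rw-r--r--') for a file."""
    info = os.stat(path)
    try:
        owner = pwd.getpwuid(info.st_uid).pw_name
    except KeyError:
        # The uid has no passwd entry; fall back to the numeric id.
        owner = str(info.st_uid)
    return owner, stat.filemode(info.st_mode)


# Hypothetical usage:
# for path in ("/etc/graylog/server/certificates/bgraylog.cer",
#              "/etc/graylog/server/certificates/bgraylog_unsec.key"):
#     print(path, owner_and_mode(path))
```

If the owner comes back as root and the service runs as graylog, the key is still readable with mode -rw-r--r--, but a private key that anyone can read is usually something to tighten.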

Oh okay.
I changed this so that “graylog” is now the owner.
Same error. :frowning:
And yes: graylog is the user that is running the server service.

Well, bummer. I’ll still give it a go when I have a chance, but am not hopeful. If I have success, I will update. Thank you, Zach.

@twakki
Hello, I took a look at the configuration file you posted. The first thing I noticed was this section: “#### REST URI ACONSO MADE”

The configuration file for Graylog version 4.0 does not have a “#### REST URI ACONSO MADE” section, unless this is something new? I could be wrong.

Take a look here.
https://docs.graylog.org/en/4.0/pages/configuration/https.html#certificate-key-file-format

You have processbuffer_processors = 10 and outputbuffer_processors = 6.
The total number of processors you can assign is the number of CPU cores. Do you have 16 cores available?

I would go over your configuration file again; something doesn’t seem right.

Hope this helps

Nah, you’re right. I added this section myself just to mark where I edited something. It may look silly to others, but it made sense to me. :smiley:

I will try out the buffer settings and check your link.
If I have any news I will report back!

Thanks.

I changed the buffer settings to:

processbuffer_processors = 2 and outputbuffer_processors = 2
because I have 4 vCPUs.
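
The rule of thumb from this thread (the processor settings together should not exceed the number of CPU cores) can be written down as a quick check. The Python sketch below is only that rule of thumb, not an official sizing formula; the function names are invented, and settings missing from the dictionary are counted as 0 here even though Graylog would apply its own defaults.

```python
import os

PROCESSOR_KEYS = (
    "processbuffer_processors",
    "outputbuffer_processors",
    "inputbuffer_processors",
)


def total_processors(settings):
    """Sum the configured processor counts; missing keys count as 0 here."""
    return sum(int(settings.get(key, 0)) for key in PROCESSOR_KEYS)


def oversubscribed(settings, cores=None):
    """True if the configured processors exceed the available CPU cores."""
    if cores is None:
        cores = os.cpu_count() or 1
    return total_processors(settings) > cores


# Hypothetical usage with the old values from this thread on a 4 vCPU machine:
# oversubscribed({"processbuffer_processors": "10",
#                 "outputbuffer_processors": "6"}, cores=4)
```

With the earlier values of 10 and 6 the total is 16, which is well above 4 vCPUs; with 2 and 2 the total is 4.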

Then I tried to log into the web interface, my error appeared, and I got the following in the server log.
But how do I interpret it? :smiley:

INFO [ServiceManagerListener] Services are healthy
2021-02-11T09:13:17.547+01:00 INFO [ServerBootstrap] Graylog server up and running.
2021-02-11T09:13:17.547+01:00 INFO [InputSetupService] Triggering launching persisted inputs, node transitioned from Uninitialized [LB:DEAD] to Running [LB:ALIVE]
2021-02-11T09:13:55.566+01:00 ERROR [ServerRuntime$Responder] An I/O error has occurred while writing a response message entity to the container output stream.
org.glassfish.jersey.server.internal.process.MappableException: java.io.IOException: Write timeout exceeded when trying to flush the data
at org.glassfish.jersey.server.internal.MappableExceptionWrapperInterceptor.aroundWriteTo(MappableExceptionWrapperInterceptor.java:67) ~[graylog.jar:?]
at org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:139) ~[graylog.jar:?]
at org.glassfish.jersey.message.internal.MessageBodyFactory.writeTo(MessageBodyFactory.java:1116) ~[graylog.jar:?]
at org.glassfish.jersey.server.ServerRuntime$Responder.writeResponse(ServerRuntime.java:638) [graylog.jar:?]
at org.glassfish.jersey.server.ServerRuntime$Responder.processResponse(ServerRuntime.java:371) [graylog.jar:?]
at org.glassfish.jersey.server.ServerRuntime$Responder.process(ServerRuntime.java:361) [graylog.jar:?]
at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:256) [graylog.jar:?]
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248) [graylog.jar:?]
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244) [graylog.jar:?]
at org.glassfish.jersey.internal.Errors.process(Errors.java:292) [graylog.jar:?]
at org.glassfish.jersey.internal.Errors.process(Errors.java:274) [graylog.jar:?]
at org.glassfish.jersey.internal.Errors.process(Errors.java:244) [graylog.jar:?]
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265) [graylog.jar:?]
at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:232) [graylog.jar:?]
at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:680) [graylog.jar:?]
at org.glassfish.jersey.grizzly2.httpserver.GrizzlyHttpContainer.service(GrizzlyHttpContainer.java:356) [graylog.jar:?]
at org.glassfish.grizzly.http.server.HttpHandler$1.run(HttpHandler.java:200) [graylog.jar:?]
at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:180) [graylog.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
Caused by: java.io.IOException: Write timeout exceeded when trying to flush the data
at org.glassfish.grizzly.http.io.OutputBuffer.blockAfterWriteIfNeeded(OutputBuffer.java:999) ~[graylog.jar:?]
at org.glassfish.grizzly.http.io.OutputBuffer.write(OutputBuffer.java:701) ~[graylog.jar:?]
at org.glassfish.grizzly.http.io.OutputBuffer.write(OutputBuffer.java:553) ~[graylog.jar:?]
at org.glassfish.grizzly.http.server.NIOOutputStreamImpl.write(NIOOutputStreamImpl.java:51) ~[graylog.jar:?]
at org.glassfish.jersey.message.internal.CommittingOutputStream.write(CommittingOutputStream.java:189) ~[graylog.jar:?]
at org.glassfish.jersey.message.internal.WriterInterceptorExecutor$UnCloseableOutputStream.write(WriterInterceptorExecutor.java:271) ~[graylog.jar:?]
at org.glassfish.jersey.message.internal.ByteArrayProvider.writeTo(ByteArrayProvider.java:73) ~[graylog.jar:?]
at org.glassfish.jersey.message.internal.ByteArrayProvider.writeTo(ByteArrayProvider.java:37) ~[graylog.jar:?]
at org.glassfish.jersey.message.internal.WriterInterceptorExecutor$TerminalWriterInterceptor.invokeWriteTo(WriterInterceptorExecutor.java:242) ~[graylog.jar:?]
at org.glassfish.jersey.message.internal.WriterInterceptorExecutor$TerminalWriterInterceptor.aroundWriteTo(WriterInterceptorExecutor.java:227) ~[graylog.jar:?]
at org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:139) ~[graylog.jar:?]
at org.glassfish.jersey.server.internal.JsonWithPaddingInterceptor.aroundWriteTo(JsonWithPaddingInterceptor.java:85) ~[graylog.jar:?]
at org.glassfish.jersey.message.internal.WriterInterceptorExecutor.proceed(WriterInterceptorExecutor.java:139) ~[graylog.jar:?]
at org.glassfish.jersey.server.internal.MappableExceptionWrapperInterceptor.aroundWriteTo(MappableExceptionWrapperInterceptor.java:61) ~[graylog.jar:?]
… 20 more

I raised outputbuffer_processor_keep_alive_time from the default 5000 to 30000.
And what can I say, I can log in now! I will try logging in with more than one user together with my colleague and report back. That is what crashed the application the last few times we made it this far! :wink: :smiley:

cya and huge thanks so far!

/EDIT: a short time after I made these posts, the server crashed again without any logs. :frowning:

Hello again, I spent the whole day testing things out.
I got the web interface working, but I can crash it by pressing Ctrl+F5.
It calls this address: https://server-fqdn:9000/search?q=&rangetype=relative&relative=300
The server then runs into a timeout. No logs are written.
If I restart the server and call this address, I get the same result. :confused:

I can login, click around, press f5 (without ctrl), and look around.

Only ctrl+f5 is killing it.

I don’t know why.

I have set the following:

http_bind_address = 0.0.0.0:9000
http_publish_uri = https:/server-fqdn:9000/
rest_transport_uri = https://server-ip:9000/api
rest_listen_uri = https://server-fqdn:9000/api/
rest_enable_cors = true

I don’t know why, but I only get this far with the configuration parameters above enabled. :confused:

Help highly appreciated. I don’t know where to look.

/EDIT: I tried configuring self-signed certs as described in the docs and added them to a JVM keystore. I also referenced it in the /etc/default/graylog… file. Same error. :confused:

Hi,
Have you checked that you’re running the latest version of Java? I had a similar issue a while ago where Graylog would crash when accessed from a browser. I found it was on V1; after upgrading to V11 the issue was fixed. Worth a check:

java -version

@twakki
What I was trying to say is that I have not seen these settings in Graylog version 4.0.

I do not have those settings in my Graylog 4.0 configuration file.

For example, this is my Graylog server configuration file.
My environment is an all-in-one virtual machine with 10 CPUs, 1 TB HDD and 10 GB RAM. My Java heap is only 2 GB.


is_master = true
node_id_file = /etc/graylog/server/node-id
password_secret = some-string
root_password_sha2 = some-string
root_email = "some-email"
root_timezone = some-time-zone
bin_dir = /usr/share/graylog-server/bin
data_dir = /var/lib/graylog-server
plugin_dir = /usr/share/graylog-server/plugin
http_bind_address = ipaddress:9000
http_publish_uri = https://ipaddress:9000/
http_enable_cors = true
http_enable_tls = true
http_tls_cert_file = /etc/ssl/certs/graylog/graylog-certificate.pem
http_tls_key_file = /etc/ssl/certs/graylog/graylog-key.pem
http_tls_key_password = secret
elasticsearch_hosts = http://ipaddress:9200
rotation_strategy = count
elasticsearch_max_docs_per_index = 20000000
elasticsearch_max_number_of_indices = 20
retention_strategy = delete
elasticsearch_shards = 4
elasticsearch_replicas = 0
elasticsearch_index_prefix = graylog
allow_leading_wildcard_searches = true
allow_highlighting = false
elasticsearch_analyzer = standard
elasticsearch_index_optimization_timeout = 1h
output_batch_size = 5000
output_flush_interval = 1
output_fault_count_threshold = 5
output_fault_penalty_seconds = 30
processbuffer_processors = 5
outputbuffer_processors = 3
processor_wait_strategy = blocking
ring_size = 65536
inputbuffer_ring_size = 65536
inputbuffer_processors = 2
inputbuffer_wait_strategy = blocking
message_journal_enabled = true
message_journal_dir = /var/lib/graylog-server/journal
message_journal_max_size = 12gb
lb_recognition_period_seconds = 3
mongodb_uri = mongodb://some-username:somepassword@localhost:27017/graylog
mongodb_max_connections = 1000
mongodb_threads_allowed_to_block_multiplier = 5
transport_email_enabled = true
transport_email_hostname = localhost
transport_email_port = 25
transport_email_subject_prefix = [graylog]
transport_email_from_email = root@domain.local
transport_email_web_interface_url = https://ipaddress:9000
http_connect_timeout = 10s
proxied_requests_thread_pool_size = 32

As you may notice, I have no settings for rest_transport_uri, rest_listen_uri or rest_enable_cors. Also, I’m using processbuffer_processors = 5 and outputbuffer_processors = 3 out of 10 CPUs; I left the other two for my system. But each environment can be different.
Hope that helps.

Thank you for this insight. I will try this out and report back!

gsmith, thank you for your advice!
In the meantime I installed a new vServer with the same specs as the old one.
I reinstalled everything in the exact order given in the documentation.
Then I went through the Graylog config and set each parameter as you posted it, except processbuffer_processors = 5 and outputbuffer_processors = 3: I set those to 1 in my environment, since my vServer has 4 cores at the moment.

And what can I say, it has been running for ~60 minutes now without any problems. I was able to set up LDAP authentication and log in with 3 users at a time without errors. Even the Ctrl+F5 problem no longer appears.

I am going to document this now, set up DNS names, and try to import the CA certificate into my JVM store and create a trusted certificate for my environment before I start creating inputs.

Huge thanks to all of you guys who tried to help me out! <3

This case is solved (more or less) and can be closed. :slight_smile:
