Graylog 4 web interface very slow on all pages

Hi community

Please shed some light wherever you can. I am happy to provide more information.

We have recently upgraded our on-premises Graylog cluster to V4 and it is now very slow.

The cluster is set up with 2x Graylog nodes, 3x Elasticsearch nodes, and a MongoDB replica set running on the same hosts as Elasticsearch. We load-balance the Graylog nodes using NGINX. All of these nodes are in the same network zone, with no firewalls between them.

We have tried removing NGINX and accessing the nodes directly, but performance is still the same. Elasticsearch HTTP response times are good. The load and performance on all the nodes look good. There is nothing out of the ordinary in the log files for MongoDB, Elasticsearch, or graylog-server.

We have a similar setup in production which is running on Graylog 3.0.0 and is performing as expected.

Hosts + Setup Info

All Linux, Ubuntu 16.04

2x Graylog nodes:
   - 16GB Memory
   - 16 cores
   - Each node running Graylog v4.0.0 with heap config: -Xms3g -Xmx10g

3x Elasticsearch + MongoDB nodes:
   - 16GB Memory
   - 8 cores
   - Each running MongoDB v4.0.21 + Elasticsearch v6.8.13 with heap: -Xms8g -Xmx8g

Configurations are shown in Ansible template form for security purposes:

server.conf

############################
# GRAYLOG CONFIGURATION FILE
###########################

is_master = true

node_id_file = /etc/graylog/server/node-id

password_secret = {{ graylog_ui_password }}

root_username = {{ graylog_ui_username }}

root_password_sha2 = {{ graylog_ui_password }}

root_timezone = Africa/Johannesburg

plugin_dir = {{ plugin_dir }}

###############
# HTTP settings
###############

http_bind_address = 0.0.0.0:{{ graylog_listen_port }}

http_publish_uri = https://{{ inventory_hostname }}:{{ graylog_listen_port }}/

http_external_uri = https://{{ loadbalancer_url }}/

http_enable_tls = true

http_tls_cert_file = {{ cert_file }}


http_tls_key_file = {{ key_file }}

elasticsearch_hosts = http://{{ elasticsearch_mongo_hosts[0] }}:{{ elasticsearch_listen_port }},\
                      http://{{ elasticsearch_mongo_hosts[1] }}:{{ elasticsearch_listen_port }},\
                      http://{{ elasticsearch_mongo_hosts[2] }}:{{ elasticsearch_listen_port }}

elasticsearch_connect_timeout = 20s

elasticsearch_max_total_connections = 40

elasticsearch_max_total_connections_per_route = 4

rotation_strategy = time

elasticsearch_max_time_per_index = {{ es_index_rotation }}

elasticsearch_max_number_of_indices = 180

retention_strategy = delete

elasticsearch_shards = 3
elasticsearch_replicas = 1

elasticsearch_index_prefix = graylog

elasticsearch_analyzer = standard
elasticsearch_request_timeout = 2m
elasticsearch_index_optimization_jobs = 50

output_batch_size = 500
output_flush_interval = 1
output_fault_count_threshold = 5
output_fault_penalty_seconds = 30

processbuffer_processors = 5

outputbuffer_processors = 3

processor_wait_strategy = blocking

ring_size = 65536

inputbuffer_ring_size = 65536
inputbuffer_processors = 2
inputbuffer_wait_strategy = blocking

message_journal_enabled = true

message_journal_dir = /var/lib/graylog-server/journal

message_journal_max_age = 12h
message_journal_max_size = 5gb

lb_recognition_period_seconds = 3

stream_processing_timeout = 5000
stream_processing_max_faults = 5

mongodb_uri = mongodb://{{ mongodb_user }}:{{ mongodb_pass }}@{{ elasticsearch_mongo_hosts[0] }}:{{ mongodb_listen_port }},\
              {{ elasticsearch_mongo_hosts[1] }}:{{ mongodb_listen_port }},\
              {{ elasticsearch_mongo_hosts[2] }}:{{ mongodb_listen_port }}/{{ mongodb }}

mongodb_max_connections = 100

mongodb_threads_allowed_to_block_multiplier = 5

proxied_requests_thread_pool_size = 32

elasticsearch.yml

# ======================== Elasticsearch Configuration =========================

cluster.name: {{ cluster }}

node.name: ${HOSTNAME}

path.data: {{ es_data_dir }}

path.logs: {{ es_log_dir }}

bootstrap.memory_lock: {{ memory_lock }}

network.host: ${HOSTNAME}

http.port: {{ elasticsearch_listen_port }}

discovery.zen.ping.unicast.hosts: ["{{ elasticsearch_mongo_hosts[0] }}", "{{ elasticsearch_mongo_hosts[1] }}", "{{ elasticsearch_mongo_hosts[2] }}"]

discovery.zen.minimum_master_nodes: 2

gateway.recover_after_nodes: 2

mongo.conf

storage:
  dbPath: {{ mongo_data_dir }}
  journal:
     commitIntervalMs: 120
  directoryPerDB: true
  syncPeriodSecs: 80

systemLog:
  destination: file
  logAppend: true
  verbosity: 1
  traceAllExceptions: true
  logRotate: rename
  timeStampFormat: ctime
  path: /var/log/mongodb/mongod.log

net:
  port: {{ mongodb_listen_port }}
  bindIp: {{ inventory_hostname }}, 127.0.0.1
  bindIpAll: false
  maxIncomingConnections: 51200
  wireObjectCheck: false
  ipv6: false
  unixDomainSocket:
    enabled: true
    pathPrefix: /tmp
    filePermissions: 0700
  ssl:
    mode: allowSSL
    PEMKeyFile: {{ PEMKeyFile }}
    clusterFile: {{ PEMKeyFile }}
    CAFile: {{ CAFile }}
    allowConnectionsWithoutCertificates: false
    allowInvalidCertificates: false
    allowInvalidHostnames: false
  compression:
     compressors: zlib,snappy
  transportLayer: asio
  serviceExecutor: synchronous

processManagement:
  timeZoneInfo: /usr/share/zoneinfo
  pidFilePath: /var/log/mongodb/mongo.pid
  fork: false

security:
  keyFile: {{ keyFile }}
  clusterAuthMode: keyFile
  authorization: enabled
  transitionToAuth: false
  javascriptEnabled:  true

operationProfiling:
  mode: slowOp
  slowOpThresholdMs: 10000
  slowOpSampleRate: 1.0

replication:
  replSetName: graylog-dev
  secondaryIndexPrefetch: all

Nodes overview: [screenshot not preserved]

Sample profiling: [screenshot not preserved]

A thread dump on the nodes shows a couple of locks and threads in the WAITING state. I hit the post body limit, so I cannot post it here.
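When a full thread dump is too large to post, a per-state count is often enough to share. The following is a minimal sketch, assuming the dump was captured as plain text in the usual JVM thread dump format, where each thread has a `java.lang.Thread.State: ...` line. The function name `summarize_thread_states` is made up for illustration.

```python
import re
from collections import Counter

# Matches the state line printed under each thread header in a JVM thread dump,
# for example:    java.lang.Thread.State: WAITING (parking)
STATE_RE = re.compile(r"java\.lang\.Thread\.State:\s+([A-Z_]+)")


def summarize_thread_states(dump_text):
    """Count how many threads are in each JVM thread state."""
    return Counter(STATE_RE.findall(dump_text))


def format_summary(counts):
    """Render the counts as short lines, most common state first."""
    return "\n".join(f"{state}: {count}" for state, count in counts.most_common())
```

To use it, read the saved dump into a string and call `print(format_summary(summarize_thread_states(text)))`. A large WAITING count is not necessarily a problem by itself, since idle pool threads also park in that state, so the stack frames under the waiting threads still matter.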

Hey @mjwnkuna, welcome!

Although I know the documentation says ES 6.8.x+ and MongoDB 4.x+ are supported with Graylog 4.x, have you considered upgrading ES to 7.8 and MongoDB to 4.4? Maybe there is an incompatibility causing timeouts and slowness.

Hey @ttsandrew, apologies for the delayed response; I had to take some unplanned time off. Thanks for the suggestion. I am going to try that and will provide feedback.

Hi @ttsandrew, thank you for the suggestion. I upgraded ES and MongoDB. I took MongoDB to 4.2 rather than 4.4 because the docs I have seen only mention versions up to 4.2. Unfortunately, performance is still the same.

Are both of your Graylog nodes set to master?

You should consider configuring your heap to a consistent size by setting both -Xms and -Xmx to the same value. Also, it is recommended not to set the heap to more than 32g or 50% of system memory, whichever is less.
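The rule of thumb above can be written out as a small calculation. This is only a sketch of that guidance as stated in this reply, not an official Graylog or Elasticsearch sizing tool; the function names are hypothetical, and the result is rounded down to whole gigabytes.

```python
def recommended_heap_gb(system_memory_gb, cap_gb=32):
    """Return the smaller of the cap and half of system memory, in whole GB."""
    return int(min(cap_gb, system_memory_gb / 2))


def jvm_heap_flags(system_memory_gb):
    """Build matching -Xms/-Xmx flags so the heap size stays constant."""
    heap = recommended_heap_gb(system_memory_gb)
    return f"-Xms{heap}g -Xmx{heap}g"
```

For the 16GB Graylog nodes described in the first post, `jvm_heap_flags(16)` gives `-Xms8g -Xmx8g`, compared with the `-Xms3g -Xmx10g` originally configured.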

Have you upgraded the plugins as well?

Also, can you provide a bit more detail about what exactly you mean by it is running slow?

No, only one has is_master set to true.
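Because the server.conf template posted above contains a literal `is_master = true`, it may be worth confirming what each rendered file actually holds. Below is a minimal sketch, assuming the rendered configs have been read into strings keyed by node name. The helper names and node names are hypothetical, and the parser only handles simple `key = value` lines.

```python
def read_setting(conf_text, key):
    """Return the value of `key` from key = value lines, or None if absent."""
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return value.strip()
    return None


def master_nodes(configs):
    """List the nodes whose config explicitly sets is_master = true."""
    masters = []
    for node, text in configs.items():
        value = read_setting(text, "is_master")
        if value is None:
            # Do not guess a default in this sketch; ask for an explicit value.
            raise ValueError(f"{node}: is_master is not set explicitly")
        if value.lower() == "true":
            masters.append(node)
    return masters
```

With two nodes, the expectation is that `master_nodes(...)` returns exactly one name, for example `["graylog-1"]`.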

I have now adjusted the heap settings on all my Graylog nodes.

Yes. I also previously had the additional Slack and metrics-prometheus plugins, which I removed after seeing a suggestion in one of the forum posts.

List of plugins currently active (all come with the install):

  • graylog-plugin-collector-4.0.1.jar
  • graylog-plugin-threatintel-4.0.1.jar
  • graylog-storage-elasticsearch6-4.0.1.jar
  • graylog-storage-elasticsearch7-4.0.1.jar

Page loads are very slow. On busy days we are talking about 20 seconds of wait time. It is currently averaging 5 seconds, as most of our engineers are on holiday. Unfortunately, this is NOT limited to specific pages like the Dashboards page; it affects all pages, including the landing/login page.
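To put numbers on the wait times described above, the same request can be timed repeatedly against a single Graylog node and against the load balancer. This is a rough sketch using only the Python standard library. The function names are hypothetical, and it times one HTTP request rather than a full browser page load, so it will understate what users see.

```python
import statistics
import time
import urllib.request


def fetch_url(url, timeout=60):
    """Download the response body so the full transfer is included in the timing."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()


def time_calls(fetch, attempts=5):
    """Call `fetch` several times and return each duration in seconds."""
    durations = []
    for _ in range(attempts):
        start = time.perf_counter()
        fetch()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations to min, median, and max."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }
```

For example, `summarize(time_calls(lambda: fetch_url("https://graylog.example.org/")))`, where the URL is a placeholder. With a self-signed certificate, `urlopen` will reject the connection unless the CA is trusted on the machine running the check.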

I will be updating to Java 11; hopefully that will change something.

Yeah… seems odd. Did you reboot the nodes? I didn’t see that mentioned as attempted.

Yes, I have tried that a couple of times. I am in the process of building new VMs on Ubuntu 18.04, and I have high hopes that fresh VMs may solve this odd issue.

Well, good luck… hope it does fix it for you.

@mjwnkuna
I had the same problem with slow searches and a slow web interface. What I had to do was put MongoDB and Graylog on the same node and give Elasticsearch its own node. Just an idea.

Thanks for the idea, @gsmith. I believe it makes sense, since the Graylog configuration is stored in MongoDB, so any bottleneck there would cause such issues. If the fresh install that I am working on does not resolve the issues, I will certainly try this.

Were your MongoDB and Graylog nodes in the same subnet before you combined Mongo+Graylog onto one node?

Yes, and actually Elasticsearch is resource intensive, depending on your setup and configuration.

My environment is also set up like this:
https://docs.graylog.org/en/4.0/pages/architecture.html#bigger-production-setup
