"Search" page loads extremely slowly and causes the browser to become unresponsive

Description of your problem

I am having an issue where messages always take a very long time (>3 minutes) to show up on the search page. During this time the page is unresponsive, and the browser (Firefox or Chrome) pops up the “page unresponsive” dialog. After the first load, searching is quick and everything is normal until the page is reloaded. No log messages are generated in the Graylog server log, the Elasticsearch log, the MongoDB log or the browser dev console when this happens.

Description of steps you’ve taken to attempt to solve the issue

Switched from OpenJDK 11 to OpenJDK 8
Upgraded from MongoDB 4.1 to 4.2
Upgraded from ES 6.8 to 7.10
Removed unneeded plugins
Removed unneeded content packs
Recalculated index ranges
Rotated active write index

Environmental information

Intel Xeon E3-1271 v3
32GB RAM
8x1TB SSD in RAID10
MongoDB, Elasticsearch and Graylog all run on this server

Operating system information

Debian 10 Buster

Package versions

Graylog 4.2.1
Elasticsearch 7.10.2
MongoDB 4.2.17
OpenJDK 1.8.0_292

Hello,

I might be able to help. I need to ask a couple of questions that may pertain to web loading problems.

  • How many logs are you ingesting per hour/day?
  • What do your index settings look like? Do you have multiple index sets besides the default ones?
  • What are your log retention settings?
  • Do you have a lot of extractors configured? If so, what type and how many?
  • How is your Java heap set for Graylog?

Judging from your environmental information, if you ingest a lot of logs, keep in mind that Elasticsearch is resource-intensive. That's why most community members separate ES from Graylog/MongoDB in a situation like that. I have seen situations where people over-sharded their server, which caused lag and an unresponsive interface. You can find out why here.

  • 300k messages per hour at most, with a similar load 24/7
  • Using just the default index set
  • Rotating at 10 GB, keeping the last 400 indices
  • Extractors: 2 Grok, 4 copy input, 7 regex
  • Java heap is set to 12 GB; it never seems to use more than 3-4 GB
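As a rough sanity check, the figures above can be turned into a daily volume and a retention ceiling. This is a back-of-the-envelope sketch, assuming "300k" means 300,000 messages per hour at a flat rate around the clock and that RAID 10 leaves half of the 8 TB raw capacity usable; the constant and function names are illustrative only.

```python
# Back-of-the-envelope numbers from the figures posted in this thread.
# Assumptions: "300k" = 300,000 messages/hour at a flat rate (24/7), and
# RAID 10 mirrors every stripe, so usable capacity is half of raw capacity.

MESSAGES_PER_HOUR = 300_000
ROTATION_SIZE_GB = 10          # index rotated at 10 GB
MAX_INDICES = 400              # "keeping the last 400 indices"
RAW_DISK_TB = 8 * 1            # 8 x 1 TB SSD
RAID10_USABLE_FRACTION = 0.5   # mirrored pairs -> half of raw capacity


def messages_per_day(per_hour: int) -> int:
    """Daily volume, assuming the hourly rate holds for all 24 hours."""
    return per_hour * 24


def max_retained_gb(rotation_gb: int, max_indices: int) -> int:
    """Upper bound on retained data if every index fills to its rotation size."""
    return rotation_gb * max_indices


def usable_disk_gb(raw_tb: float, usable_fraction: float) -> float:
    """Usable capacity in GB, using decimal units (1 TB = 1000 GB)."""
    return raw_tb * 1000 * usable_fraction


if __name__ == "__main__":
    print(f"messages/day: {messages_per_day(MESSAGES_PER_HOUR):,}")
    print(f"max retained: {max_retained_gb(ROTATION_SIZE_GB, MAX_INDICES):,} GB")
    print(f"usable disk:  {usable_disk_gb(RAW_DISK_TB, RAID10_USABLE_FRACTION):,.0f} GB")
```

Run as written, this prints 7,200,000 messages per day and 4,000 GB for both the retention ceiling and the usable disk, so the retention setting and the array size are in the same range. Actual on-disk usage depends on how many indices exist and how full they are.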

It doesn’t appear to be a message processing delay. The journal is almost empty and alerts fire instantly.

I also recall reading somewhere that Graylog does not like lengthy log messages. Some of my Windows Active Directory logs are quite lengthy. Could this be related?

So 4 shards, no replicas and 400 indices, correct?

4 x 400 = 1600 shards, I believe.
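The count above follows from shards per index x number of indices x (1 + replicas). The sketch below spells that out and adds a heap estimate based on the rule of thumb in Elastic's 7.x shard-sizing guidance (aim for roughly 20 shards or fewer per GB of Elasticsearch heap). The function names are hypothetical, and the Elasticsearch heap size on this server is not stated in the thread, so the heap figure is only a comparison point, not a hard limit.

```python
import math


def total_shards(shards_per_index: int, indices: int, replicas: int = 0) -> int:
    """Each primary shard gets `replicas` extra copies: primaries * (1 + replicas)."""
    return shards_per_index * indices * (1 + replicas)


def heap_gb_for_shards(shards: int, shards_per_gb_heap: int = 20) -> int:
    """Heap (GB) needed to stay at or under a shards-per-GB-of-heap rule of thumb."""
    return math.ceil(shards / shards_per_gb_heap)


if __name__ == "__main__":
    planned = total_shards(shards_per_index=4, indices=400, replicas=0)
    print(f"planned shards: {planned}")
    print(f"heap suggested by the 20-per-GB rule: {heap_gb_for_shards(planned)} GB")
```

For the 1600 planned shards this gives 80 GB; for the 620 shards reported in the cluster health output it gives 31 GB. Both numbers are only as meaningful as the rule of thumb behind them.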

Can you show us the output of this command, if possible?

curl -XGET "localhost:9200/_cluster/health?pretty"
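For anyone who prefers to script this check, here is a minimal Python sketch that calls the same _cluster/health endpoint as the curl command above and reduces the response to the fields relevant to a shard-count discussion. It assumes Elasticsearch listens on localhost:9200 without authentication or TLS, as the curl command implies; fetch_health and summarize_health are hypothetical helper names.

```python
import json
import urllib.request

# Same endpoint as the curl command; assumes plain HTTP and no authentication.
HEALTH_URL = "http://localhost:9200/_cluster/health"


def fetch_health(url: str = HEALTH_URL, timeout: float = 10.0) -> dict:
    """GET the cluster health document and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_health(health: dict) -> str:
    """One-line summary of status, node count and shard counts."""
    return (
        f"{health['cluster_name']}: status={health['status']} "
        f"nodes={health['number_of_nodes']} "
        f"primary_shards={health['active_primary_shards']} "
        f"total_shards={health['active_shards']} "
        f"unassigned={health['unassigned_shards']}"
    )
```

On the Elasticsearch host, print(summarize_health(fetch_health())) prints a single line starting with the cluster name and status, e.g. "graylog: status=green ...".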

Similar setup here: I have a CentOS 7 server with 14 CPU cores, 12 GB RAM, and a 500 GB disk. It ingests 30 GB of logs a day with 30 days of retention, using TCP/TLS for the Web UI and inputs, and has had no problems. Or I should say, not yet :slight_smile:

EDIT: For a quick comparison, here are my Windows logs for one hour.

It would help if you could show your configuration files. Maybe there are some adjustments we can suggest to help you further.

Here’s the cluster health:

{
  "cluster_name" : "graylog",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 620,
  "active_shards" : 620,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

Hello,

The number of shards looks fine compared to the resources you have. To be honest, you shouldn't have a problem. As you stated, it takes 3+ minutes for messages to show up in the Web UI, but alerts are fine.

  • Have you cleared your browser cache and reloaded the tab?
  • Have you checked the firewall?
  • Have you checked SELinux/AppArmor?
  • Do you have a proxy (Nginx, Apache) in front of Graylog?
  • Have you checked your network performance when loading the Web UI search page?

Example:

I have searched the forum for other cases where the Web UI is slow or unresponsive. I think we're missing something, but I'm not sure what. My system doesn't have a problem like that, and I think we ingest about the same amount of logs, but you have triple the resources I do.
BTW, I have two sets of Windows Active Directory servers sending logs to Graylog, so I know what you mean about large messages. We also turned up audit logging on four MS AD servers to get even more logs.
It might be a configuration issue; I'm not sure how you set your system up, though.
