"search" pages loading extremely slowly and causing browser to go unresponsive

jtbis · November 12, 2021, 12:52am

Description of your problem

I am having an issue where messages always take a very long time (>3 minutes) to show up on the search page. During this time the page appears unresponsive to the browser (Firefox or Chrome), which pops up the “page unresponsive” dialog. After the first load, searching is quick and everything is normal until the page is reloaded. No log messages are generated in the Graylog server log, the Elasticsearch log, the MongoDB log or the browser dev console when this issue occurs.

Description of steps you’ve taken to attempt to solve the issue

Switched from OpenJDK 11 to OpenJDK 8
Upgraded from MongoDB 4.1 to 4.2
Upgraded from ES 6.8 to 7.10
Removed unneeded plugins
Removed unneeded content packs
Recalculated index ranges
Rotated active write index

Environmental information

Intel Xeon E3-1271 v3
32GB RAM
8x1TB SSD in RAID10
Mongo, Elastic and Graylog all on this server

Operating system information

Debian 10 Buster

Package versions

Graylog 4.2.1
Elasticsearch 7.10.2
MongoDB 4.2.17
OpenJDK 1.8.0_292

gsmith · November 12, 2021, 2:12am

Hello,

I might be able to help. I need to ask a couple of questions that may pertain to web loading problems.

What volume of logs are you ingesting per hour/day?
What do your index settings look like? Do you have multiple index sets besides the default ones?
What are your log retention settings?
Do you have a lot of extractors configured? If so, what types and how many?
What is your Java heap setting for Graylog?

Judging from your environmental information, if you are ingesting a lot of logs, keep in mind that Elasticsearch is resource intensive. That is why most community members separate ES from Graylog/MongoDB in a situation like that. I have seen some situations where people over-sharded their server, causing lag and an unresponsive interface. You can find out why here

jtbis · November 12, 2021, 2:39am

300k max per hour, similar load 24/7
Using just the default index set
Rotating at 10GB, saving last 400
2 Grok, 4 copy input, 7 regex
Java heap is at 12GB, it never seems to use more than 3-4GB

It doesn’t appear to be a message processing delay. The journal is almost empty and alerts fire instantly.
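As a side calculation, the retention settings listed above can be set against the disk layout from the first post. This is only a rough sketch: it assumes the 300k figure counts messages, that 1 TB means 1000 GB, and that every index rotates at exactly 10 GB. The names are illustrative and nothing here is a Graylog API.

```python
# Figures quoted in the thread; the variable and function names are illustrative.
ROTATE_AT_GB = 10             # "Rotating at 10GB"
MAX_INDICES = 400             # "saving last 400"
DISKS = 8                     # "8x1TB SSD in RAID10"
DISK_SIZE_GB = 1000           # assumption: 1 TB taken as 1000 GB
PEAK_MSGS_PER_HOUR = 300_000  # assumption: "300k" counts messages


def max_retained_gb(rotate_at_gb: int, max_indices: int) -> int:
    """Upper bound on index data kept once retention is fully used."""
    return rotate_at_gb * max_indices


def raid10_usable_gb(disks: int, disk_size_gb: int) -> int:
    """RAID 10 mirrors every disk, so half the raw capacity is usable."""
    return disks * disk_size_gb // 2


retained = max_retained_gb(ROTATE_AT_GB, MAX_INDICES)
usable = raid10_usable_gb(DISKS, DISK_SIZE_GB)
peak_daily_msgs = PEAK_MSGS_PER_HOUR * 24

print(f"retention ceiling: {retained} GB, usable RAID 10 space: {usable} GB")
print(f"peak daily volume: {peak_daily_msgs:,} messages")
```

On these numbers the retention ceiling equals the whole usable array, before accounting for the operating system, MongoDB, the Graylog journal, or filesystem overhead.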

jtbis · November 12, 2021, 3:02am

I also recall reading somewhere that Graylog does not like lengthy log messages. I have some Windows Active Directory logs like this, which are quite lengthy. Could this be related?

gsmith · November 12, 2021, 3:18am

So 4 shards, no replicas, and 400 indices, correct?

4 x 400 = 1600 shards, I believe.
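That estimate comes from multiplying the number of indices by the shards per index. A minimal sketch of the arithmetic, assuming the 4 shards and zero replicas mentioned in the question (neither value was confirmed in the thread); `total_shards` is an illustrative helper, not a Graylog or Elasticsearch function:

```python
def total_shards(indices: int, primaries_per_index: int, replicas: int = 0) -> int:
    """Each index holds its primary shards plus one full copy per replica."""
    return indices * primaries_per_index * (1 + replicas)


estimate = total_shards(indices=400, primaries_per_index=4)
with_one_replica = total_shards(indices=400, primaries_per_index=4, replicas=1)

print(estimate)          # 1600, the figure quoted above
print(with_one_replica)  # 3200 if one replica were configured
```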

Can you show us the output of this command, if possible?

curl -XGET "localhost:9200/_cluster/health?pretty"

Same here. I have a CentOS 7 server with 14 CPU cores, 12 GB RAM, and a 500 GB disk. It handles 30 GB of logs a day with 30 days of retention, using TCP/TLS for the web UI and inputs, with no problems, or I should say not yet.

EDIT: At a quick glance, here are my Windows logs for an hour.

It would help if you could show your configuration files. Maybe there are some adjustments we can suggest to help you further.

jtbis · November 12, 2021, 12:59pm

Here’s the cluster health:

  "cluster_name" : "graylog",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 620,
  "active_shards" : 620,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
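A few of these fields can be read against the earlier shard estimate. The sketch below copies values from the output above and wraps them in braces so they parse as JSON. The 4 primaries per index is the assumption from the earlier question, not a confirmed setting, and the cluster may also hold indices outside the default index set, so the index count is only approximate.

```python
import json

# Selected fields copied from the cluster health output above.
health = json.loads("""
{
  "status": "green",
  "number_of_data_nodes": 1,
  "active_primary_shards": 620,
  "active_shards": 620,
  "unassigned_shards": 0
}
""")

ASSUMED_PRIMARIES_PER_INDEX = 4  # assumption, never confirmed in the thread

# Active shards beyond the primaries are replica copies.
replica_shards = health["active_shards"] - health["active_primary_shards"]
approx_indices = health["active_primary_shards"] // ASSUMED_PRIMARIES_PER_INDEX
shards_per_node = health["active_shards"] // health["number_of_data_nodes"]

print(f"replica shards: {replica_shards}")
print(f"approx. indices at {ASSUMED_PRIMARIES_PER_INDEX} primaries each: {approx_indices}")
print(f"shards per data node: {shards_per_node}")
```

With no replica shards and none unassigned, all 620 shards are primaries on the single data node, which is well below the 1600 estimated earlier.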

gsmith · November 13, 2021, 12:10am

Hello,

The number of shards looks fine compared to the resources you have. To be honest, you shouldn’t have a problem. As you stated, it takes 3+ minutes for messages to show in the web UI, but alerts are fine.

Check your browser cache and reload the tab?
Check the firewall?
Check SELinux/AppArmor?
Do you have a proxy (Nginx, Apache) in front of Graylog?
Have you checked your network performance when loading the web UI search page?

I have searched the forum for other cases where the web interface is slow or unresponsive. I think we’re missing something, but I’m not sure what. My system does not have a problem like that, and I think we ingest about the same amount of logs, but you have triple the resources that I do.
BTW, I have two sets of Windows Active Directory servers sending logs to Graylog, so I know what you mean about large messages. We also turned up the audit logging on four MSAD servers to get even more logs.
It might be a configuration issue; I’m not sure how you set your system up, though.

system · November 27, 2021, 12:11am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
