No more messages flowing inbound? Started over twice now... what am I doing wrong?


1. Describe your incident:

Installed Ubuntu 20.04 and followed the guide to install Graylog - set up three DCs and our Meraki gear to forward syslog traffic to port 1515. Also tinkered with nxlog forwarding to a Beats input on port 5044.

Everything works well for a day or two and then messages stop flowing inbound.

2. Describe your environment:

  • OS Information:
    Ubuntu 20.04 on a Hyper-V VM
    8 cores (Xeon Gold 6148 CPU)
    24 GB memory

  • Package Version:
    ii elasticsearch-oss 7.10.2 amd64 Distributed RESTful search engine built for the cloud
    ii graylog-4.2-repository 1-4 all Package to install Graylog 4.2 GPG key and repository
    ii graylog-integrations-plugins 4.2.5-1 all Graylog Integrations plugins
    ii graylog-server 4.2.5-1 all Graylog server
    ii mongodb-org 4.0.28 amd64 MongoDB open source document-oriented database system (metapackage)
    ii mongodb-org-mongos 4.0.28 amd64 MongoDB sharded cluster query router
    ii mongodb-org-server 4.0.28 amd64 MongoDB database server
    ii mongodb-org-shell 4.0.28 amd64 MongoDB shell client
    ii mongodb-org-tools 4.0.28 amd64 MongoDB tools

  • Service logs, configurations, and environment variables:
    server.conf file:
    is_master = true
    node_id_file = /etc/graylog/server/node-id
    password_secret =
    root_password_sha2 =
    bin_dir = /usr/share/graylog-server/bin
    data_dir = /var/lib/graylog-server
    plugin_dir = /usr/share/graylog-server/plugin
    http_bind_address = 10.10.10.27:9000
    http_enable_cors = false
    rotation_strategy = count
    elasticsearch_max_docs_per_index = 20000000
    elasticsearch_max_number_of_indices = 20
    retention_strategy = delete
    elasticsearch_shards = 4
    elasticsearch_replicas = 0
    elasticsearch_index_prefix = graylog
    allow_leading_wildcard_searches = false
    allow_highlighting = false
    elasticsearch_analyzer = standard
    output_batch_size = 500
    output_flush_interval = 1
    output_fault_count_threshold = 5
    output_fault_penalty_seconds = 30
    processbuffer_processors = 5
    outputbuffer_processors = 3
    processor_wait_strategy = blocking
    ring_size = 65536
    inputbuffer_ring_size = 65536
    inputbuffer_processors = 2
    inputbuffer_wait_strategy = blocking
    message_journal_enabled = true
    message_journal_dir = /var/lib/graylog-server/journal
    lb_recognition_period_seconds = 3
    mongodb_uri = mongodb://localhost/graylog
    mongodb_max_connections = 1000
    mongodb_threads_allowed_to_block_multiplier = 5
    proxied_requests_thread_pool_size = 32

3. What steps have you already taken to try and solve the problem?

tail -f /var/log/graylog-server/server.log
2022-02-01T15:38:10.404-06:00 WARN [LocalKafkaJournal] Journal utilization (101.0%) has gone over 95%.
2022-02-01T15:38:41.843-06:00 INFO [connection] Opened connection [connectionId{localValue:17, serverValue:17}] to localhost:27017
2022-02-01T15:38:41.850-06:00 INFO [connection] Opened connection [connectionId{localValue:15, serverValue:15}] to localhost:27017
2022-02-01T15:38:41.851-06:00 INFO [connection] Opened connection [connectionId{localValue:16, serverValue:16}] to localhost:27017
2022-02-01T15:38:41.851-06:00 INFO [connection] Opened connection [connectionId{localValue:14, serverValue:12}] to localhost:27017
2022-02-01T15:38:41.852-06:00 INFO [connection] Opened connection [connectionId{localValue:13, serverValue:13}] to localhost:27017
2022-02-01T15:38:41.854-06:00 INFO [connection] Opened connection [connectionId{localValue:11, serverValue:11}] to localhost:27017
2022-02-01T15:38:41.857-06:00 INFO [connection] Opened connection [connectionId{localValue:12, serverValue:14}] to localhost:27017
2022-02-01T15:39:11.056-06:00 WARN [LocalKafkaJournal] Journal utilization (98.0%) has gone over 95%.
2022-02-01T15:39:11.058-06:00 INFO [LocalKafkaJournal] Journal usage is 98.00% (threshold 100%), changing load balancer status from THROTTLED to ALIVE

Journal usage overrun maybe?

curl -XGET http://localhost:9200/_cluster/health?pretty=true
{
  "cluster_name" : "graylog",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 20,
  "active_shards" : 20,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

I’m not sure what a shard is, but I’m assuming that if it’s at 100% it’s full or too busy to process more. What can I do to get messages flowing again? And most importantly, please help me understand why this happened so I can prevent it from recurring. (Knowing how to delete a folder of queued messages is useful, but why didn’t they process, and what settings do I need to adjust for proper retention and automatic digestion?)

4. How can the community help?

I’m guessing my Elasticsearch settings are wrong, or perhaps the server isn’t powerful enough? I cannot get into the /etc/elasticsearch directory as access is denied - is that expected? I don’t want to start modifying file permissions without understanding why, or what that might break.

How can I get messages flowing again?


Hello && Welcome

That warning means your journal is full. It could be a couple of different things:

  1. Check your Elasticsearch status (systemctl status elasticsearch) and the ES log files - see the commands sketched after this list.
  2. Perhaps increase your journal from the default of 5 GB to maybe 10 GB, IF you have enough room on your drive to do so.
  3. If your process buffers are shown as full in the Web UI, you may want to increase processbuffer_processors = 5 in your server.conf file, then wait. It can take time for the journal to drain, depending on how much resource you have and/or how many messages GL is ingesting.
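For number 1, roughly something like this (paths are the defaults for the Ubuntu/Debian packages - adjust if yours differ):

systemctl status elasticsearch
# the ES log file name usually follows the cluster name, which is "graylog" in your cluster health output
sudo tail -n 100 /var/log/elasticsearch/graylog.log
# Graylog's own log, which you are already tailing
sudo tail -f /var/log/graylog-server/server.log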

Your process buffer is the big hitter, then your output buffer. If the buffers are full, I would only increase the process buffer to 6 and wait to see if it goes down. Take note: when your GL volume is full, or your journal is full, or your buffers are at 100%, logs coming in will pause and you will be unable to see those messages until the issue is resolved. Normally, when the journal is full it means the resources are unable to keep up with the amount of messages coming in, or there is an Elasticsearch issue.
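Roughly, that change would look like this in /etc/graylog/server/server.conf (the value of 6 is just the suggestion above, not a magic number):

processbuffer_processors = 6    # was 5
outputbuffer_processors = 3     # leave as-is unless the output buffer is the one backing up
# then restart Graylog so the change takes effect
sudo systemctl restart graylog-server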

So the user you logged into the Graylog server with does not have permission to that directory. This seems like a permission issue as well. Check your server logs for Elasticsearch and Graylog.

EDIT: Since you’re using nxlog - I had a problem a while back where one device was sending over 12,000 messages. This was from an nxlog client and it filled my journal up really quickly over time, so you may want to check your shippers.
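A quick-and-dirty way to see which shipper is the noisiest is to sample the input ports on the Graylog box (this assumes tcpdump is installed, and uses the 1515/5044 ports from your post):

# sample ~2000 packets on the input ports and count them per source address
sudo tcpdump -nn -l -c 2000 'port 1515 or port 5044' | awk '{print $3}' | cut -d. -f1-4 | sort | uniq -c | sort -rn | head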

The user I am logged in with is the user I installed Graylog with, but as I am not able to see inside the /etc/elasticsearch/ folder, I’m guessing this might be an issue? Maybe once the journal fills up it cannot write the messages to Elasticsearch? Or am I misunderstanding the flow? Is it journal → Elasticsearch → Mongo?

The status is up and running, but I’m not sure how to access the logs since I can’t see inside the directory. Is it best to chmod the elasticsearch folder, or better to give my user root access, even if only temporarily?

Messages did skyrocket when I first opened the ports for ingestion (4,000 messages/second) but eventually slowed down to 150/second or so… No idea what Graylog is capable of, or whether I’m choking it with data.

Seeing as I’m using SSH to do everything - what are good ways to check the journal size, hard drive space, etc.? How do I increase the journal size? The server has either 300 or 600 GB of storage.

What happens to old logs? I’m assuming they get deleted by default per the server config - but is there a way to archive them onto an SMB share or similar for longevity?

Yes, it’s probably a combination of a few issues you may have.

If the user you installed Graylog with has root permission then you could adjust it, but if the user you installed Graylog with does not have permission to the /etc/elasticsearch directory, you may want to talk to the local admin about getting access.

The journal status is in the Web UI under System / Nodes.

root # df -h

message_journal_max_size = 5gb
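A rough sketch of what I mean (the journal path is taken from your server.conf, and 10gb is just an example - make sure df shows you have the headroom first):

# disk space and current journal size on disk
df -h
sudo du -sh /var/lib/graylog-server/journal
# add or change this in /etc/graylog/server/server.conf, e.g.
# message_journal_max_size = 10gb
# (the setting is not in your pasted config, so you are currently at the 5gb default)
# then restart so it takes effect
sudo systemctl restart graylog-server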

You have a lot going on. I would first find out WHY your journal is filling up over its capacity and resolve that issue.

Also…

If you don’t have access to the log files, then you have more problems.

I might have some insight regarding your (in)ability to access certain files and directories in Ubuntu. I’m not a Linux person, so it’s possible I’m violating best practice here. Please take it with a grain of salt.

I’ve noticed that on Ubuntu server (I’m assuming you installed Ubuntu 20.04 LTS), the ‘administrative’ user you make as part of the install isn’t actually a root user (which I think is on purpose?). You can ‘sudo’ to do administrative commands, and edit files that require root access (e.g. “sudo nano /etc/graylog/server/server.conf”), but you can’t actually browse those directories as the admin user. This is where I may be violating best practice, but if you do “sudo su” you’ll have access to a root command prompt, which will let you ‘cd’ and ‘ls’ any directory (including /etc/elasticsearch). I’m pretty sure that’s all operating as intended, and it may not actually be a problem if your normal admin user can’t directly access those directories.
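For example, something along these lines should work without opening a full root shell (assuming your admin user is in the sudo group, which it normally is on Ubuntu server):

sudo ls -l /etc/elasticsearch
sudo less /var/log/elasticsearch/graylog.log   # log file name usually follows the cluster name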

I hope that helps (and that I haven’t led you astray).

