Journal Problems

husetech · November 20, 2017, 7:07am

Good Morning,

After the weekend I got two error messages from Graylog:

Uncommited messages deleted from journal
Journal utilization is too high

I did some research and found that you should increase the journal size, so I have now set it to 4GB. Is this likely to fix both error messages?
The message_journal_max_age setting is commented out with “#”, so I assume Graylog uses the default value of 12 hours. Should I increase that too? If so, to what value?

EDIT: I just realized that journal utilization keeps growing; there are 100K unprocessed messages so far. When I check the search, I don’t see any new messages, and the total number of messages received stays the same.

Further information:
4GB RAM
2 CPU cores
(I could increase the virtual hardware settings if needed)

I just want to make sure Graylog runs without issues like this in the future.
Thanks in advance.

jochen · November 20, 2017, 7:50am

No, you need to find out why the message journal grew and whether your outputs (e. g. your Elasticsearch cluster) are able to keep up with the throughput of ingested messages.
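
Jochen's point can be made concrete: the journal only grows while messages arrive faster than the outputs (Elasticsearch) accept them, so enlarging it buys time but does not fix the cause. A back-of-the-envelope sketch, assuming constant rates and a fixed average on-disk size per journal entry (both are simplifications; real journal segments add overhead, and the function name is illustrative):

```python
def seconds_until_journal_full(
    ingest_rate: float,            # messages/second arriving at Graylog
    output_rate: float,            # messages/second accepted by Elasticsearch
    avg_message_bytes: float,      # assumed average size of one journal entry
    journal_max_bytes: float,      # configured journal size limit
    current_backlog_bytes: float = 0.0,
) -> float:
    """Seconds until the journal hits its size limit, or inf if it never does."""
    net_rate = ingest_rate - output_rate
    if net_rate <= 0:
        # Outputs keep up with ingestion, so the backlog does not grow.
        return float("inf")
    remaining = max(journal_max_bytes - current_backlog_bytes, 0.0)
    return remaining / (net_rate * avg_message_bytes)
```

For example, with 100 msg/s arriving (the rate mentioned later in this thread), Elasticsearch accepting nothing because its disk is full, and an assumed 1,000 bytes per entry, `seconds_until_journal_full(100, 0, 1000, 4e9)` gives 40,000 s, about 11 hours, for a 4GB journal. Doubling the journal only doubles that time.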

husetech · November 20, 2017, 7:59am

In Indexer failures I found:
3 hours ago graylog_2 b76b58d0-cdc0-11e7-9132-00505690aabe {“type”:“i_o_exception”,“reason”:“No space left on device”}

Any advice on where I can find more logs?

I don’t understand why Graylog isn’t logging anymore. It says it is processing XY messages per second, but I can’t find anything in the search. It still writes all logs to the disk journal. Did I forget to turn something on?

In the logs (Search -> Source:graylog-server) I can only find entries like:
WARN [KafkaJournal] Journal utilization (103.0%) has gone over 95%.

I checked the Cluster Health:

jochen · November 20, 2017, 9:56am

To be honest, I don’t know how to make this more clear than this message…

The journal might have been corrupted when the disk ran out of free space, so you should try starting Graylog after deleting the existing journal files (which will lead to the loss of all messages in it).

husetech · November 20, 2017, 10:23am

I understand your response; it’s just not clear to me which disk space this error message was referring to. It still isn’t…

So I increased the journal size to 4GB, as mentioned above, and also deleted the journal folder.
Graylog processes logs correctly now.

Can I prevent this failure from happening again? If disk space was the cause before, it can happen again…
The disk journal has about 100 unprocessed messages on average.

jochen · November 20, 2017, 10:26am

Just for clarification and to find out what went wrong: What do you think “disk space” is?

The message was probably referring to the disk space on the partition hosting the Graylog journal or the Elasticsearch data path.

Yes. Start monitoring your free disk space on all relevant partitions and act before any of them runs out of free space.
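
A minimal monitoring sketch using Python's standard `shutil.disk_usage`. The two paths and the 15% threshold are hypothetical placeholders: the journal and Elasticsearch data directories depend on your installation (on the OVA they sit on the same root partition anyway). Paths that don't exist on the host are skipped.

```python
import shutil

# Hypothetical paths: replace with the journal directory and the
# Elasticsearch data path actually used on your system.
PATHS = ["/var/lib/graylog-server/journal", "/var/lib/elasticsearch"]

def low_space(paths, min_free_fraction=0.15):
    """Return (path, free_fraction) for each existing path whose partition is below the threshold."""
    alerts = []
    for path in paths:
        try:
            usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        except FileNotFoundError:
            continue  # path not present on this host; skip it
        free_fraction = usage.free / usage.total
        if free_fraction < min_free_fraction:
            alerts.append((path, free_fraction))
    return alerts

if __name__ == "__main__":
    for path, fraction in low_space(PATHS):
        print(f"WARNING: only {fraction:.1%} free on the partition holding {path}")
```

Run from cron, or feed the result into whatever monitoring you already have, so you get warned well before a partition reaches 100%.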

husetech · November 20, 2017, 10:31am

That answer pinpoints my problem: it wasn’t specified which disk space was meant, and that’s what caused me the trouble…
Thanks for the help, I appreciate it. I will check the disk space of the partitions and increase it if needed.

jochen · November 20, 2017, 10:54am

I remember you’re using the OVA (virtual appliance), so unless you’ve actively moved data around and manually created additional disk partitions, everything is on the same disk partition.

husetech · November 20, 2017, 11:14am

Okay, now no messages are getting processed again…
I didn’t really change much after installing the OVA, which should work out of the box from what I’ve read so far.
Graylog keeps getting “In” messages but does not put them “Out”.
The output buffer is at 100% and the process buffer is rising constantly (43% at the moment). The disk journal keeps growing again, too (100K unprocessed messages).
Memory (JVM heap usage) is hitting 1.4 GB every 20 seconds or so, too.

I don’t understand why this happens, nor why I have to increase some values in the configuration.
An average of 100 msg/sec shouldn’t be a problem for Graylog, I guess…

husetech · November 20, 2017, 8:12pm

I checked the DISK size with df -h and it returns the following:
/dev/dm-0 15G 15G 0M 100% /

We have 9,000,000 logs so far and have already reached 15 GB.
Graylog’s default configuration can hold up to 400,000,000 logs (20 indices times 20,000,000 logs), which would result in 667 GB of disk usage.
This can’t be right, can it?
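
The projection above can be checked quickly. This is rough arithmetic that assumes all 15 GB on the root partition are log data (the same partition also holds the OS and the journal, so the per-message figure is likely an overestimate) and treats “G” as 10^9 bytes:

```python
# Figures from the post: 9,000,000 messages used about 15 GB.
observed_messages = 9_000_000
observed_bytes = 15e9  # df -h reports "G"; treated as 10**9 bytes for simplicity

bytes_per_message = observed_bytes / observed_messages

# Retention described in the post: 20 indices x 20,000,000 messages each.
max_messages = 20 * 20_000_000
projected_bytes = max_messages * bytes_per_message

print(f"{bytes_per_message:.0f} bytes/message")
print(f"{projected_bytes / 1e9:.0f} GB for {max_messages:,} messages")
```

The numbers agree with the post (about 1,667 bytes per message, about 667 GB in total), so the projection itself is consistent; that retention setting simply assumes far more disk than a 15 GB partition provides.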

I am very confused right now, since this doesn’t seem to be a common problem with Graylog.

jochen · November 20, 2017, 8:33pm

You have to customize the retention settings to your environment.

Either you reduce the retention time (how many and how long logs are kept until they’re deleted) or you provide more disk space.
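
To size retention for your own environment, you can turn the question around and ask how much history a given disk budget holds. A minimal sketch under simplifying assumptions (constant ingest rate, a fixed average size per stored message, no replicas or index overhead); the function names and the 10 GB budget are illustrative, not Graylog settings:

```python
SECONDS_PER_DAY = 86_400

def retention_days(disk_budget_bytes, msgs_per_second, bytes_per_message):
    """Days of history that fit in the budget at a constant ingest rate."""
    bytes_per_day = msgs_per_second * SECONDS_PER_DAY * bytes_per_message
    return disk_budget_bytes / bytes_per_day

def max_docs_per_index(disk_budget_bytes, index_count, bytes_per_message):
    """Per-index message limit so that index_count full indices fit in the budget."""
    return int(disk_budget_bytes / index_count / bytes_per_message)

# Hypothetical budget: 10 GB reserved for Elasticsearch data.
budget = 10e9
print(f"{retention_days(budget, 100, 1667):.2f} days")
print(max_docs_per_index(budget, 20, 1667), "messages per index for 20 indices")
```

With the 100 msg/s from this thread and the roughly 1,667 bytes per message implied by the 9,000,000 messages / 15 GB figures, ingestion comes to about 14.4 GB per day, so a 15 GB partition fills in roughly a day. The product of messages per index and number of indices has to fit in whatever space is left for Elasticsearch.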

husetech · November 21, 2017, 6:51am

I will set up Graylog again, this time without the OVA image. That gives me more flexibility.
One more question, though. Since we want a good balance between log volume (disk space) and retention time, I thought about not collecting all logs from a Windows server.
I want to send only specific logs to the Graylog server so it doesn’t get flooded. Is this possible? If so, how do I do that?

Thanks in advance.

jochen · November 21, 2017, 8:36am

You can configure Winlogbeat or NXLOG so that they only send certain events to Graylog. Please consult the respective documentation.

Alternatively you can use the Graylog processing pipeline to drop any message you don’t want to index:
http://docs.graylog.org/en/2.3/pages/pipelines.html
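
Whichever place you filter (the Winlogbeat/NXLOG configuration on the Windows host, or a pipeline rule that drops the message, which as far as I know uses the `drop_message()` function), the decision boils down to a predicate on each event. The Python sketch below only illustrates that allow-list logic; it is not Winlogbeat, NXLOG or pipeline syntax. The field name `event_id` and the chosen event IDs are assumptions; pick the fields and events you actually need.

```python
# Hypothetical allow-list: keep logon success/failure and account lockouts.
KEEP_EVENT_IDS = {4624, 4625, 4740}

def should_forward(event: dict) -> bool:
    """Keep only events whose (assumed) event_id field is on the allow-list."""
    return event.get("event_id") in KEEP_EVENT_IDS

events = [
    {"event_id": 4624, "message": "An account was successfully logged on."},
    {"event_id": 5156, "message": "The Windows Filtering Platform has permitted a connection."},
]
kept = [e for e in events if should_forward(e)]
```

Filtering on the sender saves network traffic and journal space as well as Elasticsearch disk. Dropping in a pipeline still passes every message through the Graylog journal, because pipelines run after the journal; only the indexing is avoided.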

