Journal Problems

(Justin) #1

Good Morning,

After the weekend I got two error messages from Graylog:

  1. Uncommited messages deleted from journal
  2. Journal utilization is too high

I researched and found that the recommended fix is to increase the journal size, so I have now set it to 4 GB. Is this likely to resolve both error messages?
The message_journal_max_age setting is commented out with “#”, so I assume Graylog uses the default value, which is 12 hours. Should I increase that too, and if so, to what value?
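For reference, these knobs live in Graylog's server.conf. A fragment reflecting the situation described above might look like this (the 4 GB value is the poster's choice, not a recommendation; check the defaults shipped with your version):

```
# server.conf – journal settings discussed above
message_journal_enabled = true
message_journal_max_size = 4gb    # value set by the poster
#message_journal_max_age = 12h    # commented out, so the 12h default applies
```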

EDIT: Just realized that the journal utilization keeps growing; 100K unprocessed messages so far. When I check the Search I don’t see any new messages, and the total number of received messages stays the same.

Further Information:
2 CPU cores
(I could increase the virtual hardware settings if needed.)

I just want to make sure Graylog runs without issues like this in the future.
Thanks in advance.

(Jochen) #2

No, you need to find out why the message journal grew and whether your outputs (e.g. your Elasticsearch cluster) can keep up with the throughput of ingested messages.

(Justin) #3

In Indexer failures I found:
3 hours ago graylog_2 b76b58d0-cdc0-11e7-9132-00505690aabe {“type”:“i_o_exception”,“reason”:“No space left on device”}

Any advice on where I can find more logs?

I don’t understand why Graylog has stopped logging. It says it is processing XY messages per second, but I can’t find anything in the Search. It still writes all logs to the disk journal. Did I forget to turn something on?

Inside the logs (Search → source:graylog-server) I can only find entries like:
WARN [KafkaJournal] Journal utilization (103.0%) has gone over 95%.

I also checked the cluster health.

(Jochen) #4

To be honest, I don’t know how to make this any clearer than that message…

The journal might have been corrupted when the disk ran out of free space, so you should try starting Graylog after deleting the existing journal files (which will lead to the loss of all messages in it).

(Justin) #5

I understand your response; it’s just not clear to me which disk space that error message was referring to, and it still isn’t…

So I increased the journal size to 4 GB, as mentioned above, and also deleted the journal folder.
Graylog processes logs correctly now.

Can I prevent this failure from happening again? Since disk space was the cause before, it could happen again…
The disk journal holds about 100 unprocessed messages on average.

(Jochen) #6

Just for clarification and to find out what went wrong: What do you think “disk space” is?

The message was probably referring to the disk space on the partition hosting the Graylog journal or the Elasticsearch data path.

Yes. Start monitoring your free disk space on all relevant partitions and act before any of them runs out of free space.
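A minimal sketch of that kind of monitoring, assuming a POSIX df and awk (the 90% threshold is an arbitrary example, and in practice you would run this from cron or a monitoring agent):

```shell
#!/bin/sh
# Warn about any partition whose usage is at or above the threshold.
THRESHOLD=90
df -P | awk -v t="$THRESHOLD" 'NR > 1 {
    use = $5
    sub(/%/, "", use)                 # strip the trailing "%" from Capacity
    if (use + 0 >= t) printf "WARNING: %s is %s%% full\n", $6, use
}'
```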

(Justin) #7

This answer pinpoints my problem: it was never defined which disk space it was, and that’s what caused my trouble…
Thanks for the help, I appreciate it. I will check the disk space of the partitions and increase it if needed.

(Jochen) #8

I remember you’re using the OVA (virtual appliance), so unless you’ve actively moved data around and manually created additional disk partitions, everything is on the same disk partition.

(Justin) #9

Okay, now no messages are getting processed again…
I didn’t really change much after installing the OVA, which should work out of the box from what I have read so far.
Graylog keeps receiving “In” messages but does not put them “Out”.
The output buffer is at 100% and the process buffer is rising constantly, 43% at the moment. The disk journal keeps growing again, too (100K unprocessed messages).
Memory (JVM heap usage) is also hitting 1.4 GB every 20 seconds or so.

I don’t understand why this happens, nor why I have to increase some settings in the config.
An average of 100 msg/sec shouldn’t be a problem for Graylog, I guess…

(Justin) #10

I checked the disk size with df -h and it returns the following:
/dev/dm-0 15G 15G 0M 100% /

We have 9,000,000 logs so far and have already used 15 GB.
Graylog’s default configuration can keep up to 400,000,000 logs (20 indices × 20,000,000 logs each), which would amount to about 667 GB of disk usage.
This can’t be right, can it?
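The projection above checks out arithmetically; a quick back-of-the-envelope calculation using the figures from this thread (assumed numbers, not measured constants):

```python
# Project disk usage at the default retention from the current average message size.
logs_so_far = 9_000_000        # messages indexed so far (from this thread)
disk_used_gb = 15              # GB consumed so far (from df -h)

gb_per_log = disk_used_gb / logs_so_far
max_logs = 20 * 20_000_000     # default retention: 20 indices x 20M docs each

projected_gb = round(max_logs * gb_per_log)
print(projected_gb)            # 667
```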

I am very confused right now, since this doesn’t seem to be a common problem with Graylog.

(Jochen) #11

You have to customize the retention settings to your environment.

Either you reduce the retention (how many logs are kept, and for how long, before they are deleted) or you provide more disk space.
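For Graylog 2.x, the default count-based rotation corresponds roughly to this server.conf fragment; lowering the index count is the usual way to cap the footprint (setting names may differ in other versions):

```
rotation_strategy = count
elasticsearch_max_docs_per_index = 20000000   # 20M docs per index (default)
retention_strategy = delete
elasticsearch_max_number_of_indices = 20      # lower this to cap disk usage
```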

(Justin) #12

I will set up Graylog again, this time without the OVA image; that gives me more flexibility.
One more question, though: since we want a good balance between log volume (disk space) and retention time, I thought about not shipping all logs from a Windows server.
I want to send only specific logs to the Graylog server so it doesn’t get flooded. Is this possible, and if so, how do I do it?

Thanks in advance.

(Jochen) #13

You can configure Winlogbeat or NXLog to send only certain events to Graylog. Please consult the respective documentation.
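As a sketch of the Winlogbeat approach: the winlogbeat.event_logs section can filter by event ID (the IDs and the output host below are placeholder examples, not recommendations):

```
# winlogbeat.yml – ship only selected Security event IDs
winlogbeat.event_logs:
  - name: Security
    event_id: 4624, 4625, 4740   # logons, failed logons, lockouts (examples)

output.logstash:
  hosts: ["graylog.example.org:5044"]   # hypothetical Beats input on Graylog
```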

Alternatively, you can use the Graylog processing pipeline to drop any message you don’t want to index.
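A minimal pipeline rule of that kind might look like this (the field name winlogbeat_event_id and the event ID are assumptions about how your input names things):

```
rule "drop noisy windows events"
when
  has_field("winlogbeat_event_id") &&
  to_string($message.winlogbeat_event_id) == "5447"
then
  drop_message();
end
```

Attach the rule to a pipeline connected to the stream carrying the Windows messages; anything matched is dropped before it reaches Elasticsearch.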

(system) closed #14

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.