Not receiving Logs from EC2 instance with Opensearch

Hello, our Graylog instance isn’t receiving logs from our application. Below is the configuration:

TCP input (port 12202):
bind_address: 0.0.0.0
charset_name: UTF-8
decompress_size_limit: 8388608
max_message_size: 2097152
number_worker_threads: 4
override_source: <empty>
port: 12202

Second input (port 12201):
bind_address: 0.0.0.0
charset_name: UTF-8
decompress_size_limit: 8388608
number_worker_threads: 4
override_source: <empty>
port: 12201
recv_buffer_size: 262144
Journal:
Maximum size: 1.0 GiB
Maximum age: 12 hours 0 minutes
Flush policy: every 1,000,000 messages or 1 minute 0 seconds
Utilization: 0.00%
-264,157,086 unprocessed messages are currently in the journal, in 1 segments.
0 messages have been appended in the last second, 0 messages have been read in the last second.
{
  "cluster_name" : "graylog",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "discovered_master" : true,
  "discovered_cluster_manager" : true,
  "active_primary_shards" : 15,
  "active_shards" : 15,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
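For reference, cluster health output like the above can be fetched with something along these lines (this assumes OpenSearch is listening on the default localhost:9200 with no TLS or auth; adjust host, port, and credentials for your environment):

```shell
# Hypothetical health check; assumes OpenSearch on the default
# localhost:9200 without TLS/auth. Adjust for your setup.
curl -s 'http://localhost:9200/_cluster/health?pretty'
```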

Error with node:

Nodes with too long GC pauses (triggered 7 months ago)
There are Graylog nodes on which the garbage collector runs too long. Garbage collection runs should be as short as possible. Please check whether those nodes are healthy. (Node: cb4163ab-0761-4a85-8c59-f7f3bc8ad814, GC duration: 1057 ms, GC threshold: 1000 ms)
/etc/opensearch/jvm.options
-Xms10g
-Xmx10g

I’m not sure why it’s not processing messages. Our OpenSearch is healthy and Graylog is running. Any ideas on why this is happening? I followed the Graylog documentation but still get the same error.

Hello @rojeffe,

Try increasing the heap assigned to the Graylog nodes, under /etc/default/graylog.

Delete the journals of the Graylog nodes and restart the Graylog service: sudo rm -R /var/lib/graylog-server/journal/*

Hi @Wine_Merchant, thanks for your response. I did the steps you recommended, but I’m still not seeing messages. For some reason it still says: -264,157,009 unprocessed messages are currently in the journal, in 1 segments.

drwxr-xr-x 2 graylog graylog 4096 Nov 18 13:52 messagejournal-0
-rw-r--r-- 1 graylog graylog   24 Nov 18 13:55 recovery-point-offset-checkpoint
-rw-r--r-- 1 graylog graylog    9 Nov 18 13:56 graylog2-committed-read-offset

This is odd because I deleted the directory. I also increased the heap to:

GRAYLOG_SERVER_JAVA_OPTS="-Xms10g -Xmx10g -server -XX:+UseG1GC -XX:-OmitStackTraceInFastThrow"

That is odd. Do the logs indicate what might be causing the issue?

sudo tail -100f /var/log/graylog/server.log

It seems I did it wrong. I had to stop Graylog, delete the journal, and then start Graylog. Before, I was deleting the journal and then restarting Graylog, thinking the changes would take effect.
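For anyone hitting the same problem, the sequence that worked can be sketched as follows (this assumes a systemd-based install with the default journal path; adjust the service name and path for your setup):

```shell
# Sketch of the working order of operations; assumes systemd and the
# default Graylog journal location. Run as root or via sudo.
systemctl stop graylog-server              # stop Graylog so it releases the journal files
rm -rf /var/lib/graylog-server/journal/*   # remove the corrupt journal segments
systemctl start graylog-server             # start Graylog; it recreates a clean journal
```

The key point is that the service must be stopped before the journal is deleted, otherwise Graylog keeps its open file handles and rewrites checkpoint files into the directory.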

Is this the proper way to manage logs? Our logs ramp up rather quickly.

Work out how long you want to store data for, then size your Graylog cluster taking that retention period and your daily log volume into account.

For example, if you receive 10 GB daily and want to store logs for 60 days: 10 (daily ingest) * 60 (days of searchable data) * 1.3 (overhead) = 780 GB. You would then divide that 780 GB by the number of OpenSearch nodes. If you utilise replicas, you would need to double that 780 GB, but that’s only worth considering when running 3 or more OpenSearch nodes.
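The sizing arithmetic above can be sketched as a small shell snippet (the 10 GB/day and 60-day figures are just the example numbers from this thread; substitute your own ingest and retention values):

```shell
# Illustrative storage sizing using the example figures above.
daily_gb=10        # daily ingest in GB
days=60            # retention period in days
overhead=1.3       # index/metadata overhead factor
total_gb=$(awk -v d="$daily_gb" -v n="$days" -v o="$overhead" \
  'BEGIN { printf "%.0f", d * n * o }')
echo "Required storage: ${total_gb} GB"   # prints: Required storage: 780 GB
```

Double this total if you run one replica per shard, and divide by the number of OpenSearch data nodes to get per-node disk requirements.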

Thanks so much! I’ll take this information back to the team. Thanks for all your help!


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.