Collected documents (logs) are getting deleted from the index

  1. Collected documents (logs) are deleted from the index after some time (approx. 45 minutes). The index range is recalculated every 45 minutes, log collection starts over, and the process repeats. Because of this I am permanently losing the previously collected (deleted) documents. I set the rotation period to 7 days. I am using a 2-node setup; each node runs Graylog, Elasticsearch, and MongoDB. I suspect the issue is in the Elasticsearch configuration, but I am unable to find a solution. I am attaching screenshots of the relevant conf files. Feel free to ask for any further details if needed.
  • Package Version:
    graylog - 5.0.3
    elasticsearch - 8.11
    mongodb - 6.0


Graylog node-1 configuration (for this node I only left is_master = true uncommented)


elasticsearch node-1 configuration


elasticsearch node-2 configuration


What does the log file show at the time of rotation/deletion?

The Graylog log file continuously shows these log entries:

Your system is in an inconsistent state. If you search on the error message in this forum you will find several threads that offer trouble-shooting steps.

Typically, Elastic/Opensearch and the Graylog MongoDB have gotten out of sync, e.g. due to manually deleting data in ES/OS. If that is an option for you, you could drop the entire Graylog database in Mongo and restart.


I completed the entire process, dropped the Graylog database, and restarted both MongoDB and Graylog, but the problem is repeating.

My index range is recalculating every time and data is getting lost

Strange. Need more information to figure this out. Right now it’s just guesswork.
You pasted settings for the global index set defaults rotation_strategy and elasticsearch_max_time_per_index. How are the actual indices configured?
Are there other errors in the log?
Check your ES cluster health as described in related thread:

1. Actual index configuration:
Index shards: 2
Index replicas: 0
Field type refresh interval: 5
Rotation strategy: Index Time
Rotation period: P1W
Retention strategy: Delete Index
Max number of indices: 20

2. There are other errors in the logs.

3. After checking the ES cluster health using the command
curl -XGET "http://es_node:9200/_cluster/allocation/explain?pretty"
I get this message:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "unable to find any unassigned shards to explain [ClusterAllocationExplainRequest[useAnyUnassignedShard=true,includeYesDecisions?=false]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "unable to find any unassigned shards to explain [ClusterAllocationExplainRequest[useAnyUnassignedShard=true,includeYesDecisions?=false]"
  },
  "status" : 400
}
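A note on reading that response: when the allocation explain API is called without a request body, it tries to explain the first unassigned shard it finds, and it answers with this 400 error when there is none. So the message points to shard allocation being fine rather than to a fault. A minimal sketch of that interpretation follows; the function name and the returned strings are made up for illustration.

```python
def interpret_allocation_explain(response: dict) -> str:
    """Classify a parsed JSON response from GET _cluster/allocation/explain.

    Hypothetical helper for illustration; it only inspects the fields
    that appear in the response quoted in this thread.
    """
    error = response.get("error")
    if error is None:
        # A successful call returns an explanation for one shard.
        return "explanation returned: inspect the shard details in the body"
    reason = error.get("reason", "")
    if response.get("status") == 400 and "unable to find any unassigned shards" in reason:
        # Nothing to explain because every shard is assigned.
        return "no unassigned shards"
    return "unexpected error: " + reason


sample = {
    "error": {
        "type": "illegal_argument_exception",
        "reason": "unable to find any unassigned shards to explain "
                  "[ClusterAllocationExplainRequest[useAnyUnassignedShard=true,"
                  "includeYesDecisions?=false]",
    },
    "status": 400,
}
print(interpret_allocation_explain(sample))
```

If this prints "no unassigned shards", the missing documents have to be explained by something other than unassigned shards.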

I think the main problem is with Elasticsearch, as it is deleting indices.

Not sure what’s going on - sorry. What exactly do you mean when you say logs are being deleted? The index is being deleted in ES? Is there anything in GL or ES logs that might indicate what is initiating the deletion?

  • The documents (logs from syslog) collected in an index are unexpectedly deleted, and the index does not rotate to a new one. For example, if an index initially holds 98 documents, after some time it shows 0 documents, and the system starts collecting documents (logs) again from the beginning.

  • There is no error log regarding this issue in the GL or ES logs.

  • My Elasticsearch (ES) is not showing any logs at all, including the general Elasticsearch log.
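One way to narrow this down is to check whether the documents are removed from an existing index or whether the whole index is deleted and recreated under the same name. Every index gets a new UUID when it is created, so comparing two snapshots of `_cat/indices` can tell the two cases apart. The sketch below assumes the cluster is reachable over plain HTTP without authentication; the function names, the index name `graylog_0`, and the UUID values are placeholders.

```python
import json
import urllib.request


def fetch_snapshot(base_url: str) -> list:
    """Fetch index name, UUID and document count as a list of dicts."""
    url = base_url + "/_cat/indices?format=json&h=index,uuid,docs.count"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def diff_snapshots(before: list, after: list) -> list:
    """Compare two snapshots and describe anything that looks like data loss."""
    findings = []
    current = {row["index"]: row for row in after}
    for prev in before:
        name = prev["index"]
        cur = current.get(name)
        if cur is None:
            findings.append(f"{name}: index no longer exists")
        elif cur["uuid"] != prev["uuid"]:
            # Same name, different UUID: the index was dropped and created again.
            findings.append(f"{name}: uuid changed, index was deleted and recreated")
        elif int(cur["docs.count"] or 0) < int(prev["docs.count"] or 0):
            # Same index, fewer documents: something deleted documents in place.
            findings.append(
                f"{name}: docs.count dropped from {prev['docs.count']} to {cur['docs.count']}"
            )
    return findings


# Offline example with made-up values (the cat API returns counts as strings).
before = [{"index": "graylog_0", "uuid": "uuid-a", "docs.count": "98"}]
after = [{"index": "graylog_0", "uuid": "uuid-b", "docs.count": "0"}]
print(diff_snapshots(before, after))
```

Against a live cluster, `fetch_snapshot("http://es_node:9200")` could be called every few minutes and each result compared with the previous one. A changed UUID would mean the index itself is being deleted, which is a different problem from documents disappearing inside an index that stays in place.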

Graylog will never delete messages from an index. Entire indices get rotated and may ultimately be deleted, based on the rotation and retention settings for the index set.
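To put numbers on that for the index set posted earlier in the thread (rotation period P1W, retention strategy Delete Index, max number of indices 20): an index should only become eligible for deletion after about 20 weekly rotations. The sketch below is a rough estimate, not Graylog's own code. The helper names are invented, and the parser handles only simple periods such as P1W, P1D, or PT6H.

```python
import re
from datetime import timedelta

# Subset of ISO 8601 durations: weeks (P1W) or days/hours/minutes (P1D, PT6H, P1DT12H).
_PERIOD = re.compile(
    r"^P(?:(?P<weeks>\d+)W"
    r"|(?:(?P<days>\d+)D)?(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?)?)$"
)


def parse_period(period: str) -> timedelta:
    """Convert a simple ISO 8601 period string into a timedelta."""
    match = _PERIOD.match(period)
    parts = {k: int(v) for k, v in (match.groupdict() if match else {}).items() if v}
    if not parts:
        raise ValueError(f"unsupported period: {period!r}")
    return timedelta(**parts)


def expected_retention(rotation_period: str, max_indices: int) -> timedelta:
    """Upper bound on how long data stays before the oldest index is deleted."""
    return parse_period(rotation_period) * max_indices


print(expected_retention("P1W", 20).days)  # 20 weeks expressed in days
```

Data vanishing after 45 minutes to 24 hours is far below that estimate, which fits the view that the index set's rotation and retention settings are not what is removing it.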

Something else is going on here. I agree that it sounds like something is happening on the ES side; and the GL errors are just a consequence of that. You mention this happening at 45 minute intervals. What could be running on that schedule?

  • The timing is not fixed; sometimes it occurs every 45 minutes, sometimes every hour, and occasionally even after 24 hours.
  • There is nothing scheduled at that time.

Hello @adarsh,

How is the range being recalculated every 45 mins?

What is the output of these individual API calls?

curl -XGET "http://node.name:9200/_cat/shards?v"

and

curl -XGET "http://node.name:9200/_cat/indices?v"

and

curl -XGET "http://node.name:9200/_cluster/health?pretty"

  • The timing is not fixed; sometimes it occurs every 45 minutes, sometimes every hour, and occasionally even after 24 hours.

curl -XGET "http://node.name:9200/_cat/shards?v"

curl -XGET "http://node.name:9200/_cat/indices?v"

curl -XGET "http://node.name:9200/_cluster/health?pretty"
[Screenshot of the command output, 2023-11-21]

I foolishly didn’t read your initial post thoroughly. You are currently running an unsupported version of Elasticsearch; the last supported version is 7.10.2.

At this point my recommendation would be to move to OpenSearch 2.9 and see if the issue persists. If the system is so compromised that it currently can’t store data, use this as an opportunity to start fresh with a new install.
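To confirm what a node is actually running, the root endpoint (`GET /`) reports the version under `version.number`, and OpenSearch additionally reports `version.distribution` as `opensearch`. The sketch below checks a parsed root response against the limit mentioned above (Elasticsearch no newer than 7.10.2). Treating every OpenSearch version as acceptable is a simplifying assumption here, not something taken from the Graylog documentation, and the function names are invented.

```python
def parse_version(number: str) -> tuple:
    """Turn a version string such as '8.11.0' into a comparable tuple (8, 11, 0)."""
    return tuple(int(part) for part in number.split(".")[:3])


def backend_supported(root_response: dict) -> bool:
    """Check the parsed JSON body of GET / against the limit from this thread."""
    version = root_response.get("version", {})
    if version.get("distribution") == "opensearch":
        # Assumption: any OpenSearch version is accepted in this sketch.
        return True
    # Elasticsearch: 7.10.2 is the last supported version per this thread.
    return parse_version(version["number"]) <= (7, 10, 2)


print(backend_supported({"version": {"number": "8.11.0"}}))
print(backend_supported({"version": {"number": "7.10.2"}}))
print(backend_supported({"version": {"distribution": "opensearch", "number": "2.9.0"}}))
```

The body to pass in can be fetched with `curl -XGET "http://node.name:9200/"` on each node.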

My Graylog logs are filled with this error. Is there any connection with my problem?

Hey @adarsh

Just chiming in. I looked over your configuration for ES/OS. Since this looks like a cluster of 2 nodes, I don’t see the following configured:

node.roles: [ cluster_manager ]

I completely changed my setup from Elasticsearch to OpenSearch with these settings, but I am still getting the same error in the Graylog logs.

My node-1 configuration (cluster manager)

Node-2 configuration (data node)