I’m very sorry for the delay, but am very grateful for the responses.
I’m not seeing any specific entries in either logs, when loading a dashboard or performing searches, but I have posted some of the entries at the bottom.
ES Cluster Health
{
"cluster_name": "ES_Name",
"status": "green",
"timed_out": false,
"number_of_nodes": 11,
"number_of_data_nodes": 8,
"active_primary_shards": 7210,
"active_shards": 19328,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 0,
"delayed_unassigned_shards": 0,
"number_of_pending_tasks": 0,
"number_of_in_flight_fetch": 0,
"task_max_waiting_in_queue_millis": 0,
"active_shards_percent_as_number": 100.0
}
Index details, worker_graylog - this is one of the indices that’s being used that’s performing badly
Index prefix: worker_graylog
Shards: 5
Replicas: 2
Field type refresh interval: 30 seconds
Index rotation strategy: Index Size
Max index size: 5073741824 bytes (4.7GiB)
Index retention strategy: Delete
Max number of indices: 400
Current Total Size: 1.7TB
Last closed Index: 4.3GB / 1,713,687 messages
I can see from the above that there’s one index for about 20 minutes of data, but I have the dashboard configured to view 8 hours of data by default, I can see how this is bad. What would be a good size for this index, given that amount of messages and a desired window of about 8 hours? Any recommendations to the number of shards?
Thank you very much for the support.
Shard details - very cut down
worker_graylog_20776 3 p STARTED 266093 821.2mb 172.30.0.11 node-11
worker_graylog_20776 3 r STARTED 266093 821.2mb 172.30.0.3 node-3
worker_graylog_20776 3 r STARTED 266093 821.2mb 172.30.0.2 node-2
worker_graylog_20776 1 p STARTED 266662 821.1mb 172.30.0.4 node-4
worker_graylog_20776 1 r STARTED 266662 821.1mb 172.30.0.6 node-6
worker_graylog_20776 1 r STARTED 266662 821.1mb 172.30.0.11 node-11
worker_graylog_20776 2 p STARTED 265929 823mb 172.30.0.7 node-7
worker_graylog_20776 2 r STARTED 265929 823.1mb 172.30.0.11 node-11
worker_graylog_20776 2 r STARTED 265929 823.1mb 172.30.0.11 node-11
worker_graylog_20776 4 p STARTED 266175 818.1mb 172.30.0.3 node-3
worker_graylog_20776 4 r STARTED 266175 818.1mb 172.30.0.1 node-1
worker_graylog_20776 4 r STARTED 266175 818.1mb 172.30.0.6 node-6
worker_graylog_20776 0 r STARTED 265908 825.5mb 172.30.0.4 node-4
worker_graylog_20776 0 r STARTED 265908 825.5mb 172.30.0.1 node-1
worker_graylog_20776 0 p STARTED 265908 825.5mb 172.30.0.2 node-2
worker_graylog_20673 3 p STARTED 194811 906.2mb 172.30.0.11 node-11
worker_graylog_20673 3 r STARTED 194811 906.3mb 172.30.0.3 node-3
worker_graylog_20673 3 r STARTED 194811 906.3mb 172.30.0.6 node-6
worker_graylog_20673 1 p STARTED 195307 907.1mb 172.30.0.7 node-4
worker_graylog_20673 1 r STARTED 195307 907.1mb 172.30.0.7 node-7
worker_graylog_20673 1 r STARTED 195307 907.1mb 172.30.0.2 node-2
worker_graylog_20673 2 p STARTED 194406 907.7mb 172.30.0.7 node-7
worker_graylog_20673 2 r STARTED 194406 907.6mb 172.30.0.11 node-11
worker_graylog_20673 2 r STARTED 194406 907.7mb 172.30.0.11 node-11
worker_graylog_20673 4 p STARTED 194842 910.3mb 172.30.0.3 node-3
worker_graylog_20673 4 r STARTED 194842 910.4mb 172.30.0.1 node-1
worker_graylog_20673 4 r STARTED 194842 910.4mb 172.30.0.2 node-2
worker_graylog_20673 0 r STARTED 194448 904.7mb 172.30.0.4 node-4
worker_graylog_20673 0 r STARTED 194448 904.7mb 172.30.0.1 node-1
worker_graylog_20673 0 p STARTED 194448 904.7mb 172.30.0.6 node-6
DEBUG message found in ES logs, not finding anything above the debug level, but I do see a few of these. This is just an unexpected value found, which correct me if I’m wrong, the document just gets dropped and is more of an application issue to get fixed.
[DEBUG][o.e.a.b.TransportShardBulkAction] [node-6] [worker_graylog_20818][0] failed to execute bulk item (index) index {[worker_graylog_deflector][message][3bf4da85-f5aa-11eb-b085-0667c7b27054], source[n/a, actual length: [5.9kb], max length: 2kb]}
java.lang.IllegalArgumentException: mapper [ctxt_to] of different type, current_type [long], merged_type [keyword]
at org.elasticsearch.index.mapper.FieldMapper.doMerge(FieldMapper.java:354) ~[elasticsearch-6.8.7.jar:6.8.7]
at org.elasticsearch.index.mapper.NumberFieldMapper.doMerge(NumberFieldMapper.java:1093) ~[elasticsearch-6.8.7.jar:6.8.7]
at org.elasticsearch.index.mapper.FieldMapper.merge(FieldMapper.java:340) ~[elasticsearch-6.8.7.jar:6.8.7]
at org.elasticsearch.index.mapper.FieldMapper.merge(FieldMapper.java:52) ~[elasticsearch-6.8.7.jar:6.8.7]
at org.elasticsearch.index.mapper.ObjectMapper.doMerge(ObjectMapper.java:487) ~[elasticsearch-6.8.7.jar:6.8.7]
at org.elasticsearch.index.mapper.RootObjectMapper.doMerge(RootObjectMapper.java:278) ~[elasticsearch-6.8.7.jar:6.8.7]
at org.elasticsearch.index.mapper.ObjectMapper.merge(ObjectMapper.java:457) ~[elasticsearch-6.8.7.jar:6.8.7]
at org.elasticsearch.index.mapper.RootObjectMapper.merge(RootObjectMapper.java:273) ~[elasticsearch-6.8.7.jar:6.8.7]
at org.elasticsearch.index.mapper.Mapping.merge(Mapping.java:91) ~[elasticsearch-6.8.7.jar:6.8.7]
at org.elasticsearch.index.mapper.DocumentMapper.merge(DocumentMapper.java:339) ~[elasticsearch-6.8.7.jar:6.8.7]
at org.elasticsearch.cluster.metadata.MetaDataMappingService$PutMappingExecutor.applyRequest(MetaDataMappingService.java:273) ~[elasticsearch-6.8.7.jar:6.8.7]
at org.elasticsearch.cluster.metadata.MetaDataMappingService$PutMappingExecutor.execute(MetaDataMappingService.java:231) ~[elasticsearch-6.8.7.jar:6.8.7]
at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:643) ~[elasticsearch-6.8.7.jar:6.8.7]
at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:270) ~[elasticsearch-6.8.7.jar:6.8.7]
at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:200) ~[elasticsearch-6.8.7.jar:6.8.7]
at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:135) ~[elasticsearch-6.8.7.jar:6.8.7]
at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) ~[elasticsearch-6.8.7.jar:6.8.7]
at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) ~[elasticsearch-6.8.7.jar:6.8.7]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) ~[elasticsearch-6.8.7.jar:6.8.7]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) ~[elasticsearch-6.8.7.jar:6.8.7]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) ~[elasticsearch-6.8.7.jar:6.8.7]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_275]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_275]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_275]
Corresponding WARNing messages in Graylog server.log to the above field issue, but no other warnings or errors when loading dashboards or searching
2021-08-05T00:00:29.430-07:00 WARN [Messages] Failed to index message: index=<worker_graylog_20825> id=<cebdf5d1-f5ba-11eb-b085-0667c7b27054> error=<{"type":"es_rejected_execution_exception","reason":"rejected execution of processing of [4694646448][indices:data/write/bulk[s][p]]: request: BulkShardRequest [[worker_graylog_20825][1]] containing [308] requests, target allocation id: JD3nDaqqQfe4LRh_Fx5NAg, primary term: 1 on EsThreadPoolExecutor[name = node-4/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@5154cb59[Running, pool size = 16, active threads = 16, queued tasks = 203, completed tasks = 2018768090]]"}>