Messages not shown after reindex

Hello everyone,
I have done a bad thing: I pushed all the logs from Windows and Linux to the same index and hit the 1000-field limit in ES.
So I made separate indices for Windows, Linux and Metricbeat and set up routing. The original index remains for Windows. New messages work fine, and I tried reindexing the older Linux messages into their own index. ES processed it just fine, but they don’t show up in Graylog. I have tried recalculating index ranges, but it seems to do nothing. The index detail shows 222k messages, but search results show only 14k messages (those stored since the reindex).

Graylog 4.0.8 on Debian 10 (Graylog itself shows Debian 11, but that’s not true), ES 6.8.16, mongo 4.2.14

Hello & Welcome

Maybe I can help.

I bet you won't do that again :slight_smile:
There might be something here that will help you with reindexing in Elasticsearch.

I'm not sure exactly what you did or how your setup is configured, but I believe it is about the communication between Graylog and Elasticsearch. Do you see anything in the log files about this issue?
Hope that helps.

I created the index via Graylog itself, so it knows about it. Messages indexed into it via Graylog are found and shown, but older messages (re)indexed manually via the ES API are not.
I don’t see any log entries when searching the index or recalculating ranges.
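
For readers following along, a manual reindex of the kind described here would look roughly like the sketch below. The thread does not show the actual request, so the source index name (`sgt__0`) and the query that selects the Linux messages are assumptions made for illustration only; `sgt-linux_0` is the new index named later in the thread.

```python
import json


def build_reindex_body(source_index, dest_index, query):
    """Build the JSON body for Elasticsearch's POST /_reindex."""
    return {
        "source": {"index": source_index, "query": query},
        "dest": {"index": dest_index},
    }


if __name__ == "__main__":
    # Hypothetical selector: the thread does not say how Linux messages were picked.
    linux_only = {"exists": {"field": "some_linux_only_field"}}
    # "sgt__0" as the source index is an assumption, not stated in the thread.
    body = build_reindex_body("sgt__0", "sgt-linux_0", linux_only)
    # Send the printed JSON as the body of POST http://localhost:9200/_reindex
    print(json.dumps(body, indent=2))
```

Unless a script is supplied, `_reindex` copies each document's `_source` as it is, so every field stored by Graylog arrives in the destination index unchanged.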

Can you try a couple of commands on your Elasticsearch?
The following commands are for localhost; if your ES listens on a different address, replace localhost with the correct one.

Not sure if you have done this already, but it might help identify what's going on.

ES Health Check
curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'

Check Shards
curl -XGET 'http://localhost:9200/_cat/shards'

ES Shard Info
curl -XGET 'http://localhost:9200/_cluster/allocation/explain?pretty'

ES List Indices
curl -XGET 'http://localhost:9200/_cat/indices?v'

Or maybe something in here might be able to help you troubleshoot.


ES Health Check

output
{
    "cluster_name": "graylog",
    "status": "green",
    "timed_out": false,
    "number_of_nodes": 1,
    "number_of_data_nodes": 1,
    "active_primary_shards": 72,
    "active_shards": 72,
    "relocating_shards": 0,
    "initializing_shards": 0,
    "unassigned_shards": 0,
    "delayed_unassigned_shards": 0,
    "number_of_pending_tasks": 0,
    "number_of_in_flight_fetch": 0,
    "task_max_waiting_in_queue_millis": 0,
    "active_shards_percent_as_number": 100.0
}

Check Shards

output
gl-events_19        1 p STARTED       0    261b 127.0.0.1 Scmw-HV
gl-events_19        3 p STARTED       0    261b 127.0.0.1 Scmw-HV
gl-events_19        2 p STARTED       0    261b 127.0.0.1 Scmw-HV
gl-events_19        0 p STARTED       0    261b 127.0.0.1 Scmw-HV
gl-system-events_21 1 p STARTED       0    261b 127.0.0.1 Scmw-HV
gl-system-events_21 2 p STARTED       0    261b 127.0.0.1 Scmw-HV
gl-system-events_21 3 p STARTED       0    261b 127.0.0.1 Scmw-HV
gl-system-events_21 0 p STARTED       0    261b 127.0.0.1 Scmw-HV
sgt-metrics_0       1 p STARTED  236716 144.9mb 127.0.0.1 Scmw-HV
sgt-metrics_0       3 p STARTED  236413 144.6mb 127.0.0.1 Scmw-HV
sgt-metrics_0       2 p STARTED  238112 145.5mb 127.0.0.1 Scmw-HV
sgt-metrics_0       0 p STARTED  238510 146.1mb 127.0.0.1 Scmw-HV
webbox_1            1 p STARTED 4999724   2.2gb 127.0.0.1 Scmw-HV
webbox_1            3 p STARTED 4999812   2.2gb 127.0.0.1 Scmw-HV
webbox_1            2 p STARTED 4999899   2.2gb 127.0.0.1 Scmw-HV
webbox_1            0 p STARTED 5001010   2.2gb 127.0.0.1 Scmw-HV
sgt-linux_0         1 p STARTED   62611  36.9mb 127.0.0.1 Scmw-HV
sgt-linux_0         2 p STARTED   62615  35.4mb 127.0.0.1 Scmw-HV
sgt-linux_0         3 p STARTED   62340  35.3mb 127.0.0.1 Scmw-HV
sgt-linux_0         0 p STARTED   62811  35.3mb 127.0.0.1 Scmw-HV
sgt__1              1 p STARTED  132286  77.8mb 127.0.0.1 Scmw-HV
sgt__1              2 p STARTED  132394  77.5mb 127.0.0.1 Scmw-HV
sgt__1              3 p STARTED  132794  77.6mb 127.0.0.1 Scmw-HV
sgt__1              0 p STARTED  132926    78mb 127.0.0.1 Scmw-HV
gl-system-events_20 1 p STARTED       0    230b 127.0.0.1 Scmw-HV
gl-system-events_20 3 p STARTED       0    230b 127.0.0.1 Scmw-HV
gl-system-events_20 2 p STARTED       0    230b 127.0.0.1 Scmw-HV
gl-system-events_20 0 p STARTED       0    230b 127.0.0.1 Scmw-HV
webbox__0           1 p STARTED 4998551   1.6gb 127.0.0.1 Scmw-HV
webbox__0           3 p STARTED 4998956   1.6gb 127.0.0.1 Scmw-HV
webbox__0           2 p STARTED 5002882   1.6gb 127.0.0.1 Scmw-HV
webbox__0           0 p STARTED 4999638   1.6gb 127.0.0.1 Scmw-HV
wb2__0              1 p STARTED      48 122.9kb 127.0.0.1 Scmw-HV
wb2__0              3 p STARTED      59 209.5kb 127.0.0.1 Scmw-HV
wb2__0              2 p STARTED      41 120.5kb 127.0.0.1 Scmw-HV
wb2__0              0 p STARTED      52 183.2kb 127.0.0.1 Scmw-HV
gl-events_18        1 p STARTED       3   9.3kb 127.0.0.1 Scmw-HV
gl-events_18        3 p STARTED       2   8.5kb 127.0.0.1 Scmw-HV
gl-events_18        2 p STARTED       5  10.9kb 127.0.0.1 Scmw-HV
gl-events_18        0 p STARTED       5  11.2kb 127.0.0.1 Scmw-HV
sgt__0              1 p STARTED 1843105   1.6gb 127.0.0.1 Scmw-HV
sgt__0              2 p STARTED 1839851   1.6gb 127.0.0.1 Scmw-HV
sgt__0              3 p STARTED 1840025   1.6gb 127.0.0.1 Scmw-HV
sgt__0              0 p STARTED 1840398   1.6gb 127.0.0.1 Scmw-HV
gl-events_21        1 p STARTED       0    261b 127.0.0.1 Scmw-HV
gl-events_21        2 p STARTED       0    261b 127.0.0.1 Scmw-HV
gl-events_21        3 p STARTED       0    261b 127.0.0.1 Scmw-HV
gl-events_21        0 p STARTED       0    261b 127.0.0.1 Scmw-HV
webbox_0            1 p STARTED 5000119     2gb 127.0.0.1 Scmw-HV
webbox_0            3 p STARTED 5000415     2gb 127.0.0.1 Scmw-HV
webbox_0            2 p STARTED 4997033     2gb 127.0.0.1 Scmw-HV
webbox_0            0 p STARTED 5002636     2gb 127.0.0.1 Scmw-HV
gl-system-events_18 1 p STARTED       0    261b 127.0.0.1 Scmw-HV
gl-system-events_18 2 p STARTED       0    261b 127.0.0.1 Scmw-HV
gl-system-events_18 3 p STARTED       0    261b 127.0.0.1 Scmw-HV
gl-system-events_18 0 p STARTED       0    261b 127.0.0.1 Scmw-HV
webbox__1           1 p STARTED 1203937 445.9mb 127.0.0.1 Scmw-HV
webbox__1           2 p STARTED 1202448 445.3mb 127.0.0.1 Scmw-HV
webbox__1           3 p STARTED 1204902 448.2mb 127.0.0.1 Scmw-HV
webbox__1           0 p STARTED 1203242 446.4mb 127.0.0.1 Scmw-HV
graylog_0           1 p STARTED 1341830   1.4gb 127.0.0.1 Scmw-HV
graylog_0           3 p STARTED 1342102   1.4gb 127.0.0.1 Scmw-HV
graylog_0           2 p STARTED 1341283   1.4gb 127.0.0.1 Scmw-HV
graylog_0           0 p STARTED 1342823   1.4gb 127.0.0.1 Scmw-HV
gl-events_20        1 p STARTED       0    230b 127.0.0.1 Scmw-HV
gl-events_20        2 p STARTED       0    230b 127.0.0.1 Scmw-HV
gl-events_20        3 p STARTED       0    230b 127.0.0.1 Scmw-HV
gl-events_20        0 p STARTED       0    230b 127.0.0.1 Scmw-HV
gl-system-events_19 1 p STARTED       0    261b 127.0.0.1 Scmw-HV
gl-system-events_19 3 p STARTED       0    261b 127.0.0.1 Scmw-HV
gl-system-events_19 2 p STARTED       0    261b 127.0.0.1 Scmw-HV
gl-system-events_19 0 p STARTED       0    261b 127.0.0.1 Scmw-HV

ES Shard Info

output
{
    "error": {
        "root_cause": [
            {
                "type": "illegal_argument_exception",
                "reason": "unable to find any unassigned shards to explain [ClusterAllocationExplainRequest[useAnyUnassignedShard=true,includeYesDecisions?=false]"
            }
        ],
        "type": "illegal_argument_exception",
        "reason": "unable to find any unassigned shards to explain [ClusterAllocationExplainRequest[useAnyUnassignedShard=true,includeYesDecisions?=false]"
    },
    "status": 400
}

ES List Indices

output
health status index               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   gl-events_11        JsuasY0oT4KA4mC50QAsMQ   4   0          2            0     17.9kb         17.9kb
green  open   gl-system-events_19 T5QM2zNsRd2ww_ktItN2Hw   4   0          0            0        1kb            1kb
green  open   gl-system-events_14 OTNr9pz-SguqCozjSo7m9A   4   0          0            0        1kb            1kb
green  open   webbox_1            RJ7_3D8PQgKUwL4pYMv_7w   4   0   20000445            0      8.8gb          8.8gb
green  open   gl-events_12        kahttUWyRzeRJX1_SgC4Dg   4   0          0            0        1kb            1kb
green  open   gl-events_18        BK-WFN_FQ7qpOJuhS1ahag   4   0         15            0     40.1kb         40.1kb
green  open   gl-events_16        UvUbCIxdS9acO14Do4kKhw   4   0          0            0        1kb            1kb
green  open   gl-events_21        -IVVw7ZER1q1UYMrmDxskQ   4   0          0            0        1kb            1kb
green  open   webbox__1           MQSJRrBcSC6UXlrLYbzp3w   4   0    4814389            0      1.7gb          1.7gb
green  open   gl-events_10        nJVDx0kkQbawl1Do25E4TQ   4   0          1            0      9.4kb          9.4kb
green  open   sgt-linux_0         Cx84pdHhR5O2RzIzrEIk9w   4   0     250377            6    143.1mb        143.1mb
green  open   gl-system-events_15 B-G57PUWSbCEqsJ-DRoR-g   4   0          0            0        1kb            1kb
green  open   graylog_0           oRJ-90o1TIaFY3p_DfaDEQ   4   0    5368038            0      5.6gb          5.6gb
green  open   gl-system-events_17 PWcuhS4CQGq4hO5VFXa6uQ   4   0          0            0        1kb            1kb
green  open   gl-system-events_10 DA2lydduRDC8cw4ZTyRpYA   4   0          0            0        1kb            1kb
green  open   gl-events_19        2ZQZZHWVScipu_bw_gRiGQ   4   0          0            0        1kb            1kb
green  open   gl-events_20        7NuoC1lxQeCEjzNc_D88Bw   4   0          0            0       920b           920b
green  open   gl-system-events_16 BfFSc-d8S0OvCp76LPwWqA   4   0          0            0        1kb            1kb
green  open   gl-events_14        djgmkXzCRfmQn_vhR4X1SA   4   0          0            0        1kb            1kb
green  open   gl-events_15        k4ZMDR2MQwScYDvxSdup5A   4   0          0            0        1kb            1kb
green  open   gl-system-events_18 LRU6IAfoSuC2J8OLWV1znQ   4   0          0            0        1kb            1kb
green  open   sgt__1              zoHPZTdbQsWDs4opag1A_Q   4   0     530227            0    310.9mb        310.9mb
green  open   webbox_0            x1yBam6JQ7SQNsT6JuRlxg   4   0   20000203            0      8.3gb          8.3gb
green  open   gl-events_17        wi-C1sv0Q9mY1cfBgygIHQ   4   0          0            0        1kb            1kb
green  open   gl-system-events_13 0jk_N9nlTZ-ux8B-PXsbQA   4   0          0            0        1kb            1kb
green  open   gl-system-events_11 8W8vT9vLQwaUqCasWRMFrA   4   0          0            0        1kb            1kb
green  open   wb2__0              YQ000ligQqGBqSkQETCpew   4   0        200            0    636.2kb        636.2kb
green  open   gl-system-events_21 XYNzEHnwTEiS1VV5jUcQtw   4   0          0            0        1kb            1kb
green  open   sgt__0              IYqFODXrTl67JGAskNXUMA   4   0    7363379            0      6.6gb          6.6gb
green  open   gl-system-events_20 eelxa_fZR0SG1j_Yv6zRkA   4   0          0            0       920b           920b
green  open   gl-system-events_12 L1pHpMPCRzifriSSHAFuxQ   4   0          0            0        1kb            1kb
green  open   gl-events_13        V9Nd0EdTTP-pV_I6QOm-LQ   4   0         35            0     44.6kb         44.6kb
green  open   webbox__0           IjbuABNQRZOzRvcDCcvmTg   4   0   20000027            0      6.7gb          6.7gb
green  open   sgt-metrics_0       7AYFY1_jSE26OwBQX6sRgQ   4   0     949696            0    581.4mb        581.4mb

Hello,
Thanks for the added info. I really didn't notice anything that stuck out as wrong. It actually looks good.

As for your older messages: I have never had this issue before, so it's a little unclear to me how to recover old messages that were combined in the same index and then separated. I really don't know what to tell you. Maybe someone else here has an idea how to retrieve those older messages.

What is your index retention configuration set to? If it is set to delete/close, maybe that kicked in when you recalculated the index ranges. Have you also tried rotating the active write index?

I think the "recalculate index ranges" function/button does not work.
According to the source code, there should be something in the log (I have logging set to “info”), but there is nothing.

Edit:
The index ranges of most indices (including this one) are begin: 0, end: 0. It seems to me that the active write index does not use index ranges (I think the mentioned code also skips it).
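
Besides the button in the web interface, an index range rebuild can also be requested through Graylog's REST API, which makes it easier to see the HTTP response. The following is only a sketch: the base URL `http://localhost:9000` and the credentials are placeholders, and the endpoint path should be checked against the API browser of your Graylog version before relying on it.

```python
import base64
import urllib.request


def build_rebuild_request(base_url, index, user, password):
    """Prepare a POST that asks Graylog to rebuild the range of one index."""
    url = f"{base_url.rstrip('/')}/api/system/indices/ranges/{index}/rebuild"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=b"",  # empty body; the index is identified by the path
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            # Graylog expects this header on modifying API requests.
            "X-Requested-By": "cli",
            "Accept": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder URL and credentials; nothing is sent until urlopen() is called.
    req = build_rebuild_request("http://localhost:9000", "sgt-linux_0", "admin", "secret")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

The rebuild runs as a background job, so the recalculated range should be read back afterwards (for example on the index set page) rather than from the response itself.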

The index set is configured to hold 20M messages per index (the default setting), with a maximum of 20 indices, and then delete. It has not rotated yet, so there is only one index with 4 shards.

Okay, so I have solved the “mystery”.
Graylog includes the stream ID in its search query, and since the reindexed messages still carried the stream ID from the old index, ES did not return them.
So I updated all messages in the index to have the new stream ID, and Graylog now shows them.

POST /sgt-linux_0/_update_by_query
{
  "script": {
    "inline": "ctx._source.streams = params.newProp",
    "params": {
      "newProp": [
        "60ce2627b78911459056cb42"
      ]
    }
  },
  "query": {
    "match_all": {}
  }
}
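
As a hedged variation on the request above, the sketch below first counts documents per stream ID (to confirm that old IDs are still present) and then builds an update that touches only the documents carrying the old ID. It assumes `streams` is mapped as a keyword field, as in Graylog's default index template; `OLD_STREAM_ID` is a placeholder because the thread does not state the old ID. The script uses the `source` key, which is the name the ES 6.x documentation uses for what the request above calls `inline`.

```python
import json


def stream_counts_body(max_buckets=20):
    """Body for POST /<index>/_search: document count per stream ID."""
    return {
        "size": 0,
        "aggs": {"by_stream": {"terms": {"field": "streams", "size": max_buckets}}},
    }


def restamp_body(old_stream_id, new_stream_id):
    """Body for POST /<index>/_update_by_query, limited to the old stream ID."""
    return {
        "script": {
            "lang": "painless",
            "source": "ctx._source.streams = params.new_streams",
            "params": {"new_streams": [new_stream_id]},
        },
        "query": {"term": {"streams": old_stream_id}},
    }


if __name__ == "__main__":
    # Step 1: send to POST /sgt-linux_0/_search and inspect the buckets.
    print(json.dumps(stream_counts_body(), indent=2))
    # Step 2: send to POST /sgt-linux_0/_update_by_query.
    # "OLD_STREAM_ID" is a placeholder; the new ID is the one from the thread.
    print(json.dumps(restamp_body("OLD_STREAM_ID", "60ce2627b78911459056cb42"), indent=2))
```

Limiting the query to the old ID leaves messages that Graylog already indexed with the new stream ID untouched, which also avoids rewriting documents that do not need it.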

Nice,
When I did an Elasticsearch dump and restore, I had to follow @aaronsachs' advice here.

Glad you worked it out.

