Re-index after the data mapping is changed

I realize that Graylog does not support re-indexing out of the box (see !topic/graylog2/wFtIYB5QVT8).

I have found a method that seems to be working, and would like to get feedback from the Graylog folks if there are any potential issues with the approach.

If not, the steps below could help others who need to reindex after changing data mapping.

  1. Go to System | Indices | the index in question (in this example, 7d).

  2. Identify the lowest and highest indices in the current window (in this case 7d_204 and 7d_210).
    (In this example, we’re re-indexing the lowest entry and moving upwards.)

  3. Temporarily increase the number of indices that are kept in rotation so that while creating a copy index, the last index is not deleted.
    (You can lower this back after you are done)

  4. Use this as a template to copy the lowest index to highest index + 1:
    curl -XPOST 'localhost:9200/_reindex?pretty' -H 'Content-Type: application/json' -d '
    {
      "source": {
        "index": "7d_204"
      },
      "dest": {
        "index": "7d_211"
      }
    }'

  5. Run the script

  6. Verify that the destination index message count matches or is greater than the source index message count.

  7. Delete the older index after confirming that the messages from that index are appearing twice.

  8. Rinse and repeat for the next index (7d_205 mapped to 7d_212), …

  9. Revert the changes made in step 3.
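
For anyone who wants to script steps 4 to 6, here is a minimal sketch. It assumes Elasticsearch is reachable at localhost:9200 as in the curl command above. The `ES_URL` variable and the function names are made up for illustration, and the count check uses the Elasticsearch `_count` endpoint rather than the Graylog UI.

```shell
#!/bin/sh
# Sketch only: ES_URL and all function names are illustrative assumptions.
ES_URL="${ES_URL:-localhost:9200}"

# Build the JSON body for the _reindex call from source and destination names.
reindex_body() {
  printf '{"source":{"index":"%s"},"dest":{"index":"%s"}}' "$1" "$2"
}

# Copy one index into another (steps 4 and 5).
reindex() {
  curl -s -XPOST "$ES_URL/_reindex?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(reindex_body "$1" "$2")"
}

# Extract the number from a _count response such as {"count":123,...}.
parse_count() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Fetch the document count of an index (step 6).
doc_count() {
  curl -s "$ES_URL/$1/_count" | parse_count
}

# Succeeds when the destination count ($2) is at least the source count ($1).
counts_ok() {
  [ "$2" -ge "$1" ]
}

# Example usage (not run automatically):
#   reindex 7d_204 7d_211
#   counts_ok "$(doc_count 7d_204)" "$(doc_count 7d_211)" && echo "counts match"
```

Keeping the count check separate from the delete in step 7 is deliberate: the delete stays a manual decision after the duplicates have been confirmed in Graylog.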

Following up on this.
Is it possible to temporarily stop the Graylog deflector from automatically re-pointing to the newest index (at least until it is fully re-created)?

Seeing this in the logs.
2017-10-16_16:02:58.62104 WARN [IndexRotationThread] Deflector is pointing to [graylog_495], not the newest one: [graylog_496]. Re-pointing.

We had assumed that new messages would continue to be written to the index the deflector currently points to.
However, we’re not sure anymore.
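
One way to check where the deflector points at any moment is to ask Elasticsearch directly. A minimal sketch, assuming the write alias is named `graylog_deflector` (the `graylog` prefix seen in the log line plus `_deflector`) and that Elasticsearch listens on localhost:9200. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: the alias name and function names are assumptions.
ES_URL="${ES_URL:-localhost:9200}"
DEFLECTOR="${DEFLECTOR:-graylog_deflector}"

# Pull the index name out of _cat/aliases output ("alias index" per line).
alias_target() {
  awk -v a="$1" '$1 == a { print $2 }'
}

# Ask Elasticsearch which index the deflector alias currently points to.
deflector_target() {
  curl -s "$ES_URL/_cat/aliases/$DEFLECTOR?h=alias,index" | alias_target "$DEFLECTOR"
}

# Example usage (not run automatically):
#   deflector_target
```

Running this before and after the reindex call would show whether the alias moved to the freshly created index.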

After indexing was completed, we queried Graylog for data in the index.

We noticed that messages stopped arriving until the index writing was completed.
As seen in the log above, the re-pointing occurred at 16:02:58.
The first new message seen is stamped 16:42:36 (excluding the older data).

An interesting observation is that the original index was 7.7GB, with 3,853,939 messages.
After indexing was completed, the new index is 14.7GB with 4,052,139 messages.
The increase in messages had made us assume that messages kept on arriving.
However, almost double the disk usage appears excessive, and there was a gap in messages along with some custom alerts raised because messages were not arriving.

It looks like the messages were arriving but were queued and not processed.
After waiting a while, the messages starting from 16:02:51 began showing up.

So the original concern is valid.
Is it possible to temporarily stop the Graylog deflector from automatically re-pointing to the newest index?


Another update.

No need to worry about temporarily disabling the deflector re-pointing.

  • Make sure new index creation won’t delete another index (step #3 above).
  • Re-create the new index with a number one less than the lowest index number, so it never becomes the newest index.
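
To make the naming concrete, here is a small sketch that picks the lowest-numbered index from a list of names and derives the destination name one below it. Both function names are hypothetical, and it assumes index names end in `_<number>` as in the 7d_204 example.

```shell
#!/bin/sh
# Sketch only: function names are illustrative assumptions.

# Read index names on stdin and print the one with the lowest numeric suffix.
lowest_index() {
  awk -F_ '{ n = $NF + 0
             if (name == "" || n < min) { min = n; name = $0 } }
           END { if (name != "") print name }'
}

# Given a prefix and the lowest index name, print the name one below it.
dest_below_lowest() {
  prefix="$1"
  lowest="${2##*_}"   # strip everything up to the last underscore
  echo "${prefix}_$((lowest - 1))"
}

# Example usage (not run automatically):
#   dest_below_lowest 7d "$(printf '7d_204\n7d_210\n' | lowest_index)"
```

Because the copy gets a number below the current range, it is never the newest index, so the deflector has no reason to re-point to it.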

No more issues with messages not arriving.
You get the exact same message count as the original index.
Interestingly, I still got more disk usage on the new index:

6.2GB original on 2,592,687 messages
10.2GB on the new index (same count)

… Later on (I don’t know how or why), it read 7.3GB.

Could there be some kind of background optimization going on? Or is it because some field types changed and hence are consuming more space?
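
Elasticsearch does merge segments in the background, so a freshly written index can shrink for a while after the copy finishes. Whether that explains the numbers here is not confirmed. A rough sketch for watching it, assuming Elasticsearch on localhost:9200; the function names are made up, and `growth_percent` is just a helper for comparing two sizes in the same unit.

```shell
#!/bin/sh
# Sketch only: ES_URL and function names are illustrative assumptions.
ES_URL="${ES_URL:-localhost:9200}"

# Show document count, deleted documents, and on-disk size for one index.
index_stats() {
  curl -s "$ES_URL/_cat/indices/$1?v&h=index,docs.count,docs.deleted,store.size"
}

# List the segments behind an index; the list changes as merges complete.
index_segments() {
  curl -s "$ES_URL/_cat/segments/$1?v"
}

# Percentage growth from an old size ($1) to a current size ($2).
growth_percent() {
  awk -v old="$1" -v cur="$2" 'BEGIN { printf "%.0f\n", (cur - old) / old * 100 }'
}

# Example usage (not run automatically):
#   index_stats 7d_204
#   index_segments 7d_204
#   growth_percent 6.2 10.2
```

Running `index_stats` a few times after the copy would show whether the size keeps dropping on its own. If it settles above the original, the changed field types are the more likely cause.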
