Reindexing and replacing indexes

While I was standing up new app clusters recently, some issues caused many millions of log messages to be generated in short periods of time. At the time I thought “those will age out fine”, but now that the FS usage alerts have begun I realize how badly I’ve stepped in it. The organic growth in our log volume will use all the available space before the messages age out. On top of that, the messages landed in a “catch-all” index that I can’t simply blow away.

The indexes in question are no longer live, and what I would like to do is run a filtered reindex operation to build a copy without the millions of trash entries. Conventional Elasticsearch wisdom seems to be “reindex old_42 to new_42, delete old_42, alias new_42 to old_42”, but I get the feeling that something is going to go wrong when Graylog wants to rotate out old_42, which would no longer be an actual index.
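For reference, the filtered reindex described above could be sketched roughly like this. It only builds and prints the request body for `_reindex`, using a `bool`/`must_not` query to leave the unwanted documents behind. The function name `build_reindex_body`, the `source` field, the address `10.1.2.3`, and the `localhost` URL are placeholders for illustration, not anything Graylog provides.

```shell
#!/bin/bash
# Sketch only: build the body for a filtered _reindex that copies everything
# EXCEPT documents matching an unwanted "source" value.
# All names and values here are illustrative assumptions.

build_reindex_body() {
  local src="$1" dest="$2" bad_source="$3"
  cat <<EOF
{
  "source": {
    "index": "${src}",
    "query": {
      "bool": {
        "must_not": { "match": { "source": "${bad_source}" } }
      }
    }
  },
  "dest": { "index": "${dest}" }
}
EOF
}

body="$(build_reindex_body old_42 new_42 10.1.2.3)"
echo "${body}"

# To actually submit it (assumed server URL, untested against a live cluster):
# curl -s -X POST "http://localhost:9200/_reindex?pretty" \
#   -H 'Content-Type: application/json' -d "${body}"
```

Printing the body first makes it easy to run the same query through `_count` before copying anything.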

Will the alias work, or will I need to do a second reindex back to the original name?

Hey @wrossmann

You might want to use delete by query; that is known to work. The original index name needs to be given. Aliases do not work.

Thank you very much for the tip, this direction has been much more straightforward.

Here is the barebones script that I made for it:

#!/bin/bash -e

server='http://graylog1.company.ca:9200'
index=graylog_83
criteria='{
  "query": { 
    "match": {
      "source": "10.1.2.3"
    }
  }
}'

echo '= Count all docs'
curl -s -X POST "${server}/${index}/_count?pretty" 

echo '= Count target docs'
curl -s -X POST "${server}/${index}/_count?pretty" -H 'Content-Type: application/json' -d "${criteria}"

# Only the literal answer "yes" continues; anything else stops here.
read -r -p "Continue? (yes/no) " response
if [ "${response}" != 'yes' ]; then
	exit
fi

echo '= Remove write block'
curl -s -X PUT "${server}/${index}/_settings?pretty" -H 'Content-Type: application/json' -d '{"index":{"blocks":{"write":"false"}}}'

echo '= Delete docs'
curl -s -X POST "${server}/${index}/_delete_by_query?pretty" -H 'Content-Type: application/json' -d "${criteria}"

echo '= Forcemerge'
curl -s -X POST "${server}/${index}/_forcemerge?only_expunge_deletes=true&pretty"

echo '= Replace write block'
curl -s -X PUT "${server}/${index}/_settings?pretty" -H 'Content-Type: application/json' -d '{"index":{"blocks":{"write":"true"}}}'

and I’m just fiddling with those vars as I go.
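One thing worth noting about the script above: `curl -s` exits 0 even when Elasticsearch answers with an HTTP error, so `bash -e` will not stop on a failed request, and a failure partway through could leave the write block off. A possible variant, sketched under assumptions, uses `curl --fail` plus a `trap` so the block is re-applied on exit. The `es` and `set_write_block` helpers, the `DRY_RUN`, `SERVER`, and `INDEX` variables, and the `localhost` URL are made up for this example. With `DRY_RUN=1` (the default here) it only prints the requests it would send.

```shell
#!/bin/bash
# Sketch: same steps as the script above, but the write block is put back
# even if a step fails. Helper names and variables are illustrative.
set -euo pipefail

server="${SERVER:-http://localhost:9200}"  # assumed URL, adjust as needed
index="${INDEX:-graylog_83}"
criteria='{"query":{"match":{"source":"10.1.2.3"}}}'
DRY_RUN="${DRY_RUN:-1}"                    # 1 = print requests, do not send

# es METHOD PATH [BODY]: send one request, failing on HTTP errors.
es() {
  local method="$1" path="$2" body="${3:-}"
  if [ "${DRY_RUN}" = 1 ]; then
    echo "${method} ${server}/${path}${body:+ ${body}}"
    return 0
  fi
  if [ -n "${body}" ]; then
    curl -sS --fail -X "${method}" "${server}/${path}" \
      -H 'Content-Type: application/json' -d "${body}"
  else
    curl -sS --fail -X "${method}" "${server}/${path}"
  fi
}

# set_write_block true|false: toggle index.blocks.write on the index.
set_write_block() {
  es PUT "${index}/_settings" "{\"index\":{\"blocks\":{\"write\":\"$1\"}}}"
}

# Whatever happens after this point, re-apply the write block on exit.
trap 'set_write_block true' EXIT

set_write_block false
es POST "${index}/_delete_by_query" "${criteria}"
es POST "${index}/_forcemerge?only_expunge_deletes=true"
```

Running it once with the default `DRY_RUN=1` shows the exact requests; `DRY_RUN=0 INDEX=graylog_83 ./script.sh` would send them.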

Notes/Caveats:

  • Versions: Graylog 2.4 and Elasticsearch 5.6 [upgrading both]
  • Don’t run _forcemerge against active indices.
  • The method Graylog uses to “close” indices may have changed in 3.x; I don’t know.
  • Don’t trust random people on the internet. Reference the Elasticsearch docs for your version, check your own index settings, and confirm that all this works against non-production data first.
  • YMMV
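On the _forcemerge caveat: as far as I know, Graylog points an alias named `<prefix>_deflector` (here `graylog_deflector`) at the index it is currently writing to, so one way to check whether an index is active is to see whether that alias resolves to it. A rough sketch follows, assuming the response of `GET /_alias/graylog_deflector` is keyed by index name as in the sample below. The helper `is_active_index` and the sample JSON are made up for illustration, so compare against what your own cluster returns.

```shell
#!/bin/bash
# Sketch: decide whether an index is the current deflector target, given the
# JSON returned by something like:
#   curl -s "${server}/_alias/graylog_deflector"
# The response shape below is an assumption; verify it on your version.

# Returns 0 if $2 is a top-level index key in the alias response $1.
is_active_index() {
  local alias_json="$1" index="$2"
  # Strip whitespace so pretty-printed and compact responses both match.
  printf '%s' "${alias_json}" | tr -d '[:space:]' \
    | grep -qF "\"${index}\":{\"aliases\""
}

sample='{"graylog_84":{"aliases":{"graylog_deflector":{}}}}'

for idx in graylog_83 graylog_84; do
  if is_active_index "${sample}" "${idx}"; then
    echo "${idx}: active write index, skip _forcemerge"
  else
    echo "${idx}: not the deflector target"
  fi
done
```

A plain `grep` is crude JSON handling; with `jq` available, checking the top-level keys directly would be more robust.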
