While I was standing up new app clusters recently, some issues caused many millions of log messages to be generated in a short period of time. At the time I thought “those will age out fine”, but now that the filesystem usage alerts have begun I realize how badly I’ve stepped in it. The organic growth in our log volume will use all the available space before the messages age out. Worse, the messages landed in a “catch-all” index that I can’t simply blow away.
The indexes in question are no longer live, so what I would like to do is run a filtered reindex operation to build a copy without the millions of trash entries. Conventional Elasticsearch wisdom seems to be “reindex old_42 to new_42, delete old_42, alias new_42 as old_42”, but I get the feeling something is going to go wrong when Graylog wants to rotate out old_42, which is no longer an actual index.
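For reference, the conventional flow described above might look something like this against the 5.6 REST API. The index names come from the example above; the `must_not` clause is a placeholder — you would substitute whatever query actually matches the trash messages in your data:

```
# Filtered reindex: copy everything EXCEPT the junk documents.
# The match on "message" below is a placeholder query.
POST _reindex
{
  "source": {
    "index": "old_42",
    "query": {
      "bool": {
        "must_not": [
          { "match": { "message": "trash-pattern-here" } }
        ]
      }
    }
  },
  "dest": { "index": "new_42" }
}

# After verifying document counts, drop the original
# and point its old name at the new index.
DELETE old_42

POST _aliases
{
  "actions": [
    { "add": { "index": "new_42", "alias": "old_42" } }
  ]
}
```

Whether Graylog is happy managing `old_42` as an alias rather than a concrete index is exactly the open question here.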
Will the alias work, or will I need to do a second reindex back to the original name?
Versions: Graylog 2.4 and Elasticsearch 5.6 [upgrading both]
A few caveats: don’t run _forcemerge against active indices; the method Graylog uses to “close” indices may have changed in 3.x, I don’t know; and don’t trust random people on the internet — reference the Elasticsearch docs for your version, check your own index settings, and confirm that all of this works against NOT-production data first.
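To check your own setup before and after any swap, the standard cat APIs are enough (index and alias names here are the ones used above, as examples):

```
# Compare document counts and sizes between the old and new indices
GET _cat/indices/old_42,new_42?v&h=index,docs.count,store.size

# Confirm the alias resolves to the index you expect
GET _cat/aliases/old_42?v
```

If the document counts differ by exactly the number of trash entries you filtered out, the reindex did what you intended.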