One man’s journey to correcting an Elasticsearch datatype for Graylog
FYE (For Your Edification):
Before you read any further, know that I am very new to all of this graylog/elastic stuff. This process worked for me but there may be underlying problems I haven’t noticed… and of course your mileage may vary depending on your configuration/replication/nodes and for that matter what versions of things you are using.
Current relevant versions:
Graylog: 3.3.4
Elastic: 6.8.10
Scenario: I had a field called login_duration that tracks the duration of each time an employee connects to one of our VPNs (in seconds… but that’s another story), and Elastic decided that it would be a keyword rather than a long… which meant I could not sum up the total time connected over X days.
What to do?: Create a custom index mapping of course! Just follow the Graylog docs on custom index mappings and it will all just work! If you aren’t familiar with them… stop right here and read through them. This saga assumes you have.
But… I have more than one index in my index set, and the current/old ones will retain login_duration as a keyword. I could delete all the past indices… but that is not particularly helpful.
Solution: I will create the custom mapping AND I will reindex the indices that have the keyword. Hold onto your hat, I am going to step through my process here.
Caveat: I have only one Elastic database to work on and this is not a production system so I am not getting into backing up before you do any of this. You should have read the docs on custom index mapping and the warnings saying you should think through what you are doing!
The work: First I needed to see where login_duration was a keyword, particularly since I have two VPNs that initially work off different indices.
curl -X GET --netrc "elstc-main:9200/*/_mapping/field/login_duration?pretty" | grep -B 7 keyword
While I am not going to parse out my command lines for you, I do want you to note that I am using --netrc in my curl. You should have your elastic database locked down so it can’t be randomly queried by anyone. Research netrc. It’s not a perfect solution but in my case it is sufficient.
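For anyone who has not met netrc before, here is a minimal sketch of what curl expects. It assumes your Elasticsearch is behind HTTP basic auth; the function name make_netrc and the user and password shown are made up for illustration. curl reads ~/.netrc when given --netrc, or any other file when given --netrc-file.

```shell
# Write a one-entry netrc file that only the owner can read.
# Usage: make_netrc FILE HOST USER PASSWORD   (hypothetical helper)
# Note: arguments are visible in the process list, so on a shared
# machine you may prefer to create the file in an editor instead.
make_netrc() {
  (
    umask 077   # a newly created file ends up with mode 600
    printf 'machine %s login %s password %s\n' "$2" "$3" "$4" > "$1"
  )
  chmod 600 "$1"   # also tighten a file that already existed
}

# Example (not run here):
#   make_netrc "$HOME/.netrc" elstc-main someuser 'some-password'
#   curl --netrc "elstc-main:9200/_cat/indices?v"
```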
some results:
--
  "rem_ac_10" : {
    "mappings" : {
      "message" : {
        "login_duration" : {
          "full_name" : "login_duration",
          "mapping" : {
            "login_duration" : {
              "type" : "keyword"
--
  "graylog_45" : {
    "mappings" : {
      "message" : {
        "login_duration" : {
          "full_name" : "login_duration",
          "mapping" : {
            "login_duration" : {
              "type" : "keyword"
Oh man… I have two index sets to contend with: graylog_* and rem_ac_*. For brevity in the face of verboseness, we will work on rem_ac_*.
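If grep -B 7 gets noisy when many indices match, a small filter can print just the index names. This is a sketch that assumes the ?pretty output indents top-level index names by exactly two spaces, which is what I see on my version; keyword_indices is a name I made up.

```shell
# Print the name of every index whose field mapping says "keyword".
# Reads the ?pretty output of the field-mapping API on stdin.
keyword_indices() {
  awk '
    # A line starting with exactly two spaces and a quote is an index name.
    /^  "[^"]+"[ ]*:/ { split($0, parts, "\""); idx = parts[2] }
    # Report the most recently seen index when a keyword type shows up.
    /"type"[ ]*:[ ]*"keyword"/ { print idx }
  '
}

# Example (not run here):
#   curl -s --netrc "elstc-main:9200/*/_mapping/field/login_duration?pretty" | keyword_indices
```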
For the custom mapping I am creating the following JSON for rem_ac_*, naming the file “gl_custom_rem_ac.json” and storing it in a safe and relevant directory in case I want to remove this custom mapping in the future (see the docs for that). The name doesn’t matter but consistency does!
gl_custom_rem_ac.json:
{
  "template": "rem_ac_*",
  "mappings" : {
    "message" : {
      "properties" : {
        "login_duration" : {
          "type" : "long"
        }
      }
    }
  }
}
To put it into elastic:
curl -X PUT --netrc -H 'Content-Type: application/json' -d @'gl_custom_rem_ac.json' 'http://elstc-main:9200/_template/gl_custom_rem_ac?pretty'
NOTE: In the docs it shows the following as if it is part of the command… it’s not… this is what you get as a result if your command was successful. If you have a typo in your JSON it will usually whinge… or blow up your entire elastic database as forever corrupted… kidding.
{
"acknowledged" : true
}
NOTE-TWO: To do the graylog_* index set you would create a wholly separate JSON file, changing rem_ac_* to graylog_*, and curl it into elastic separately. Deleting would be separate too. One file per index set.
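The second file can be derived from the first rather than retyped. A minimal sketch, assuming the file has a "template" line like the one in gl_custom_rem_ac.json; the function name derive_template and the output file name are my own invention.

```shell
# Copy a template file, swapping the index prefix in its "template" pattern.
# Usage: derive_template SRC_FILE OLD_PREFIX NEW_PREFIX > NEW_FILE
derive_template() {
  sed "s/\"template\": *\"${2}_\*\"/\"template\": \"${3}_*\"/" "$1"
}

# Example (not run here):
#   derive_template gl_custom_rem_ac.json rem_ac graylog > gl_custom_graylog.json
```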
Once you have done this, the index sets you have set custom mappings for will have to be rotated so that a new index picks up the template. Go to http://gray-dude:9000/system/indices, click on the index set you are working on, and under the Maintenance button menu in the upper right select “Rotate Active Write Index”.
Did that go smoothly? GREAT! Now for the meatier portion that the Graylog docs don’t talk about. Beyond this point you are messing directly with the elastic database. Graylog likely won’t support you in this and if you have questions or issues… for goodness sake ask/post them at Elastic… and don’t ask me, I am a tadpole.
You have been warned.
Let’s look at the rem_ac_* indices… I am interested in the docs.count in my index.
curl -X GET --netrc "elstc-main:9200/_cat/indices/rem*?v&s=index&pretty"
health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   rem_ac_0  ADTmcOFaSQeLJEYgGL7VtQ   4   0      73579            0     80.5mb         80.5mb
green  open   rem_ac_1  1ZL_wg17RMWIaCG-0mqIJA   4   0      15847            0     16.4mb         16.4mb
green  open   rem_ac_10 u3LBFk_hQJyoR1H5DzKy-g   4   0      17573            0     18.2mb         18.2mb
...
rem_ac_10 showed up in my initial search for login_duration being a keyword, so let’s adjust that index. First I am going to pull the current mapping out to a text file:
curl -X GET --netrc "elst-main:9200/rem_ac_10/_mapping?pretty" > rem_mapping
Then I edit that file (vi rem_mapping … you like vi don’t you???), doing THREE things to turn it into a correct mapping template:
- Delete the second line in the file, which specifies the index name
- Delete the second-to-last line in the file, which is just the corresponding ‘}’
- Find the login_duration field definition and change it from keyword to long.
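If you would rather not do those three edits by hand, they can be scripted. This is only a sketch: it assumes the file is the untouched ?pretty output for a single index (so the index name really is on line 2 and its closing brace on the second-to-last line) and that the file ends with a newline. The function name fix_mapping and the output file name are hypothetical.

```shell
# Apply the three manual edits to a saved mapping file.
# Usage: fix_mapping FILE FIELD > NEW_FILE
fix_mapping() {
  total=$(wc -l < "$1" | tr -d ' ')
  # Edits 1 and 2: drop line 2 (index name) and the second-to-last line (its brace).
  sed -e '2d' -e "$((total - 1))d" "$1" |
    awk -v field="\"$2\"" '
      # Edit 3: once inside the named field, flip its first "type" from keyword to long.
      index($0, field) && index($0, "{") { in_field = 1 }
      in_field && /"type"/ { sub(/"keyword"/, "\"long\""); in_field = 0 }
      { print }
    '
}

# Example (not run here):
#   fix_mapping rem_mapping login_duration > rem_mapping.fixed
```

Other keyword fields are left alone, since only the first "type" line after the named field is rewritten.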
You are getting bored. I can tell. I am too. So I will shorten up some descriptions of what is going on.
—Create new blank temp index using the modified mapping template.
curl -XPUT --netrc -H 'Content-Type: application/json' http://elst-main:9200/rem_ac_001 -d @rem_mapping
— Copy to temp index
curl -X POST --netrc "elst-main:9200/_reindex?pretty" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "rem_ac_10"
  },
  "dest": {
    "index": "rem_ac_001",
    "version_type": "internal"
  }
}
'
— Verify docs.count are the same between rem_ac_10 and rem_ac_001
curl -X GET --netrc "elst-main:9200/_cat/indices/rem*?v&s=index&pretty"
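Comparing the counts by eye works, but a script will not misread a digit. A sketch under the assumption that the _cat/indices output was requested with ?v (so the header row is present) and that all listed indices are open; docs_count and same_docs_count are names I made up, and cat_out is a hypothetical file holding the saved output.

```shell
# Print docs.count for one index, reading `_cat/indices?v` output on stdin.
docs_count() {
  awk -v idx="$1" '
    NR == 1 {
      # Locate the columns by header name instead of hard-coding positions.
      for (i = 1; i <= NF; i++) {
        if ($i == "index") ic = i
        if ($i == "docs.count") dc = i
      }
      next
    }
    ic && dc && $ic == idx { print $dc }
  '
}

# Succeed only if both indices are listed in FILE and their counts match.
# Usage: same_docs_count FILE SRC_INDEX DEST_INDEX
same_docs_count() {
  a=$(docs_count "$2" < "$1")
  b=$(docs_count "$3" < "$1")
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example (not run here):
#   curl -s --netrc "elst-main:9200/_cat/indices/rem*?v&s=index" > cat_out
#   same_docs_count cat_out rem_ac_10 rem_ac_001 && echo "counts match"
```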
— Delete old index if you are absolutely sure the reindex to 001 worked…
curl -X DELETE --netrc "elst-main:9200/rem_ac_10"
—Create new index with original name using the modified mapping template.
curl -XPUT --netrc -H 'Content-Type: application/json' http://elst-main:9200/rem_ac_10 -d @rem_mapping
— copy temp to old index name
curl -X POST --netrc "elst-main:9200/_reindex?pretty" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "rem_ac_001"
  },
  "dest": {
    "index": "rem_ac_10",
    "version_type": "internal"
  }
}
'
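The two _reindex calls differ only in which index is the source and which is the destination, so the body can be generated instead of pasted twice. A small sketch; reindex_body is a hypothetical helper name, and the JSON it prints carries the same fields as the bodies above, just on one line.

```shell
# Print a _reindex request body for the given source and destination index.
# Usage: reindex_body SRC_INDEX DEST_INDEX
reindex_body() {
  printf '{"source":{"index":"%s"},"dest":{"index":"%s","version_type":"internal"}}\n' "$1" "$2"
}

# Example (not run here):
#   curl -X POST --netrc "elst-main:9200/_reindex?pretty" \
#        -H 'Content-Type: application/json' -d "$(reindex_body rem_ac_001 rem_ac_10)"
```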
— verify docs.count again
curl -X GET --netrc "elst-main:9200/_cat/indices/rem*?v&s=index&pretty"
— delete the temp index
curl -X DELETE --netrc "elst-main:9200/rem_ac_001"
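After I did this I noticed that my index was yellow. As far as I can tell, that is because an index created by hand with a plain PUT gets Elasticsearch’s default settings rather than the ones Graylog applies, and in 6.x the default is one replica. With nowhere to put that replica on my setup, the index sits at yellow… so let’s set it back to zero.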
— Set number of replicas to zero and verify
curl -XPUT --netrc -H 'Content-Type: application/json' 'http://elst-main:9200/rem_ac_10/_settings?pretty' -d '{"number_of_replicas":0}'
curl -X GET --netrc "elst-main:9200/_cat/indices/rem*?v&s=index&pretty"
— Also Check for Keyword again to make sure it went right
curl -X GET --netrc "elst-main:9200/rem*/_mapping/field/login_duration?pretty" | grep -B 7 keyword
That was it for me. Good luck!
PS: Hoping that someone with more knowledge doesn’t come along and say OMG NEVER DO THAT!!