Custom Mappings and Historical Correction

I posted this a while ago but it is buried in the forums. Here it is in an easier-to-find place:

One man’s journey correcting an Elasticsearch datatype for Graylog
FYE (For Your Edification):

Before you read any further, know that I am very new to all of this graylog/elastic stuff. This process worked for me but there may be underlying problems I haven’t noticed… and of course your mileage may vary depending on your configuration/replication/nodes and for that matter what versions of things you are using.

Current relevant versions:
Graylog: 3.3.4
Elastic: 6.8.10

Scenario: I had a field called login_duration that tracks how long an employee stays connected each time they connect to one of our VPNs (in seconds… but that’s another story). Elastic decided that it would be a keyword rather than a long, which meant I could not sum up the total time connected over X days.

What to do?: Create a custom index mapping of course! Just follow the Graylog documentation on custom index mappings and it will all just work! If you aren’t familiar with it… stop right here and read through it. This saga assumes you have.

But… I have more than one index in my index set, and the current/old ones will retain login_duration as a keyword. I could delete all the past indices… but that is not particularly helpful.

Solution: I will create the custom mapping AND I will reindex the indices that have the keyword. Hold onto your hat, I am going to step through my process here.

Caveat: I have only one Elastic database to work on and this is not a production system so I am not getting into backing up before you do any of this. You should have read the docs on custom index mapping and the warnings saying you should think through what you are doing!

The work: First I needed to see where login_duration was a keyword, particularly since I have two VPNs that initially work off different indices.

curl -X GET --netrc "elstc-main:9200/*/_mapping/field/login_duration?pretty" | grep -B 7 keyword

While I am not going to parse out my command lines for you, I do want you to note that I am using --netrc in my curl. You should have your elastic database locked down so it can’t be randomly queried by anyone. Research netrc. It’s not a perfect solution but in my case it is sufficient.
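For anyone who has not met netrc before: curl’s --netrc option reads credentials from ~/.netrc, which should be readable only by you (chmod 600). The sketch below uses Python’s standard netrc module purely to show what an entry looks like and how a host is looked up. The login and password are made up; only the host name comes from my commands.

```python
import netrc
import os
import tempfile

# Hypothetical credentials; only the host name appears in the curl commands.
SAMPLE = "machine elstc-main login graylog_admin password s3cret\n"


def lookup(netrc_text, host):
    """Parse .netrc-style text and return (login, password) for host, or None."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(netrc_text)
        entry = netrc.netrc(path).authenticators(host)
    finally:
        os.remove(path)
    if entry is None:
        return None
    login, _account, password = entry
    return login, password


print(lookup(SAMPLE, "elstc-main"))
```

If the host in the URL does not match a machine entry (and there is no default entry), nothing is found, so the host name in .netrc has to match what you type in the curl command.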

some results:

--
  "rem_ac_10" : {
    "mappings" : {
      "message" : {
        "login_duration" : {
          "full_name" : "login_duration",
          "mapping" : {
            "login_duration" : {
              "type" : "keyword"
--
  "graylog_45" : {
    "mappings" : {
      "message" : {
        "login_duration" : {
          "full_name" : "login_duration",
          "mapping" : {
            "login_duration" : {
              "type" : "keyword"

Oh man… I have two index sets to contend with: graylog_* and rem_ac_*. For brevity in the face of verboseness we will work on rem_ac_*.
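The grep -B 7 trick depends on how the pretty-printed output happens to be indented. Here is a minimal Python sketch of the same check, assuming the response has the shape shown above (index, then mappings, then type, then field, then mapping). The function name is mine, and the sample is trimmed down; rem_ac_11 and rem_ac_12 are hypothetical indices added for contrast.

```python
import json


def indices_with_type(response, field, wanted):
    """Return the sorted index names whose mapping for `field` has type `wanted`."""
    hits = set()
    for index, body in response.items():
        # Indices that do not contain the field come back with empty mappings.
        for fields in body.get("mappings", {}).values():
            leaves = fields.get(field, {}).get("mapping", {})
            if any(leaf.get("type") == wanted for leaf in leaves.values()):
                hits.add(index)
    return sorted(hits)


# Sample in the shape of the curl output; rem_ac_11 and rem_ac_12 are made up.
sample = json.loads("""
{
  "rem_ac_10": {"mappings": {"message": {"login_duration": {
    "full_name": "login_duration",
    "mapping": {"login_duration": {"type": "keyword"}}}}}},
  "graylog_45": {"mappings": {"message": {"login_duration": {
    "full_name": "login_duration",
    "mapping": {"login_duration": {"type": "keyword"}}}}}},
  "rem_ac_11": {"mappings": {"message": {"login_duration": {
    "full_name": "login_duration",
    "mapping": {"login_duration": {"type": "long"}}}}}},
  "rem_ac_12": {"mappings": {}}
}
""")

print(indices_with_type(sample, "login_duration", "keyword"))
```

To use it on real data, save the curl output (without the grep) to a file and load it with json.load instead of the inline sample.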

For the custom mapping I am creating the following json for rem_ac_*, naming the file “gl_custom_rem_ac.json” and storing it in a safe and relevant directory in case I want to remove this custom mapping in the future (see docs for that). The name doesn’t matter but consistency does!

gl_custom_rem_ac.json:

{
  "template": "rem_ac_*",
  "mappings" : {
    "message" : {
      "properties" : {
        "login_duration" : {
          "type" : "long"
        }
      }
    }
  }
}

To put it into elastic:

curl -X PUT --netrc -H 'Content-Type: application/json' -d @'gl_custom_rem_ac.json' 'http://elstc-main:9200/_template/gl_custom_rem_ac?pretty'

NOTE: In the docs it shows the following as if it is part of the command… it’s not… this is what you get as a result if your command was successful. If you have a typo in your json it will usually whinge… or blow up your entire elastic database as forever corrupted… kidding.

{
    "acknowledged" : true
}

NOTE-TWO: To do the graylog_* index you would create a wholly separate json file changing rem_ac_* for graylog_* and separately curl-ing it into elastic. Deleting would be separate too. One file per index set.
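If you have several index sets, writing one file per set by hand invites typos. Here is a small sketch that builds the same template body for each prefix. The function names are mine, and the file name gl_custom_graylog.json is only assumed by analogy with the one I used above. It builds the JSON text in memory and prints it; nothing is written to disk or sent to elastic.

```python
import json


def build_template(index_pattern, field, es_type):
    """Build a template body in the same shape as gl_custom_rem_ac.json."""
    return {
        "template": index_pattern,
        "mappings": {"message": {"properties": {field: {"type": es_type}}}},
    }


def template_files(prefixes, field, es_type):
    """Map one file name per index set to its JSON text."""
    files = {}
    for prefix in prefixes:
        name = "gl_custom_{}.json".format(prefix)
        body = build_template(prefix + "_*", field, es_type)
        files[name] = json.dumps(body, indent=2)
    return files


for name, text in template_files(["rem_ac", "graylog"], "login_duration", "long").items():
    print(name)
    print(text)
```

Each file would still be PUT separately with its own curl command, as described above.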

Once you have done this, the indices you have set custom mappings for will have to be rotated. Go to http://gray-dude:9000/system/indices, click on the index set you are working on and, under the maintenance button menu in the upper right, select “Rotate Active Write Index”.

Did that go smoothly? GREAT! Now for the meatier portion that the Graylog docs don’t talk about. Beyond this point you are messing directly with the elastic database. Graylog likely won’t support you in this and if you have questions or issues… for goodness sake ask/post them at Elastic… and don’t ask me, I am a tadpole.

WAIT!! One more thing… before you work on an index and start creating new indexes make sure that you examine how that index is managed. If it rotates based on the count of indexes and you add another index or two it will delete the old indexes that you may want. Increase the rotation count to a bigger number until you are done.

You have been warned.

Let’s look at the rem_ac_* indices… I am interested in the docs.count in my index.

curl -X GET --netrc "elstc-main:9200/_cat/indices/rem*?v&s=index&pretty"

health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   rem_ac_0  ADTmcOFaSQeLJEYgGL7VtQ   4   0      73579            0     80.5mb         80.5mb
green  open   rem_ac_1  1ZL_wg17RMWIaCG-0mqIJA   4   0      15847            0     16.4mb         16.4mb
green  open   rem_ac_10 u3LBFk_hQJyoR1H5DzKy-g   4   0      17573            0     18.2mb         18.2mb
    ...

rem_ac_10 showed up in my initial search for login_duration being a keyword, so let’s adjust that index. First I am going to pull the current mapping out to a text file:

curl -X GET --netrc "elst-main:9200/rem_ac_10/_mapping?pretty" > rem_mapping

Then I edit that file (vi rem_mapping … you like vi don’t you???), doing THREE things to turn it into a correct mapping template:

  1. Delete the second line in the file that is specifying the index name
  2. Delete the second to last line in the file which is just the corresponding ‘}’
  3. Find and modify the login_duration field definition from keyword to long.
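If vi is not your thing, the three edits above can be sketched in Python. This assumes the exported file has the usual shape for Elastic 6.x, with the index name wrapping a “mappings” object and a single “message” type. The function name and the vpn_user field in the sample are hypothetical.

```python
import copy
import json


def mapping_to_create_body(exported, index, field, new_type, doc_type="message"):
    """Turn the output of GET <index>/_mapping into a body for creating an index."""
    # Steps 1 and 2: drop the outer {"<index>": ...} wrapper.
    body = copy.deepcopy(exported[index])
    properties = body["mappings"][doc_type]["properties"]
    if field not in properties:
        raise KeyError("{} is not mapped in {}".format(field, index))
    # Step 3: change only the type. Any keyword-only options on the field
    # (none in my case) may need removing by hand.
    properties[field]["type"] = new_type
    return body


# Tiny stand-in for the exported file; vpn_user is a made-up second field.
exported = {
    "rem_ac_10": {
        "mappings": {
            "message": {
                "properties": {
                    "login_duration": {"type": "keyword"},
                    "vpn_user": {"type": "keyword"},
                }
            }
        }
    }
}

print(json.dumps(mapping_to_create_body(exported, "rem_ac_10", "login_duration", "long"), indent=2))
```

On the real file you would json.load the rem_mapping file, run the function, and json.dump the result back out before using it with curl.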

You are getting bored. I can tell. I am too. So I will shorten up some descriptions of what is going on.

—Create new blank temp index using the modified mapping template.
curl -XPUT --netrc -H 'Content-Type: application/json' http://elst-main:9200/rem_ac_001 -d @rem_mapping

— Copy to temp index

curl -X POST --netrc "elst-main:9200/_reindex?pretty" -H 'Content-Type: application/json' -d'
 {
   "source": {
      "index": "rem_ac_10"
    },
    "dest": {
      "index": "rem_ac_001",
      "version_type": "internal"
    }
 }

'

— Verify docs.count are the same between rem_ac_10 and rem_ac_001
curl -X GET --netrc "elst-main:9200/_cat/indices/rem*?v&s=index&pretty"
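
Comparing the counts by eye gets old fast. Here is a sketch that parses the _cat/indices text (with the ?v header row) and compares two indices, finding the columns by header name. The function names are mine, and the rem_ac_001 row and its uuid in the sample are invented. As far as I understand, docs.count can lag for a moment after a reindex until the index refreshes, so a mismatch immediately afterwards may be worth a second look before panicking.

```python
def doc_counts(cat_output):
    """Parse `_cat/indices?v` text into {index name: docs.count}."""
    lines = [line for line in cat_output.splitlines() if line.strip()]
    header = lines[0].split()
    name_col = header.index("index")
    count_col = header.index("docs.count")
    counts = {}
    for line in lines[1:]:
        cells = line.split()
        if len(cells) != len(header):
            continue  # skip truncated or unexpected rows rather than guess
        counts[cells[name_col]] = int(cells[count_col])
    return counts


def same_doc_count(cat_output, first, second):
    """True only if both indices are listed and their docs.count values match."""
    counts = doc_counts(cat_output)
    return first in counts and second in counts and counts[first] == counts[second]


# The rem_ac_001 row (including its uuid) is made up for illustration.
SAMPLE = """\
health status index      uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   rem_ac_0   ADTmcOFaSQeLJEYgGL7VtQ   4   0      73579            0     80.5mb         80.5mb
green  open   rem_ac_001 AAAAAAAAAAAAAAAAAAAAAA   4   0      17573            0     18.2mb         18.2mb
green  open   rem_ac_10  u3LBFk_hQJyoR1H5DzKy-g   4   0      17573            0     18.2mb         18.2mb
"""

print(same_doc_count(SAMPLE, "rem_ac_10", "rem_ac_001"))
```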

— Delete old index if you are absolutely sure the reindex to 001 worked…
curl -X DELETE --netrc "elst-main:9200/rem_ac_10"

—Create new index with original name using the modified mapping template.
curl -XPUT --netrc -H 'Content-Type: application/json' http://elst-main:9200/rem_ac_10 -d @rem_mapping

— copy temp to old index name

 curl -X POST --netrc "elst-main:9200/_reindex?pretty" -H 'Content-Type: application/json' -d'
 {
   "source": {
      "index": "rem_ac_001"
    },
    "dest": {
      "index": "rem_ac_10",
      "version_type": "internal"
    }
 }
'

— verify docs.count again
curl -X GET --netrc "elst-main:9200/_cat/indices/rem*?v&s=index&pretty"

— delete the temp index
curl -X DELETE --netrc "elst-main:9200/rem_ac_001"
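
To recap, the round trip above can be written down as an ordered list of requests. This sketch only builds and prints the plan; it sends nothing to elastic, and the function names are mine. The docs.count check still belongs before each DELETE.

```python
def reindex_body(source, dest):
    """Body for POST /_reindex, in the same shape as the curl commands above."""
    return {
        "source": {"index": source},
        "dest": {"index": dest, "version_type": "internal"},
    }


def round_trip_plan(index, temp_index, create_body):
    """List the (method, path, body) requests of the round trip, in order."""
    return [
        ("PUT", "/" + temp_index, create_body),                   # new temp index
        ("POST", "/_reindex", reindex_body(index, temp_index)),   # copy to temp
        ("DELETE", "/" + index, None),                            # after verifying counts
        ("PUT", "/" + index, create_body),                        # recreate original name
        ("POST", "/_reindex", reindex_body(temp_index, index)),   # copy back
        ("DELETE", "/" + temp_index, None),                       # after verifying counts
    ]


# The create body would be the edited mapping file; a placeholder is used here.
for method, path, body in round_trip_plan("rem_ac_10", "rem_ac_001", {"mappings": {}}):
    print(method, path, body)
```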

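After I did this I noticed that my index was yellow. My best guess is that the index I created by hand picked up Elastic’s default of one replica rather than the zero my Graylog index set uses (see the rep column above), and with only one node that replica has nowhere to go… so let’s set it back to zero.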

— Set number of replicas to zero and verify

curl -XPUT --netrc -H 'Content-Type: application/json' 'http://elst-main:9200/rem_ac_10/_settings?pretty' -d '{"number_of_replicas":0}'

curl -X GET --netrc "elst-main:9200/_cat/indices/rem*?v&s=index&pretty"

— Also Check for Keyword again to make sure it went right
curl -X GET --netrc "elst-main:9200/rem*/_mapping/field/login_duration?pretty" | grep -B 7 keyword

That was it for me. Good luck!

PS: Hoping that someone with more knowledge doesn’t come along and say OMG NEVER DO THAT!! :stuck_out_tongue:


@tmacgbay
Nice, and thanks for sharing. TBH I’m using it now :slight_smile:


I know custom mapping and I often use it but I wasn’t aware of the reindexing trick, thank you for sharing.