Prod v3 - migrate old data from previous v2 - replay logs?

I used Graylog v2 initially with an OVA appliance …

then when v3 came out I didn’t really care about the old data - I still had it and started fresh with a new, clean install via apt-get, as I wanted to customize the install a little bit

so now I have split data … how can I easily migrate the old data (about 5 GB) to the now running live/prod system?

I thought first about export / import … but that doesn’t seem to exist as such in the Graylog GUI nor in Elasticsearch - only a snapshot / restore feature … (also, copying Elasticsearch data files around doesn’t seem to be recommended - possible difference in Elasticsearch versions)

merging the Elasticsearch instances into a cluster, which I saw as a suggested solution, seems a bit excessive … and I’m also concerned that possible data(base) structure changes between v2 and v3 might create problems. I am also concerned about restoring a snapshot, as suggested, after I already have new data on the live/prod system - would that overwrite the db, or is the operation additive, with no schema change that would create problems?

what I now think is the best/easiest solution is to send (replay) the old stored logs out from the old system (e.g. GELF formatted) to the new system.

is there already something in place to do this … possibly even from the GUI, or would I have to create a script to replay / send out the stored log messages? …

how would I easily hook into Graylog via scripting - would I have to write some Java to do something like this, i.e. write a plugin, or is there some scripting layer (e.g. Python) on top of the graylog-server / API?

thank you for any hints & suggestions :slight_smile:


what I read / saw are the following posts that have some relation to this - the suggestion of restoring from snapshots and creating a cluster to move data


hey @ebricca

solving this kind of situation is very complex - as you built the system new and fresh, you would have multiple issues if you just copied the data over, because the metadata would not be present/readable: the UUIDs of streams, roles and inputs are different. You might have the data available, but not visible in the right way, or similar.

So I can’t give you a good solution at hand - to be honest, only very hacky solutions come to my mind. I would personally take it as a lesson learned and throw the old data away, treating it as lost.

The best way would have been to take at least the database (MongoDB) from the old system and do an upgrade from the old version to the new one. That would have made it possible to transfer/copy the data in. Or create a fresh installation with the same versions of Graylog, Elasticsearch and MongoDB and perform the upgrade from there …

@jan … thank you for the input … :smiley:

yes, I can just keep the old data as is, without migration, and spin the VM up if I have to look something up …

so … if I still wanted to migrate the data … from what I understand, replay would be the best way … ?
(circumventing the direct data migration issues - UUID / permission / schema issues etc.)

but where would I start - is the following a workable approach?

iterate through all (in my case 10^7) messages on v2 via the REST API if possible (via a script, e.g. Python)
and send each original message, with its original old timestamp, to the new v3 as a syslog / GELF message?
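
as far as I understand GELF, each replayed message would then roughly be a payload like this (just a sketch; the underscore-prefixed field is an extra I would add myself, nothing Graylog requires):

```python
# rough shape of one replayed message as a GELF 1.1 payload (serialised to
# JSON before sending); field names follow the GELF spec, values are examples
gelf_payload = {
    "version": "1.1",
    "host": "myserver",                    # original source host
    "short_message": "the original log line",
    "timestamp": 1421927689.0,             # original event time, epoch seconds
    "level": 6,                            # informational
    "_replayed_from": "graylog-2.4-ova",   # optional extra field I would add
}
```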

no “best way” is given.

it would really be a programming exercise to read the messages and re-ingest them into the system, for example as GELF.

I have been studying the API …
is there an API reference besides using the API browser?

iterating through all messages via the REST API is where I stumble a bit (source is Graylog 2.4 / OVA appliance … target is v3)

there is the absolute date range search feature, which I could use with offset & number of results returned … but it also expects a search query term … which I would like to be empty or a wildcard * … yet it seems Lucene doesn’t accept that as the only search term … though entering the relevant message origin server as the search term works

ex: /api/search/universal/absolute?query=myserver&from=2014-01-23T15%3A34%3A49.000Z&to=2019-01-23T15%3A34%3A49.000Z&fields=message
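
to make the question more concrete, this is roughly how I picture the paging loop (untested sketch; it assumes the limit/offset parameters of the 2.x absolute search endpoint, basic auth, and that the hits come back under a "messages" key):

```python
import requests

OLD_API = "http://old-graylog:9000/api"   # the 2.4 appliance's REST API
AUTH = ("admin", "password")              # or (access_token, "token")
PAGE = 500

params = {
    "query": "myserver",                  # wildcard-only queries gave me trouble
    "from": "2014-01-23T15:34:49.000Z",
    "to":   "2019-01-23T15:34:49.000Z",
    "limit": PAGE,
    "offset": 0,
}

while True:
    r = requests.get(f"{OLD_API}/search/universal/absolute",
                     params=params, auth=AUTH,
                     headers={"Accept": "application/json"})
    r.raise_for_status()
    hits = r.json().get("messages", [])
    if not hits:
        break
    for hit in hits:
        msg = hit["message"]              # the stored fields, incl. timestamp/source
        # ... hand msg to the GELF sender sketched further down ...
    params["offset"] += PAGE
```

one thing I am unsure about: very deep offsets can run into Elasticsearch’s max result window (10000 by default, I think), so it may be safer to walk smaller from/to slices instead of relying on offset alone.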

is there a better API function that iterates through the db, or is the date range search the only intended way to get at the message IDs?

re-ingesting would, I guess, go to Graylog v3 via the API directly - is that possible? …
I saw /api/messages/parse … but I am unsure what that function does?
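
for the sending side, what I have in mind is to skip the REST API and push straight to a GELF TCP input that I would create on the new v3 server (untested sketch; the host, port and input are my assumption, and GELF over TCP is, as far as I know, one null-byte terminated JSON document per message):

```python
import json
import socket
from datetime import datetime, timezone

NEW_GRAYLOG = ("new-graylog", 12201)   # assumed GELF TCP input on the v3 server

def iso_to_epoch(ts):
    # the 2.x search API gives timestamps like 2019-01-23T15:34:49.000Z;
    # GELF wants epoch seconds (adjust the format string if yours differs)
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ") \
                   .replace(tzinfo=timezone.utc).timestamp()

def send_gelf(msg, sock):
    """Re-emit one old message as GELF 1.1, keeping the original timestamp."""
    payload = {
        "version": "1.1",
        "host": msg.get("source", "unknown"),
        "short_message": msg.get("message", ""),
        "timestamp": iso_to_epoch(msg["timestamp"]),
    }
    # carry the remaining stored fields over as GELF additional fields
    for key, value in msg.items():
        if key not in ("source", "message", "timestamp", "_id", "streams"):
            payload["_" + key] = value
    # GELF over TCP: one JSON document per message, terminated by a null byte
    sock.sendall(json.dumps(payload).encode("utf-8") + b"\x00")

sock = socket.create_connection(NEW_GRAYLOG)
# ... call send_gelf(msg, sock) for every msg from the paging loop above ...
sock.close()
```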

it feels a little bit like recreating the “archiving” enterprise feature …
temporarily upgrading to enterprise, creating a flat-file export and re-importing it would work too, I think …

yes, that - getting an enterprise license for the old system, creating archives and restoring them - might be possible. But even with that it might be error prone if you are a heavy user of streams and rights management … but try it.

I have done this; what you need to do is reindex from remote and set the stream id to the proper stream. In my setup I was trying to throw away all the old stuff as it was set up very oddly and I needed a fresh slate. If the old version is using Elasticsearch 5.x it should be very easy; I went from 2.x and thankfully did not have any version conflicts. Basic process (a rough sketch of the Elasticsearch calls for steps 2-4 is at the end of this post):

1. create an index set in the new Graylog
2. manually create an index with a high number (I did this with curl) to get the active write index higher than the index numbers you will be pulling in
3. create an Elasticsearch template with a match-all pattern that has an order value of 0 (higher than Graylog’s own order -1 templates), and set replicas to 0 (speeds up indexing) and primaries to the desired number
4. reindex from remote (I made a script that iterates through the old Elasticsearch indices, checks whether each one already exists in the new cluster and, if not, reindexes it from remote)
5. check the logs for errors / version conflicts (it helps if you run the above script with nohup … & disown so you can check the output in the nohup file)
6. change the data of each index to point at the correct stream; this is the part that is not documented and the reason you do not need to move over MongoDB. This is how :slight_smile: :

get the stream_id from the URL of a search in the new Graylog by going to Streams > the stream you want; it will look something like hostname:port/streams/[this is what you want]

```bash
# example range - change to match your data; replace stream_id below with the
# id taken from the stream URL. The must_not clause is only there for
# performance, so a second run skips documents that already point at the stream.
for indice in {0..1000}
do
  curl -XPOST "hostname:9200/indicename_$indice/_update_by_query" \
    -H 'Content-Type: application/json' -d'
  {
    "script": {
      "source": "ctx._source.streams = [\"stream_id\"]",
      "lang": "painless"
    },
    "query": {
      "bool": {
        "must_not": {
          "term": { "streams": "stream_id" }
        }
      }
    }
  }'
done
```

ctx._source.streams is the magical value. The above script will take a while, but you should start to see data in the new Graylog fairly quickly.
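
for completeness, here is a rough Python sketch of what steps 2-4 above could look like (this is not my actual script; the index and template names, the high index number and the shard count are placeholders, and reindex from remote additionally needs the old host whitelisted via reindex.remote.whitelist in the new cluster’s elasticsearch.yml):

```python
import requests

NEW_ES = "http://localhost:9200"        # the Elasticsearch behind the new Graylog
OLD_ES = "http://old-es-host:9200"      # must be listed in reindex.remote.whitelist
PREFIX = "indicename"                   # placeholder, match your new index set

# step 2: manually create an index with a number higher than anything you will
# pull in, so Graylog's active write index stays ahead of the restored ones
requests.put(f"{NEW_ES}/{PREFIX}_1001").raise_for_status()

# step 3: match-all template, order 0 (above Graylog's own order -1 templates),
# replicas 0 to speed up indexing ("index_patterns" instead of "template" on ES 6+)
template = {
    "template": "*",
    "order": 0,
    "settings": {"number_of_replicas": 0, "number_of_shards": 4},
}
requests.put(f"{NEW_ES}/_template/restore-tuning", json=template).raise_for_status()

# step 4: reindex each old index from remote, skipping those that already exist
for i in range(0, 1000):
    index = f"{PREFIX}_{i}"
    if requests.head(f"{NEW_ES}/{index}").status_code == 200:
        continue                        # already copied on an earlier run
    body = {
        "source": {"remote": {"host": OLD_ES}, "index": index},
        "dest": {"index": index},
    }
    r = requests.post(f"{NEW_ES}/_reindex", json=body)
    print(index, r.status_code, r.json().get("failures"))
```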
