Cloning and Virtualizing my Graylog System

Bear with me, this is a confusing one and my knowledge of Graylog is limited.

I am still struggling to get a clone of my Graylog system running in a virtual environment so I can practice upgrading it from 2.3 to 3.0. I made a previous post about this, which is now locked due to inactivity.

Quick summary: my new virtual machine clone runs perfectly, almost too perfectly: it behaves exactly like the old server, including the fact that it appears to be receiving 500+ messages/sec, same as the old server, which is absolutely bizarre and simply not possible.

Every config file (server.conf and elasticsearch.yml) has been scoured to make sure the old IP address was replaced with the new clone IP address, so there is no reason this new system should be connected to the old one. I have even put in iptables entries so they cannot talk to each other directly. Yet the clone still appears to log all the same messages as the original, which is very strange.
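For illustration, the rules I added look roughly like this (192.0.2.10 is a placeholder standing in for the original server's address):

# Drop all direct traffic to and from the original server
iptables -A OUTPUT -d 192.0.2.10 -j DROP
iptables -A INPUT -s 192.0.2.10 -j DROP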

Note that when I go to System/Nodes on the clone, it shows there is one active node, and it has the same node ID and hostname as my old node, i.e. a002c9da / logs.mydomain.com, when in fact this machine is called logs-clone.mydomain.com.

I inherited this system and my knowledge is very limited, but I need to push this project forward. How do I break the connection between the clone and the original? Can I rename the cloned node, or create a new one?

Suggestions welcomed!

Did they use different MongoDB servers, or do they share one?

Hi Jan,

They use separate MongoDB instances, but one is a clone of the other, so they are exactly the same. I am assuming I may need to reconfigure MongoDB on the cloned system (maybe it is still referencing the original IP address?).

Any advice on how to proceed?
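In the meantime, here is roughly how I have been checking the MongoDB side (this assumes the default “graylog” database name and an unauthenticated local mongod; adjust if yours differs):

# Where is mongod bound, and is it part of a replica set?
grep -Ei 'bind|repl' /etc/mongod.conf

# If I understand the schema right, node registrations live in the nodes collection
mongo graylog --eval 'db.nodes.find().forEach(printjson)'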

I can only guess -

Check all components to see whether they are connected in any way, or whether some config might still contain the original …

Did you check http_publish_uri? If that is still the one from your original, the clone will request information from the original …
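A quick way to check all of those at once (the path assumes a package install):

grep -E '^(rest_|web_|http_)' /etc/graylog/server/server.conf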

Hi Jan,

My server.conf does not contain a setting called “http_publish_uri” on the clone or the original. Should I set it? Does it require a port number?

I have triple checked that the IP has been changed everywhere on the cloned system - I have even done a text search within every single file on the system looking for the old IP.
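The search was essentially this (192.0.2.10 again stands in for the old address):

# Recursively search every file on the system for the old IP
grep -rF '192.0.2.10' / 2>/dev/null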

I do not like that when I go to “System / Nodes” on the clone, it shows there is 1 active node:
a002c9da / original-system.domain.com

Can I change this somehow? This is pretty much the only place I can find the old IP or hostname referenced, and I think this is where it is redirecting my web browser to the old system. (Remember that the clone cannot talk to the original due to firewall rules; it should not be able to send a single packet, which is why I am starting to suspect it is just my browser that is pulling info from the old system somehow.)

You have pre-3.0 on the clone? Check for the rest_* and web_* settings.

What are they?

Yes, I am at v2.3 on both the clone and the original. Everything in my server.conf and elasticsearch.yml is either the correct IP address or the loopback.

From my server.conf:

rest_listen_uri = http://0.0.0.0:12900/
rest_transport_uri = http://192.168.48.158:12900/
web_enable = true
web_listen_uri = http://127.0.0.1:9000/
http_bind_address = 192.168.48.158:9000
http_publish_uri = http://192.168.48.158:9000

Why do you have 3.0 settings when you are running 2.3?

To be honest, installing a fresh Graylog would have been less painful for you …

Correction: we are on 2.4.3… do you see settings that are invalid for this version?

Yes, installing Graylog fresh would have been easier, but the point of this exercise is to have an environment I can test with so I can be confident when upgrading our production environment from 2.4 to 3.0 - until I can eliminate the risk and be confident with the process, we will stay stuck at 2.4. I can’t be the first person who has tried this… am I?

Regardless, it has not been much work so far: converting a physical machine to virtual is pretty easy these days. After that, configuring a new IP and editing server.conf and elasticsearch.yml took five minutes. But clearly there are additional steps required to clone a Graylog system… what are they?

I think the biggest clue is when I go to System/Nodes on the clone and I see:

There is 1 active node
a002c9da / old-host-name.domain.com

The Memory/Heap usage bar on that page moves up and down exactly in sync with the original system. So… where is it pulling that node ID and hostname from? It has to be from MongoDB, right? Are there things in MongoDB that I need to change to make this clone work independently?

Is any of this making sense? :slight_smile: Thanks again for your patience.

The http_* settings are from 3.0 …

What are all the rest_* and web_* configuration settings in your setup? Do any of them contain the IP/address of the original and not the clone?

Is the MongoDB standalone? Is the clone connected to the correct MongoDB?

If you shut down the clone, delete or move /etc/graylog/server/node-id, and start it again, does it display the same UUID or does it show a new one?
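As a sketch (paths and service names assume a package install):

systemctl stop graylog-server
mv /etc/graylog/server/node-id /etc/graylog/server/node-id.bak
systemctl start graylog-server
# on a clean start, Graylog should write a fresh node-id and register
# itself as a new node; it needs write access to /etc/graylog/server
# for that. Compare the UUID on System/Nodes afterwards.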

Progress!

It didn’t like it when I removed the node-id file like you suggested: Graylog would not start up without it. The server.log showed it trying to generate a new node ID and then failing with an error message.

However… when I put the node-id file back in place and started Graylog, boom, it worked! Note that I had previously edited this file and changed the ID by one digit. Graylog now shows the new ID and new hostname when I go to System/Nodes:
b002c9da / new-host-name.domain.com

This is good… next it told me that there were no inputs, which was also good. I had to edit all my inputs one by one (Syslog UDP, GELF TCP, etc.) and change the bind IPs; they still had the original IP.

Then I realized my 360 indexes were no longer connected. System/Indices showed there was only 1 index totaling 700 bytes.

I then went to /elasticsearch/data, where my indexes are stored. I saw that a new folder had been created for the new node/hostname. I stopped Elasticsearch and Graylog, erased the new folder, renamed the old folder to the new hostname, and restarted Elasticsearch and Graylog.
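In case it helps anyone, the sequence was roughly this (the folder names here are placeholders reflecting my setup, where the data directory is named after the node/hostname):

systemctl stop graylog-server elasticsearch
rm -rf /elasticsearch/data/new-host-name
mv /elasticsearch/data/old-host-name /elasticsearch/data/new-host-name
systemctl start elasticsearch graylog-server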

Now when I go to System/Indices it correctly states:

Default index set
360 indices, 7,181,175,572 documents, 3.6TB

My cluster is green, and everything looks good! However… I cannot do any searches; it doesn’t seem to be accessing those 360 indices properly. When I do a search in Graylog, it does not come back with any results, no matter how far back I go or what I search for. The logs provide no clues as to why it is not reading those indices properly.

So close!!! Any suggestions? How can I troubleshoot my connection to these index files?

curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "new-host-name",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 360,
  "active_shards" : 360,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

You need to recalculate the index ranges, I guess.
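You can do that in the UI under System/Indices (Maintenance, Recalculate index ranges), or via the REST API; I think the endpoint on 2.x is something like this (credentials are placeholders, and the API listens on the rest_listen_uri port, 12900 in your config):

curl -u admin:yourpassword -XPOST 'http://127.0.0.1:12900/system/indices/ranges/rebuild'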


Success!

Thanks for your help Jan, it is appreciated. My virtualized clone is up and running with all 360 indexes available. Now I can practice breaking it! :smile:

Here’s a summary of the steps I took, maybe this will save someone else time if they ever want to virtualize their Graylog/Elastic setup:

1.	Stop Graylog / Elasticsearch.
2.	Use VMware Standalone Converter to convert the physical machine to virtual.
3.	Start the clone with the NIC disabled. Assign a new IP to the clone. Bring up the NIC.
4.	Update the IP address in /etc/hosts, /etc/graylog/server/server.conf, /etc/elasticsearch/elasticsearch.yml, /etc/mongod.conf and any other place you find it (I used grep to search every file on my hard drive for the old IP).
5.	Attach additional virtual storage to the VM (4TB in my case).
6.	Rsync the 4TB Elasticsearch partition with the indexes to the clone's storage partition (see the rsync sketch after this list).
7.	Rename /etc/graylog/server/node-id to node-id-old.
8.	Start Graylog/Elasticsearch and watch your server.log; it will try to generate a new node ID.
9.	If it fails to generate a new node ID like it did in my case: stop the services, edit node-id-old, change the ID by one character, and rename it back to node-id. Start the services and your new node should be recognized.
10.	Stop the services again. Go to /elasticsearch/data where your indexes are stored. Note the new folder with the new hostname. Erase it. Rename the old folder to the new hostname.
11.	Go to System/Inputs and edit all your inputs (Syslog UDP, GELF TCP, etc.), updating the bind IP address in each.
12.	Go to System/Indices, click your Default Index Set, click Maintenance, then click Recalculate Index Ranges. It took about 20 minutes to complete for my 4TB; watch the progress in server.log.
13.	Complete! You should have a working virtualized clone with access to all your data.
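For step 6, the rsync was along these lines (the hostname and paths are placeholders for my setup; run it with Elasticsearch stopped on both ends so the index files are not changing mid-copy):

rsync -aH --progress root@old-server:/elasticsearch/data/ /elasticsearch/data/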