Cloning and Virtualizing my Graylog System

Bear with me, this is a confusing one and my knowledge of Graylog is limited.

I am still struggling to get a clone of my Graylog system running in a virtual environment so I can practice upgrading it from 2.3 to 3.0. I made a previous post about this, which is now locked due to inactivity.

Quick summary: my new virtual machine clone runs perfectly, almost too perfectly: it behaves exactly like the old server, including the fact that it appears to be receiving 500+ messages/sec, same as the old server, which is absolutely bizarre and simply not possible.

Every config file (server.conf and elasticsearch.yml) has been scoured to make sure the old IP address was replaced with the new clone IP address, so there is no reason this new system should be connected to the old one. I have even put in iptables entries so they cannot talk to each other directly. Yet the clone still appears to log all the same messages as the original, which is very strange.
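For illustration, the rules I added look roughly like this (192.0.2.10 is a placeholder standing in for the original server's address):

# Drop all direct traffic to and from the original server
iptables -A OUTPUT -d 192.0.2.10 -j DROP
iptables -A INPUT -s 192.0.2.10 -j DROP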

Note that when I go to System/Nodes on the clone, it shows there is one active node, and it has the same node ID and hostname as my old node, i.e. a002c9da / logs.mydomain.com, when in fact this machine is called logs-clone.mydomain.com.

I inherited this system and my knowledge is very limited, but I need to push this project forward. How do I break the connection between the clone and the original? Can I rename the cloned node, or create a new one?

Suggestions welcomed!

Did they use different MongoDB servers, or do they share one?

Hi Jan,

They use separate MongoDB instances, but one is a clone of the other, so they are exactly the same. I am assuming I may need to reconfigure MongoDB on the cloned system (maybe it is still referencing the original IP address?).

Any advice on how to proceed?
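In the meantime, here is roughly how I have been checking the MongoDB side (this assumes the default “graylog” database name and an unauthenticated local mongod; adjust if yours differs):

# Where is mongod bound, and is it part of a replica set?
grep -Ei 'bind|repl' /etc/mongod.conf

# If I understand the schema right, node registrations live in the nodes collection
mongo graylog --eval 'db.nodes.find().forEach(printjson)'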

I can only guess -

Check all components to see whether they are connected in any way, or whether some config might still contain the original …

Did you check http_publish_uri? If that is still the one from your original, the clone will request information from the original …
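A quick way to check all of those at once (the path assumes a package install):

grep -E '^(rest_|web_|http_)' /etc/graylog/server/server.conf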

Hi Jan,

My server.conf does not contain a setting called “http_publish_uri” on the clone or the original. Should I set it? Does it require a port number?

I have triple checked that the IP has been changed everywhere on the cloned system - I have even done a text search within every single file on the system looking for the old IP.
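The search was essentially this (192.0.2.10 again stands in for the old address):

# Recursively search every file on the system for the old IP
grep -rF '192.0.2.10' / 2>/dev/null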

I do not like that when I go to “System / Nodes” on the clone, it shows there is 1 active node:
a002c9da / original-system.domain.com

Can I change this somehow? This is pretty much the only place I can find the old IP or hostname referenced, and I think this is where it is redirecting my web browser to the old system. (Remember that the clone cannot talk to the original due to firewall rules; it should not be able to send a single packet, which is why I am starting to suspect it is just my browser that is pulling info from the old system somehow.)

You have pre-3.0 on the clone? Check for the rest_* and web_* settings.

What are they?

Yes, I am at v2.3 on both the clone and the original. Everything in my server.conf and elasticsearch.yml is either the correct IP address or the loopback.

From my server.conf:

rest_listen_uri = http://0.0.0.0:12900/
rest_transport_uri = http://192.168.48.158:12900/
web_enable = true
web_listen_uri = http://127.0.0.1:9000/
http_bind_address = 192.168.48.158:9000
http_publish_uri = http://192.168.48.158:9000

Why do you have 3.0 settings when you are running 2.3?

To be honest, installing a fresh Graylog would have been less painful for you …

Correction: we are on 2.4.3… do you see settings that are invalid for this version?

Yes, installing Graylog fresh would have been easier, but the point of this exercise is to have an environment I can test with so I can be confident when upgrading our production environment from 2.4 to 3.0 - until I can eliminate the risk and be confident with the process, we will stay stuck at 2.4. I can’t be the first person who has tried this… am I?

Regardless, it has not been much work so far: converting a physical machine to virtual is pretty easy these days. After that, configuring a new IP and editing server.conf and elasticsearch.yml took five minutes. But clearly there are additional steps required to clone a Graylog system… what are they?

I think the biggest clue is when I go to System/Nodes on the clone and I see:

There is 1 active node
a002c9da / old-host-name.domain.com

The Memory/Heap usage bar on that page moves up and down exactly in sync with the original system. So… where is it pulling that node ID and hostname from? It has to be from MongoDB, right? Are there things in MongoDB that I need to change to make this clone work independently?

Is any of this making sense? :slight_smile: Thanks again for your patience.

The http_* settings are from 3.0 …

What are all the rest_* and web_* configuration settings in your setup? Do any of them contain the IP/address of the original and not the clone?

Is the MongoDB standalone? Is the clone connected to the correct MongoDB?

If you shut down the clone, delete or move /etc/graylog/server/node-id, and start it again, does it display the same UUID or does it show a new one?
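As a sketch (paths and service names assume a package install):

systemctl stop graylog-server
mv /etc/graylog/server/node-id /etc/graylog/server/node-id.bak
systemctl start graylog-server
# on a clean start, Graylog should write a fresh node-id and register
# itself as a new node; it needs write access to /etc/graylog/server
# for that. Compare the UUID on System/Nodes afterwards.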

Progress!

It didn’t like it when I removed the node-id file like you suggested: Graylog would not start up without it. The server.log showed it trying to generate a new node ID and then failing with an error message.

However… when I put the node-id file back in place and started Graylog, boom, it worked! Note that I had previously edited this file and changed the ID by one digit. Graylog now shows the new ID and new hostname when I go to System/Nodes:
b002c9da / new-host-name.domain.com

This is good… next it told me that there were no inputs, which was also good. I had to edit all my inputs one by one (Syslog UDP, GELF TCP, etc.) and change the bind IPs; they still had the original IP.

Then I realized my 360 indexes were no longer connected. System/Indices showed there was only 1 index totaling 700 bytes.

I then went to /elasticsearch/data, where my indexes are stored. I saw that a new folder had been created for the new node/hostname. I stopped Elasticsearch and Graylog, erased the new folder, renamed the old folder to the new hostname, and restarted Elasticsearch and Graylog.
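In case it helps anyone, the sequence was roughly this (the folder names here are placeholders reflecting my setup, where the data directory is named after the node/hostname):

systemctl stop graylog-server elasticsearch
rm -rf /elasticsearch/data/new-host-name
mv /elasticsearch/data/old-host-name /elasticsearch/data/new-host-name
systemctl start elasticsearch graylog-server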

Now when I go to System/Indices it correctly states:

Default index set
360 indices, 7,181,175,572 documents, 3.6TB

My cluster is green, and everything looks good! However… I cannot do any searches; it doesn’t seem to be accessing those 360 indices properly. When I do a search in Graylog, it does not come back with any results, no matter how far back I go or what I search for. The logs provide no clues as to why it is not reading those indices properly.

So close!!! Any suggestions? How can I troubleshoot my connection to these index files?

curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "new-host-name",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 360,
  "active_shards" : 360,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

You need to recalculate the index ranges, I guess.
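You can do that in the UI under System/Indices (Maintenance, Recalculate index ranges), or via the REST API; I think the endpoint on 2.x is something like this (credentials are placeholders, and the API listens on the rest_listen_uri port, 12900 in your config):

curl -u admin:yourpassword -XPOST 'http://127.0.0.1:12900/system/indices/ranges/rebuild'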


Success!

Thanks for your help Jan, it is appreciated. My virtualized clone is up and running with all 360 indexes available. Now I can practice breaking it! :smile:

Here’s a summary of the steps I took, maybe this will save someone else time if they ever want to virtualize their Graylog/Elastic setup:

1.	Stop Graylog / Elasticsearch.
2.	Use VMware Standalone Converter to convert the physical machine to virtual.
3.	Start the clone with the NIC disabled. Assign a new IP to the clone. Bring up the NIC.
4.	Update the IP address in /etc/hosts, /etc/graylog/server/server.conf, /etc/elasticsearch/elasticsearch.yml, /etc/mongod.conf and any other place you find it (I used grep to search every file on my hard drive for the old IP).
5.	Attach additional virtual storage to the VM (4TB in my case).
6.	Rsync the 4TB Elasticsearch partition with the indexes to the clone's storage partition (see the rsync sketch after this list).
7.	Rename /etc/graylog/server/node-id to node-id-old.
8.	Start Graylog/Elasticsearch and watch your server.log; it will try to generate a new node ID.
9.	If it fails to generate a new node ID like it did in my case: stop the services, edit node-id-old, change the ID by one character, and rename it back to node-id. Start the services and your new node should be recognized.
10.	Stop the services again. Go to /elasticsearch/data where your indexes are stored. Note the new folder with the new hostname. Erase it. Rename the old folder to the new hostname.
11.	Go to System/Inputs and edit all your inputs (Syslog UDP, GELF TCP, etc.), updating the bind IP address in each.
12.	Go to System/Indices, click your Default Index Set, click Maintenance, then click Recalculate Index Ranges. It took about 20 minutes to complete for my 4TB; watch the progress in server.log.
13.	Complete! You should have a working virtualized clone with access to all your data.
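For step 6, the rsync was along these lines (the hostname and paths are placeholders for my setup; run it with Elasticsearch stopped on both ends so the index files are not changing mid-copy):

rsync -aH --progress root@old-server:/elasticsearch/data/ /elasticsearch/data/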