To do live migration you need shared storage. You can use NFS, which is really slow. You could use iSCSI, which is ironically even slower. Or you could use the qemu-nbd network block device server, which is better than the first two but means you're only as strong as the server running the nbd. This is also the case for iSCSI or NFS, but those are both proven technologies running on stable servers, or NetApps if you are lucky. For these reasons, I recommend using fibre channel. We have a fibre channel array behind a fibre channel switch, with dual connections for redundancy. I'll cover how to configure that later with multipath.
We are using blades as the hypervisors. This has a few key advantages. The first is that latency between hypervisors is minimal. It depends on your manufacturer, but most blades feature very good bandwidth between blades of the same chassis using virtual ethernet devices. The other advantage is that, as far as the rest of the network is concerned, it doesn't matter whether a vm is living on blade 1 or blade 10. Nothing has to change with routing to get to the vm; it's still the same destination (assuming you don't have multiple switches in your blade chassis).
Using multipath is a critical step, and it's very straightforward to implement. When setting up a new virtual machine we carve out part of the fibre channel array for the device. Our array returns the id of the new device when a new lun is created; we make a note of it to make sure we are talking about the right slice of the disk.
First, we configure multipath to look at the scsi devices. By default everything is blacklisted, so comment out the lines that look like this in /etc/multipath.conf:
blacklist {
devnode "*"
}
Change them to:
#blacklist {
# devnode "*"
#}
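If you would rather script that edit than do it by hand, here is a minimal sketch in Python. The function name comment_out_blacklist is made up for this example, and it assumes the blacklist block opens on its own line exactly as shown above. It only works on text you pass in, so reading and writing /etc/multipath.conf is left to you.

```python
import re

# Matches a line that opens a blacklist block, but not "blacklist_exceptions {".
_BLACKLIST_OPEN = re.compile(r"blacklist\s*\{$")


def comment_out_blacklist(conf_text):
    """Prefix every line of each `blacklist { ... }` block with '#'.

    Hypothetical helper for illustration; it is not part of multipath-tools.
    """
    out = []
    depth = 0  # brace depth inside a blacklist block, 0 when outside one
    for line in conf_text.splitlines():
        stripped = line.strip()
        if depth == 0 and _BLACKLIST_OPEN.match(stripped):
            depth = 1
            out.append("#" + line)
        elif depth > 0:
            # Track nested braces so we stop at the block's own closing brace.
            depth += stripped.count("{") - stripped.count("}")
            out.append("#" + line)
        else:
            out.append(line)
    return "\n".join(out)


example = 'blacklist {\n    devnode "*"\n}\ndefaults {\n}'
print(comment_out_blacklist(example))
```

Lines that are already commented are left alone, so running it twice does no harm.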
Next we'll tell multipath how to recognise the different luns on the system. This is usually at the bottom of the default multipath.conf file.
devices {
device {
vendor "Vendor"
product "OurFC"
getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
}
}
This tells multipath how to identify luns on the system uniquely. For example, we have a lun configured for a test vm. It appears on the system as device /dev/sdc, so we run scsi_id against /block/sdc to determine the scsi_id of the lun.
[root@hypervisor0 ~]# /sbin/scsi_id -g -u -s /block/sdc
Vendor_FF01000033100100
In our case, since we have redundant links to the fibre channel, both /dev/sdc and /dev/sde return the same result:
[root@hypervisor0 ~]# /sbin/scsi_id -g -u -s /block/sde
Vendor_FF01000033100100
Now, to make this useful, we have to define an alias for this scsi_id, so we'll make an alias called vm1, since this is our first vm.
multipaths {
multipath {
wwid Vendor_FF01000033100100
alias vm1
}
}
Here we are telling multipath to create a device in /dev/mapper called vm1 and to route any data destined for that device to either /dev/sdc or /dev/sde, depending on which is available. You can specify round-robin for that, but that's not important right now. The important thing is that on any server we build, if we have the same multipath.conf file, the device /dev/mapper/vm1 will exist on that server. Moreover, it will be the correct lun on the fibre channel each time.
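To make the relationship between paths, scsi_ids and aliases concrete, here is a rough sketch in Python. It does not talk to multipath at all; it just models the bookkeeping described above. The function names group_paths_by_wwid and render_multipaths are invented for this example, and the scsi_id values are the ones from our test lun.

```python
from collections import defaultdict


def group_paths_by_wwid(scsi_ids):
    """Map each scsi_id to the sorted list of kernel devices that reported it."""
    groups = defaultdict(list)
    for device, wwid in scsi_ids.items():
        groups[wwid].append(device)
    return {wwid: sorted(devices) for wwid, devices in groups.items()}


def render_multipaths(aliases):
    """Render a multipaths section from an {alias: wwid} mapping."""
    lines = ["multipaths {"]
    for alias, wwid in sorted(aliases.items()):
        lines += [
            "    multipath {",
            f"        wwid {wwid}",
            f"        alias {alias}",
            "    }",
        ]
    lines.append("}")
    return "\n".join(lines)


# Both paths to the lun report the same scsi_id, so they land in one group.
scsi_ids = {
    "/dev/sdc": "Vendor_FF01000033100100",
    "/dev/sde": "Vendor_FF01000033100100",
}
print(group_paths_by_wwid(scsi_ids))
print(render_multipaths({"vm1": "Vendor_FF01000033100100"}))
```

Keeping the alias-to-scsi_id mapping in one place and generating the stanza from it is one way to avoid typos when the list of vms grows.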
When we add a new lun, we rescan the SCSI host so the kernel picks it up:
[root@hypervisor ~]# echo "- - -" > /sys/class/scsi_host/host3/scan
[root@hypervisor ~]# dmesg |grep sd
SCSI device sdag: drive cache: write back
sdag: unknown partition table
sd 3:0:1:14: Attached scsi disk sdag
sd 3:0:1:14: Attached scsi generic sg35 type 0
Using dmesg we can see the device name assigned by the kernel to the new device. We can then run scsi_id on the device to determine the scsi_id of that device.
[root@hypervisor ~]# /sbin/scsi_id -g -u -s /block/sdag
Vendor_AA01000D33100100
We can then add the id to /etc/multipath.conf to create a device in /dev/mapper. So in our multipath.conf we put:
multipaths {
multipath {
wwid Vendor_AA01000D33100100
alias vm_test
}
}
This will create a block device /dev/mapper/vm_test, which is what we will tell the vm to use as its hard drive. Now, if we use some version control (subversion) or configuration control (puppet), we can push this config file to all our hypervisors. This means that /dev/mapper/vm_test will exist on each of the hypervisors. Moreover, it will be the correct block device for our system.
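Since everything depends on every hypervisor having the same alias-to-scsi_id mapping, it can be worth checking that before migrating anything. The following Python sketch is one way to do it, under a few assumptions: the config text has already been fetched from each host somehow, each multipath block contains simple "wwid ..." and "alias ..." lines as in our examples, and the names parse_aliases, mismatched_hosts and hypervisor1 are made up for illustration.

```python
import re

# Matches one inner "multipath { ... }" block; "multipaths {" does not match.
_MULTIPATH_BLOCK = re.compile(r"\bmultipath\s*\{([^{}]*)\}")


def parse_aliases(conf_text):
    """Return {alias: wwid} for every multipath block that sets both."""
    # Ignore commented-out lines so disabled blocks are not counted.
    active = "\n".join(
        line for line in conf_text.splitlines()
        if not line.lstrip().startswith("#")
    )
    aliases = {}
    for body in _MULTIPATH_BLOCK.findall(active):
        fields = {}
        for line in body.splitlines():
            parts = line.split(None, 1)
            if len(parts) == 2:
                fields[parts[0]] = parts[1].strip().strip('"')
        if "alias" in fields and "wwid" in fields:
            aliases[fields["alias"]] = fields["wwid"]
    return aliases


def mismatched_hosts(reference_conf, host_confs):
    """Return the sorted hostnames whose alias map differs from the reference."""
    expected = parse_aliases(reference_conf)
    return sorted(
        host for host, text in host_confs.items()
        if parse_aliases(text) != expected
    )


reference = """multipaths {
multipath {
wwid Vendor_AA01000D33100100
alias vm_test
}
}"""
hosts = {
    "hypervisor0": reference,
    # Hypothetical second host with a mistyped scsi_id.
    "hypervisor1": reference.replace("AA01", "BB01"),
}
print(mismatched_hosts(reference, hosts))
```

If puppet or subversion is pushing the file, the list should always come back empty; anything else means a hypervisor would map the alias to the wrong lun.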
<devices>
<emulator>/usr/bin/qemu-kvm</emulator>
<disk type='block' device='disk'>
<source dev='/dev/mapper/ldap'/>
<target dev='hda' bus='ide'/>
</disk>