By thomas, Fri, 09/18/2009 - 14:42
We arrived at our current environment after testing a few approaches that really didn't work very well. The current setup uses fibre channel for storage and blade servers as the hypervisors.

To do live migration you need shared storage. You can do NFS, which is really slow. You can do iSCSI, which is ironically even slower. Or you can use the qemu-nbd network block device server, which performs better than the first two but means your storage is only as strong as the single server running the nbd export. That's also true of iSCSI or NFS, but those are proven technologies running on stable servers, or NetApps if you're lucky. For these reasons, I recommend fibre channel. We run fibre channel through a fibre channel switch, with dual connections to each host for redundancy. I'll cover how to configure that with multipath later.
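To make the nbd option concrete, here's a rough sketch of what exporting an image and attaching it on a hypervisor looks like. The image path, port, and hostname below are just placeholders, not values from our setup:

    # On the storage server: export a disk image over the network
    qemu-nbd --port=1024 /srv/images/vm01.img

    # On each hypervisor: load the nbd module and attach the export as a block device
    modprobe nbd
    nbd-client storage-server 1024 /dev/nbd0

If that one storage server dies, every vm backed by it dies with it, which is a big part of why we went with fibre channel instead.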

We are using blades as the hypervisors. This has a few key advantages. The first is that latency between hypervisors is minimal. It depends on your manufacturer, but most blades offer very good bandwidth between blades in the same chassis over the chassis's virtual ethernet devices. The other advantage is that, as far as the rest of the network is concerned, it doesn't matter whether a vm lives on blade 1 or blade 10. Nothing has to change in the routing to reach the vm; it's still the same destination (assuming you don't have multiple switches in your blade chassis).
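Once the storage is shared and the vm's bridge exists on every blade, moving a guest between blades is a one-liner. A minimal sketch, assuming a guest named vm01 and a destination blade reachable over ssh as blade10:

    # Live-migrate the running guest to another hypervisor in the chassis
    virsh migrate --live vm01 qemu+ssh://blade10/system

The guest keeps its MAC and IP address through the move, so nothing upstream of the blade chassis needs to be reconfigured.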