XenServer + DRBD (Update 5) Failover Process

James White from the Citrix Forum (This Thread) asked for a video of VM failover when the Pool Master dies. This was part of my testing, so I will add the guide here.

Very impressed by how well it works!!!

See the DRBD install guide for my setup.

So the Pool Master has failed and is unrecoverable, and all your VMs are down. Here is how to get them back up and running.

SSH onto a pool member (This will act as your new Pool Master)

– Convert your Member to a Master

$ xe pool-emergency-transition-to-master

– Recover connections to the other member servers (if any)

$ xe pool-recover-slaves

– Verify the pool management has been restored

$ xe host-list

– Show the list of Pool members

$ xe host-list params=uuid,name-label,host-metrics-live
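The `xe` CLI can also filter the host list by parameter values, which is a quick way to isolate the failed host's UUID for the next steps (a sketch; the example UUID is hypothetical):

```shell
# List only hosts whose metrics report them as dead (host-metrics-live=false).
# --minimal prints just a comma-separated value list with no headers.
FAILED_UUID=$(xe host-list host-metrics-live=false params=uuid --minimal)
echo "Failed host: ${FAILED_UUID}"
```

You can then use `${FAILED_UUID}` in place of `UUID_of_failed_server` in the commands below.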

– Show the VMs from the failed server; you will see their state still reported as “Running”

$ xe vm-list is-control-domain=false resident-on=UUID_of_failed_server

– Reset the power state for all VMs on the failed server

$ xe vm-reset-powerstate resident-on=UUID_of_failed_server --force --multiple

Now all VMs will be visible in XenCenter. Log in to the member server IP (the new Master) and start your VMs.
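If you prefer to stay on the command line instead of XenCenter, one way to start every halted guest is a simple loop (a sketch; adjust the filter if some VMs should stay down):

```shell
# Start each halted, non-control-domain VM in turn.
# --minimal gives a comma-separated UUID list; tr splits it for the loop.
for uuid in $(xe vm-list is-control-domain=false power-state=halted \
              params=uuid --minimal | tr ',' ' '); do
  echo "Starting VM ${uuid}"
  xe vm-start uuid="${uuid}"
done
```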


When you recover the old Master, simply boot it up and recover the DRBD replication:

– Restart DRBD on both servers

$ /etc/init.d/drbd restart

$ drbdadm -- --overwrite-data-of-peer primary drbd-sr1 (On the New Master server)

$ drbdadm primary drbd-sr1 (On the Old Master)
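You can watch the resync progress on either node via `/proc/drbd` (a sketch; the exact status line format varies between DRBD 8.x versions):

```shell
# Connection state should return to "Connected" and the disk state to
# "UpToDate/UpToDate" on both nodes once the resync finishes.
cat /proc/drbd

# A rough sanity check once the sync looks done:
grep -q 'UpToDate/UpToDate' /proc/drbd && echo "drbd-sr1 fully synced"
```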

Let me know how it works for you.

Cheers, Joe

9 thoughts on “XenServer + DRBD (Update 5) Failover Process”

  1. I wrote too early; looking at the video I noticed the notepad :-)). The clip from this provider doesn’t expand to full screen (at least from Firefox), so you may want to upload the next video to YouTube. Their embedded videos work a bit better.

  2. Hi Joe, well I got everything up and running, and I’ve tested this guide and it’s working like a charm 🙂 I have a side question: how do I get split-brain notifications by email? Can I just replace "root" in notify.sh with an email address?

  3. I was wondering what would be the best practice in a production deployment with multiple Windows and Linux VMs: a single DRBD device for all VMs (worried about split-brain in case the replication network goes down), or one DRBD device per VM (worried about partition management in case I need to resize some server VHDs)? In addition, will 2 bonded gigabit NICs be enough to replicate without being a bottleneck? (A Win7 VM reads at ~150MB/s and writes at about ~70MB/s.) Here are my specs: 2x dual Intel 55xx towers, both with 1.8TB RAID5, 2 gigabit NICs each (one 4-port, one 2-port), and 2 x-over cables for replication (one in each NIC, bonded). Any thoughts?

  4. Hi Joe, this tutorial doesn’t work for me. Every step seems to work, but when I try to start my virtual machines it stops with an error: “The VDI is not available.” Do you have an idea? Greets, Mike

  5. Sorry, I forgot some logging info: Sep 12 13:02:03 verde xapi: [error|verde|430|Async.VM.start R:ab205ba115a5|xapi] Vmops.start_paused caught: SR_BACKEND_FAILURE_46: [ ; The VDI is not available [opterr=VDI ea5cf737-82f2-4161-a453-eea7e73f08c8 already attached RW]; ]

  6. I did a lot more tests. If you do a shutdown like Joe does in XenCenter, everything works fine; you can simply start the VM on the other node. But if you produce a power loss, you cannot start the VM on the other node. It always stops with the error: “The VDI is not available [opterr=VDI xyz already attached RW]”. Is there anybody who has the same problem? Thanks for the help. Mike
