XenServer + DRBD (Update 4) The Guide

*** If you are using XenServer 5.6 FP1, please see http://joekane.eu/xenserver-56-fp1-drbd-working first.

Here is the DRBD + XenServer install guide. If you run into any issues, post a comment and I’ll try my best to resolve them.

###### DRBD + Xenserver Guide: JoeKane.eu ########

Create DRBD RPM for installation on XenServer

Download the Driver Development Kit (ISO) from http://www.citrix.com/lang/English/lp/lp_1688621.asp
Mount or Extract the ISO and import using XenCenter – Be sure to select XML as file type
$ mkdir /drbd/
$ cd /drbd/
$ wget http://oss.linbit.com/drbd/8.3/drbd-8.3.7.tar.gz
$ tar xzf drbd-8.3.7.tar.gz
$ cd drbd-8.3.7
$ ./configure --enable-spec --with-km
$ make tgz
$ cp drbd*.tar.gz `rpm -E %_sourcedir`
$ rpmbuild -bb drbd.spec
$ rpmbuild -bb drbd-km.spec

The built RPMs are located in /usr/src/redhat/RPMS/i386/

See: http://www.drbd.org/users-guide/s-build-rpm.html

If you want to skip this step, I have attached the RPMs needed!

Download RPMs:

DRBD 8.3.7 + XenServer 5.5 U2 RPM: http://bit.ly/dmvv0g

DRBD 8.3.7 + XenServer 5.6 Beta RPM: http://bit.ly/9hHNEt

DRBD 8.3.7 + XenServer 5.6 Final RPM: http://bit.ly/diYtbY

DRBD 8.3.8.1 + XenServer 5.6 Final RPM: http://bit.ly/cNboke

If you are using XenServer 5.6 FP1, please see http://joekane.eu/xenserver-56-fp1-drbd-working first.

—————————————————————————

Install XenServer 5.5 U2 on each server – Don’t configure local storage – Connect the servers together using a crossover cable.
My setup has 1x 80GB drive for XenServer install and 1x 250GB drive for the DRBD storage

1.    Install DRBD (Both Servers)

Copy drbd-km-2.6.18_128.1.6.el5.xs5.5.0.505.1024xen-8.3.7-12.i386.rpm and drbd-utils-8.3.7-1.i386.rpm to the /tmp directory on each XenServer (WinSCP works well for this), then install them from there:

$ cd /tmp
$ rpm -ivh drbd-utils-8.3.7-1.i386.rpm
$ rpm -ivh drbd-km-2.6.18_128.1.6.el5.xs5.5.0.505.1024xen-8.3.7-12.i386.rpm
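
Before installing, it can be worth checking that the drbd-km package was built for the kernel you are actually running, since a kernel module built for a different kernel will not load. The sketch below assumes the package file name embeds the kernel release with "-" replaced by "_" (as the file name above suggests). `km_rpm_matches_kernel` is a hypothetical helper name, not part of DRBD or XenServer.

```shell
# Hypothetical helper: succeed if the drbd-km RPM file name contains the
# given kernel release.
# Assumption: "-" in the kernel release appears as "_" in the RPM file name.
km_rpm_matches_kernel() {
    rpm_file=$1
    kernel_release=$2
    # Convert e.g. 2.6.18-128.1.6.el5 to 2.6.18_128.1.6.el5
    mangled=$(printf '%s' "$kernel_release" | sed 's/-/_/g')
    case "$rpm_file" in
        *"$mangled"*) return 0 ;;
        *)            return 1 ;;
    esac
}

# Example usage on the XenServer host:
#   km_rpm_matches_kernel drbd-km-2.6.18_128.1.6.el5.xs5.5.0.505.1024xen-8.3.7-12.i386.rpm "$(uname -r)" \
#       || echo "drbd-km package does not match the running kernel"
```

If the check fails, rebuild the RPM in a DDK that matches your XenServer version rather than forcing the install.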

2.    Configure a second management NIC on each XenServer

The IP addresses I’m using are 10.1.1.1 & 10.1.1.2 (255.255.255.0 netmask)

3.    Fdisk the Drives (Both Servers)
Be sure to partition your drive using fdisk. You can view your partition table using
$ fdisk -l
I won’t go into too much detail here, as fdisk is very easy to use. In my setup the 250GB drive is /dev/sda, with a single partition /dev/sda1.

4.    Configure DRBD (Both Servers)

$ nano /etc/drbd.conf

# ———————-
# You can find an example in  /usr/share/doc/drbd…/drbd.conf.example

#include "drbd.d/global_common.conf";
#include "drbd.d/*.res";

resource drbd-sr1 {
    protocol C;
    startup {
    }

    disk {
        max-bio-bvecs 1;
    }

    net {
        allow-two-primaries;
        cram-hmac-alg "sha1";
        shared-secret "password";
        after-sb-0pri discard-zero-changes;
        after-sb-1pri consensus;
        after-sb-2pri disconnect;
    }

    syncer {
        rate 1G;
    }

    on xenserver-drbd1 {
        device /dev/drbd1;
        disk /dev/sda1;
        address 10.1.1.1:7789;
        meta-disk internal;
    }
    on xenserver-drbd2 {
        device /dev/drbd1;
        disk /dev/sda1;
        address 10.1.1.2:7789;
        meta-disk internal;
    }
}
# ——————–
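
DRBD matches the names in the "on" sections against each server's own host name (the output of `uname -n`), so a mismatch there stops the resource from coming up. A minimal sketch of a check is shown below. `conf_has_host` is a hypothetical helper name, and it assumes the "on <name> {" opening sits on one line as in the config above.

```shell
# Hypothetical helper: succeed if the config file contains an
# "on <hostname> {" section for the given host name.
conf_has_host() {
    conf_file=$1
    host=$2
    grep -Eq "^[[:space:]]*on[[:space:]]+${host}[[:space:]]*\{" "$conf_file"
}

# Example usage on each XenServer:
#   conf_has_host /etc/drbd.conf "$(uname -n)" \
#       || echo "no 'on' section for this host in /etc/drbd.conf"
```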

4. Configure Notifications in case of Split brain. (Both Servers)

nano /usr/lib/drbd/notify.sh

# Add to the bottom of the file

esac

#echo "$BODY" | mail -s "$SUBJECT" $RECIPIENT
HOST_UUID=`xe host-list --minimal`
xe message-create body="$BODY" host-uuid=$HOST_UUID name=DRBD_ATERT priority=10

#I also changed the "BODY" message because it was received empty.

case "$0" in
*split-brain.sh)
SUBJECT="DRBD split brain on resource $DRBD_RESOURCE"
BODY=" Split brain detected, Manual split brain recovery is necessary! "
# BODY="
#DRBD has detected split brain on resource $DRBD_RESOURCE
#between $(hostname) and $DRBD_PEER.
#Please rectify this immediately.
#Please see http://www.drbd.org/users-guide/s-resolve-split-brain.html for details on doing so."
;;

——————————-

5.    Allow the DRBD ports through iptables (Both Servers)

Get your interface alias using $ ifconfig (thanks Sam). The rule below opens TCP ports 7788 to 7799, which covers port 7789 used in drbd.conf above.

——————————-
$ iptables -I INPUT 1 -i xenbr2 -p tcp --dport 7788:7799 -j ACCEPT
$ /etc/init.d/iptables save
——————————-
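
If you change the port in drbd.conf later, it is easy to forget the firewall rule. The following is a rough sketch of a check that every port on an "address" line falls inside the range opened by iptables. `drbd_ports` and `ports_within_range` are hypothetical helper names, and the sketch assumes IPv4 "address ip:port;" lines as in the config above.

```shell
# Hypothetical helper: print the port from each "address ip:port;" line
# of a DRBD config file.
drbd_ports() {
    sed -n 's/^[[:space:]]*address[[:space:]].*:\([0-9][0-9]*\);.*/\1/p' "$1"
}

# Hypothetical helper: succeed only if every port found in the config
# lies within the range low..high (inclusive).
ports_within_range() {
    conf_file=$1
    low=$2
    high=$3
    for port in $(drbd_ports "$conf_file"); do
        if [ "$port" -lt "$low" ] || [ "$port" -gt "$high" ]; then
            echo "port $port is outside $low:$high" >&2
            return 1
        fi
    done
    return 0
}

# Example usage:
#   ports_within_range /etc/drbd.conf 7788 7799 && echo "firewall range covers drbd.conf"
```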

6.    Configure LVM to see your Drive (Both Servers)

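$ nano /etc/lvm/lvm.conf

# By default we accept every block device:

Add your block device "r|/dev/sda.*|" to the filter. This stops LVM from scanning the backing disk directly, so the volume group is only seen through /dev/drbd1.

e.g. filter = [ "r|/dev/xvd.|", "r|/dev/sda.*|" ]

# Set the write cache state to 0 (the default is 1)

write_cache_state = 0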
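
Since both servers need the same lvm.conf changes, a quick check can catch a server that was missed. This is only a sketch: `check_lvm_conf` is a hypothetical helper name, and it looks for the exact reject pattern used above, so adjust the pattern if your DRBD backing disk is not /dev/sda.

```shell
# Hypothetical helper: succeed if the given lvm.conf has both settings
# from this step in place.
check_lvm_conf() {
    conf_file=$1
    # An active (uncommented) filter line containing the /dev/sda reject pattern
    grep -E '^[[:space:]]*filter[[:space:]]*=' "$conf_file" \
        | grep -Fq 'r|/dev/sda.*|' || return 1
    # write_cache_state explicitly set to 0
    grep -Eq '^[[:space:]]*write_cache_state[[:space:]]*=[[:space:]]*0' "$conf_file" || return 1
}

# Example usage:
#   check_lvm_conf /etc/lvm/lvm.conf || echo "lvm.conf still needs editing"
```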

———————————

7. Remove LVM Cache (Thanks PF4)

rm -f /etc/lvm/cache/.cache

——————————–

8.    Create XenServer pool using XenCenter

———————————

9.    Enable DRBD Storage and Replicate

On both XenServers execute the following

$ drbdadm create-md drbd-sr1
$ modprobe drbd
$ drbdadm up drbd-sr1

Now ONLY on the XenServer that will be your primary

$ drbdadm -- --overwrite-data-of-peer primary drbd-sr1
$ cat /proc/drbd

After the full sync completes, make the DRBD resource on the second pool member primary as well: "$ drbdadm primary drbd-sr1"

$ cat /proc/drbd will now show Primary/Primary
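
To check the state from a script rather than by eye, the status line in /proc/drbd can be split into its cs (connection state), ro (roles) and ds (disk states) fields. The sketch below assumes the DRBD 8.3 layout shown in the comments further down (a line starting with "1:" for minor 1). `drbd_field` is a hypothetical helper name.

```shell
# Hypothetical helper: print one field (cs, ro or ds) from the status line
# of the given DRBD minor, reading /proc/drbd-style text on stdin.
drbd_field() {
    minor=$1
    field=$2
    awk -v m="$minor:" -v f="$field:" '
        $1 == m {
            for (i = 2; i <= NF; i++)
                if (index($i, f) == 1) {
                    print substr($i, length(f) + 1)
                    exit
                }
        }'
}

# Example usage:
#   drbd_field 1 ro < /proc/drbd    # roles, e.g. Primary/Primary
#   drbd_field 1 ds < /proc/drbd    # disk states, e.g. UpToDate/UpToDate
```

Waiting until ds reads UpToDate/UpToDate before promoting the second server is the scripted equivalent of watching for the full sync to finish.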

10.    Create Storage Repository

$ xe sr-create shared=true device-config:device="/dev/drbd1" name-label="DRBD-SR1" type=lvm

$ chkconfig drbd on (on both XenServers)

That’s it. Install your VMs and test VM migration.
Any questions, post a comment.

Thanks go to,
Florian Haas from linbit.com (The Makers of DRBD): http://fghaas.wordpress.com/2007/09/03/drbd-806-brings-full-live-migration-fo…
Natanael Mignon of http://natanael-mignon.blogspot.com/ – Gave the starting steps to get everything working.
Citrix for providing XenServer and the Citrix Forums
Loads of websites related to DRBD configs, and the DRBD mailing list.

*** Small Update: Don’t forget to add the names of your servers to /etc/hosts using your crossover IP addresses
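
As a sketch of that update, the entries can be added with a small helper that skips names already present. The IP addresses and host names are the ones used in drbd.conf above. `add_host_entry` is a hypothetical helper name.

```shell
# Hypothetical helper: append "IP<TAB>name" to a hosts file unless the
# name is already listed there.
add_host_entry() {
    hosts_file=$1
    ip=$2
    name=$3
    if ! grep -Eq "[[:space:]]${name}([[:space:]]|\$)" "$hosts_file"; then
        printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
    fi
}

# Example usage (run on both servers):
#   add_host_entry /etc/hosts 10.1.1.1 xenserver-drbd1
#   add_host_entry /etc/hosts 10.1.1.2 xenserver-drbd2
```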

181 thoughts on “XenServer + DRBD (Update 4) The Guide”

  1. Joe, thanks for your script. Everything works fine up to creating a new VM. If I start the VM, Xen logs "The SR backend failed to complete the operation." If I move the VM to the second server of the pool, the VM starts. It seems that only one pool member is able to read/write to the shared storage, despite the primary/primary configuration. Thanks and greez, Karsten

  2. Hi Karsten, this is usually an issue with your drbd.conf file. Make sure it includes:
    allow-two-primaries;
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    Also make sure your setup is Primary:Primary. Can you post the output of
    cat /proc/drbd
    Cheers, Joe

  3. Dear Joe,
    my drbd.conf should be o.k. Here it is:
    global { usage-count yes; }
    common {
        protocol C;
        net {
            allow-two-primaries;
            shared-secret "Micromedic2003";
            after-sb-0pri discard-zero-changes;
            after-sb-1pri discard-secondary;
        }
        disk {
            max-bio-bvecs 1;
        }
        startup {
            become-primary-on both;
        }
        handlers {
    #       split-brain "usr/lib/drbd/notify-split-brain.sh";
        }
        syncer { rate 10M; }
    }
    resource drbd-sr {
        device  /dev/drbd1;
        disk /dev/sda3;
        meta-disk internal;
        on vms01{
            address 192.168.150.150:7788;
        }
        on vms03 {
            address 192.168.150.149:7788;
        }
    }
    Both Servers are Primary. Here is /proc/drbd:
    version: 8.3.7 (api:88/proto:86-91)
    GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by root@localhost.localdomain, 2010-01-23 10:07:05
     1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r----
        ns:0 nr:8702430 dw:8702430 dr:236 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
    Cheers,
    Karsten

  4. Hi Karsten, looks like your "resource drbd-sr" should be above the disk declaration, try this
    ------------------------------
    resource drbd-sr {
        protocol C;
        startup {
        }
        disk {
            max-bio-bvecs 1;
        }
        net {
            allow-two-primaries;
            cram-hmac-alg "sha1";
            shared-secret "Micromedic2003";
            after-sb-0pri discard-zero-changes;
            after-sb-1pri discard-secondary;
            after-sb-1pri consensus;
            after-sb-2pri disconnect;
        }
        syncer {
            rate 10M;
        }
        on vms01 {
            device /dev/drbd1;
            disk /dev/sda3;
            address 192.168.150.150:7788;
            meta-disk internal;
        }
        on vms02 {
            device /dev/drbd1;
            disk /dev/sda3;
            address 192.168.150.150:7788;
            meta-disk internal;
        }
    }
    ------------------------------
    Copy to both servers and reboot DRBD on both
    /etc/init.d/drbd restart
    Then on the primary
    drbdadm -- --overwrite-data-of-peer primary drbd-sr
    Let me know how it works,
    Cheers, Joe

  5. Hi Joe, that script looks very nice. I recently purchased two master servers with loads of everything (CPU, RAM, storage), and I’m looking forward to getting this to work. My question is this: if I want to partition my local storage into multiple local storage volumes and use DRBD to replicate them, how would the script look?

  6. Hi Morten, using fdisk just create your primary partition e.g. /dev/sda – then create your other partitions on top of it. Configure DRBD to replicate your primary. Good luck with your project!!!

  7. Hopefully somebody had the same problem. Everything is up and running: drbd is running, primary/primary, SR created, but I can start the VM which uses drbd only on one host, and live migration is also not possible.
    10.03.2010 19:22:47 Error: Migrating VM 'UbuntuDRBD' from 'xentest2' to 'xentest1' - VM migration failed: SR_BACKEND_FAILURE: [ non-zero exit; ; Traceback (most recent call last):
      File "/opt/xensource/sm/LVMSR", line 1357, in ?
        SRCommand.run(LVHDSR, DRIVER_INFO)
      File "/opt/xensource/sm/SRCommand.py", line 150, in run
        ret = cmd.run(sr)
      File "/opt/xensource/sm/SRCommand.py", line 73, in run
        return target.attach(self.params['sr_uuid'], self.vdi_uuid)
      File "/opt/xensource/sm/LVMSR", line 830, in attach
        self._loadThis()
      File "/opt/xensource/sm/LVMSR", line 1276, in _loadThis
        vhdInfo = vhdutil.getVHDInfo(self.path, lvhdutil.extractUuid, False)
      File "/opt/xensource/sm/vhdutil.py", line 96, in getVHDInfo
        ret = ioretry(cmd)
      File "/opt/xensource/sm/vhdutil.py", line 75, in ioretry
        errlist = [errno.EIO, errno.EAGAIN])
      File "/opt/xensource/sm/util.py", line 225, in ioretry
        return f()
      File "/opt/xensource/sm/vhdutil.py", line 74, in <lambda>
        return util.ioretry(lambda: util.pread2(cmd),
      File "/opt/xensource/sm/util.py", line 130, in pread2
        return pread(cmdlist)
      File "/opt/xensource/sm/util.py", line 124, in pread
        raise CommandException(rc, str(cmdlist), stderr.strip())
    util.CommandException: 2 ]
    Or if I try to start it on the one host, I get only the error "The VDI is not available"
    Any ideas?

  8. Hi Butch, I suspect there is a config issue with your DRBD.conf – Can you post it here and I’ll have a look. Cheers, Joe

  9. resource drbd-sr1 {
        protocol C;
        startup {
        }
        disk {
            max-bio-bvecs 1;
        }
        net {
            allow-two-primaries;
            cram-hmac-alg "sha1";
            shared-secret "EquadaRockS";
            after-sb-0pri discard-zero-changes;
            after-sb-1pri discard-secondary;
            after-sb-1pri consensus;
            after-sb-2pri disconnect;
        }
        syncer {
            rate 1G;
        }
        on xentest1 {
            device /dev/drbd1;
            disk /dev/sda3; # customise based on your drive layout see fdisk above
            address 192.168.28.1:7789;
            meta-disk internal;
        }
        on xentest2 {
            device /dev/drbd1;
            disk /dev/sda3; # customise based on your drive layout see fdisk above
            address 192.168.28.2:7789;
            meta-disk internal;
        }
    }

  10. Mar 10 20:58:57 xentest1 xapi: [ info|xentest1|56 inet-RPC|dispatch:VM.start D:12a1811b6d51|taskhelper] task Async.VM.start_on R:c0e791c31b18 forwarded (trackid=b5d1ffd8c6057a6c239e8495ae675847)
    Mar 10 20:58:57 xentest1 xapi: [ info|xentest1|57 inet-RPC|dispatch:VBD.plug D:c602d972b181|taskhelper] task VBD.plug R:aa75d31494a1 forwarded (trackid=73d02d7165b8d56e61e1570f7ff3daed)
    Mar 10 20:58:57 xentest1 xapi: [ info|xentest1|57 inet-RPC|sm_exec D:8a057e405f65|xapi] Session.create trackid=4a12f4fe0ce636b8a6d302f6d857c88a pool=false uname= is_local_superuser=true auth_user_sid=
    Mar 10 20:58:58 xentest1 kernel: device-mapper: table: 251:3: linear: dm-linear: Device lookup failed
    Mar 10 20:58:58 xentest1 kernel: device-mapper: ioctl: error adding target to table
    Mar 10 20:58:58 xentest1 xapi: [ info|xentest1|57 inet-RPC|sm_exec D:8a057e405f65|xapi] Session.destroy trackid=4a12f4fe0ce636b8a6d302f6d857c88a

  11. Hi Butch, Can you give me ssh to your test server, I’ll see if I can fix – drop me a mail joekane [at.] gmail [.dot] com

  12. Hi Joe, I’m with you on the fdisk. I was fishing for a script template 😉 that I could use in drbd.conf to set up multiple replicated resources.

  13. thx a lot for your spontaneously offer, but i was able to fix it this morning. apparently a problem with an old lvm signature on one drbd partition

  14. Glad to hear Butch, for anyone else, if you want to clear old LVM sigs you can use "dd if=/dev/zero of=/dev/sda3 bs=4096 count=1000" – /dev/sda3 customise based on your partitions!

  15. Hi, anybody out there who has appropriate write performance on a drbd device? Even in disconnected (StandAlone) state, write performance is about 18MB/s compared to 240MB/s without drbd. Hardware is truly not the limitation in my situation. Thanks in advance, Karsten

  16. Hi Joe, my hardware setup: 8x SAS 15k, RAID6, Adaptec 5805 with BBU, dual Xeon 5430. I test write performance with
    dd if=/dev/zero of=/dev/drbd1 count=10000 bs=1M oflag=direct
    I tested with the LVM cache on and off. With the LVM cache on I have good performance, but Linbit recommends turning it off. What’s your opinion? Greez, Karsten

  17. Thanks Joe, I will read this. Link between servers is a 10Gbit cross. Karsten

  18. Hi Karsten, Did you find a solution to the slow sync speed? I have a 10GB link between two test servers and experiencing slow speeds.

  19. Hi Joe,
    thanks for asking.
    Yes, I did. The solution is the drbd.conf (in my case global_common.conf due to drbd 8.3.7). Actually, these are the parameters that speed it up to 150MB/s:
    max-buffers 8192;
    max-epoch-size 8192;
    sndbuf-size 1024k;
    al-extents 2099;
    no-md-flushes;
    no-disk-flushes;
    no-disk-barrier;
    But be careful with the last three. You have to have a BBU (Battery Backup Unit) on the RAID controller in case of a server crash.
    You can play with the parameters, all described on the Linbit site, to speed up the sync rate. They depend on the hardware, i.e. I/O throughput. Given a write speed of 300MB/s on both servers, I am not satisfied with a 150MB/s sync speed at the moment. But I will play with the parameters above and share my experiences. Due to little overhead, replication on block level doesn’t require an extraordinarily high sync speed. It is recommended just for the initial sync, of course.
    Again: without changing hardware or drinking magic potion, the sync rate rises from 2MB/s to 150MB/s by playing around with drbd.conf parameters. Hope this will help.
    greez from Cologne, Germany
    Karsten

  20. Karsten can you add those options within drbd.conf or do you have to use global_common.conf? Im testing this now – added the options to drbd.conf but not seeing any improvement. Im also using 8.3.7

  21. Hi Joe,
    I just added them in global_common.conf. Here is mine:
    global { usage-count yes; }
    common {
        protocol C;
        net {
            allow-two-primaries;
            shared-secret "Passwordhere";
            after-sb-0pri discard-zero-changes;
            after-sb-1pri discard-secondary;
            after-sb-2pri disconnect;
            max-buffers 8192;
            max-epoch-size 8192;
    #       unplug-watermark 128;
            sndbuf-size 1024k;
        }
        disk {
            no-md-flushes;
            no-disk-flushes;
            no-disk-barrier;
            max-bio-bvecs 1;
        }
        startup {
            become-primary-on both;
        }
        handlers {
    #       split-brain "usr/lib/drbd/notify-split-brain.sh";
        }
        syncer {
            rate 150M;
            al-extents 2099;
        }
    }
    resource drbd-sr {
        device  /dev/drbd1;
        disk /dev/sda3;
        meta-disk internal;
        on vms01{
            address 10.10.10.150:7788;
        }
        on vms02 {
            address 10.10.10.151:7788;
        }
    }
    cheers,
    Karsten

  22. Great guide,
    You have helped me get a XenServer + DRBD setup with very little prior knowledge of linux and DRBD.
    However, I am experiencing one issue. After following the instructions in the guide and verifying that everything is working, I find that if I restart one of the servers I run into a split-brain setup. I have been using the manual split-brain recovery steps to mitigate the situation but I was wondering if there was something wrong with my config.
    /etc/drbd.conf
    resource drbd-res1 {
        protocol C;
        startup {
            become-primary-on both;
        }
        disk {
            max-bio-bvecs 1;
        }
        net {
            allow-two-primaries;
            cram-hmac-alg "sha1";
            shared-secret "password";
            after-sb-0pri discard-zero-changes;
            after-sb-1pri discard-secondary;
        }
        syncer {
            rate 1G;
        }
        on NODE01 {
            device /dev/drbd1;
            disk /dev/md0; # customise based on your drive layout see fdisk above
            address 10.10.220.1:7789;
            meta-disk internal;
        }
        on NODE02 {
            device /dev/drbd1;
            disk /dev/md0; # customise based on your drive layout see fdisk above
            address 10.10.220.2:7789;
            meta-disk internal;
        }
    }
    Before the node is rebooted:
    cat /proc/drbd
    version: 8.3.7 (api:88/proto:86-91)
    GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by root@localhost.localdomain, 2010-02-08 12:20:27
     1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r----
        ns:0 nr:8 dw:24 dr:1888 al:1 bm:2 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
    After a node is rebooted:
    Node1
    version: 8.3.7 (api:88/proto:86-91)
    GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by root@localhost.localdomain, 2010-02-08 12:20:27
     1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r----
        ns:0 nr:0 dw:24 dr:2392 al:1 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
    Node2
    version: 8.3.7 (api:88/proto:86-91)
    GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by root@localhost.localdomain, 2010-02-08 12:20:27
     1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r----
        ns:0 nr:0 dw:0 dr:260 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
    When I look at the message logs:
    /var/log/messages
    Node2
    Mar 31 09:03:45 NODE02 kernel: drbd: initialized. Version: 8.3.7 (api:88/proto:86-91)
    Mar 31 09:03:45 NODE02 kernel: drbd: GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by root@localhost.localdomain, 2010-02-08 12:20:27
    Mar 31 09:03:45 NODE02 kernel: drbd: registered as block device major 147
    Mar 31 09:03:45 NODE02 kernel: drbd: minor_table @ 0xeec1bec0
    Mar 31 09:03:45 NODE02 kernel: block drbd1: Starting worker thread (from cqueue/0 [5021])
    Mar 31 09:03:45 NODE02 kernel: klogd 1.4.1, ---------- state change ----------
    Mar 31 09:03:45 NODE02 kernel: block drbd1: disk( Diskless -> Attaching )
    Mar 31 09:03:45 NODE02 kernel: block drbd1: No usable activity log found.
    Mar 31 09:03:45 NODE02 kernel: block drbd1: Method to ensure write ordering: barrier
    Mar 31 09:03:45 NODE02 kernel: block drbd1: max_segment_size ( = BIO size ) = 32768
    Mar 31 09:03:45 NODE02 kernel: block drbd1: Adjusting my ra_pages to backing device's (32 -> 64)
    Mar 31 09:03:45 NODE02 kernel: block drbd1: drbd_bm_resize called with capacity == 3906930672
    Mar 31 09:03:45 NODE02 kernel: block drbd1: resync bitmap: bits=488366334 words=15261448
    Mar 31 09:03:45 NODE02 kernel: block drbd1: size = 1863 GB (1953465336 KB)
    Mar 31 09:03:45 NODE02 kernel: block drbd1: recounting of set bits took additional 9 jiffies
    Mar 31 09:03:45 NODE02 kernel: block drbd1: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
    Mar 31 09:03:45 NODE02 kernel: block drbd1: disk( Attaching -> UpToDate )
    Mar 31 09:03:45 NODE02 kernel: block drbd1: Barriers not supported on meta data device - disabling
    Mar 31 09:03:46 NODE02 kernel: block drbd1: conn( StandAlone -> Unconnected )
    Mar 31 09:03:46 NODE02 kernel: block drbd1: Starting receiver thread (from drbd1_worker [5028])
    Mar 31 09:03:46 NODE02 kernel: block drbd1: receiver (re)started
    Mar 31 09:03:46 NODE02 kernel: block drbd1: conn( Unconnected -> WFConnection )
    Mar 31 09:03:46 NODE02 kernel: block drbd1: bind before connect failed, err = -99
    Mar 31 09:03:46 NODE02 kernel: block drbd1: conn( WFConnection -> Disconnecting )
    Mar 31 09:03:46 NODE02 kernel: block drbd1: role( Secondary -> Primary )
    Mar 31 09:03:46 NODE02 kernel: block drbd1: Creating new current UUID
    Mar 31 09:03:46 NODE02 kernel: block drbd1: Discarding network configuration.
    Mar 31 09:03:46 NODE02 kernel: block drbd1: Connection closed
    Mar 31 09:03:46 NODE02 kernel: block drbd1: conn( Disconnecting -> StandAlone )
    Mar 31 09:03:46 NODE02 kernel: block drbd1: receiver terminated
    Mar 31 09:03:46 NODE02 kernel: block drbd1: Terminating receiver thread
    Also after the reboot, the server comes back up in XenCenter and successfully connects to the SR without issue. The DRBD resource seems to be the only thing experiencing any issues.
    I am trying to get my configuration so that after a server is rebooted cleanly they both come up "Primary/Primary" and connected.
    Thanks in advance for any help you can give,
    Tim

  23. Hi Tim,
    Doing a
    $ drbdadm -- --overwrite-data-of-peer primary drbd-sr1
    on the pool master will force a sync and bring the DRBD resource back to Primary:Primary.
    If this doesn’t work, be sure to check the iptables rules on both servers.
    $ iptables -L
    will list all the rules.
    Bringing it back to Primary:Primary after every clean reboot is possible, but you would need to write a script for that, and you would also need to factor in a few different scenarios.
    I don’t have the time at the minute to investigate but will do shortly.
    Cheers, Joe

  24. Joe,
    Thanks for your reply. I have been using the command that you specified
    $ drbdadm -- --overwrite-data-of-peer primary drbd-sr1
    to get everything back up again and it works well. I was just under the impression that that was going to be handled automatically with the startup parameters in the drbd.conf:
    startup {
        become-primary-on both;
    }
    and
    net {
        allow-two-primaries;
        cram-hmac-alg "sha1";
        shared-secret "password";
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
    }
    If it does require a script to perform this automatically, if you can just point me in the direction that I would need to look in, I would be happy to take a stab at it.
    Thanks again,
    Tim

  25. xe sr-create shared=true device-config:device="/dev/drbd0" name-label="DRBD-SR1" type=lvm
    Error code: SR_BACKEND_FAILURE_181
    Error parameters: , Error in Metadata volume operation for SR. [opterr=Error introducing Metadata Volume],

  26. With XenServer 5.5.0 Update 2, we’re getting the same issue as Vasko. Using dd on /dev/sda3 (which is the partition underlying DRBD) does not help. We’ve tried several things, such as formatting /dev/drbd1 as ext3 (to check we could write to it). This goes without issue, but creating the SR always fails.

  27. Try the following (this assumes /dev/sda as the disk and /dev/sda3 as the partition):

    $ fdisk /dev/sda
    d : delete the partition sda3
    w : write the changes
    $ fdisk /dev/sda
    n : add a new partition, choose defaults
    w : write the changes

    At the prompt run
    $ fsck -f -y /dev/sda
    This will take a few minutes to complete. When finished, restart the server and continue from step 8 above.

    It can be frustrating; you will get it working, don’t worry.

    Cheers, Joe

  28. Hi David, installing 5.5.0 won’t make a difference. If possible, can you give me ssh access to the servers and I’ll take a look. A couple of users are having this issue, so I would like to get it addressed. joekane <@at> gmail <.dot> com

  29. The disk must be clean and unpartitioned.
    $ service drbd stop
    Attach /dev/sdb
    $ fdisk /dev/sdb
    n
    /dev/sdb1

    On both XenServers execute the following
    $ drbdadm create-md drbd-sr1
    $ modprobe drbd
    $ drbdadm up drbd-sr1

    Now ONLY on the XenServer that will be your primary
    $ drbdadm -- --overwrite-data-of-peer primary drbd-sr1
    $ watch cat /proc/drbd

    xe sr-create shared=true device-config:device="/dev/drbd0" name-label="DRBD-SR1" type=lvm
    Created successfully!

  30. 1. We create a folder on the server
    $ mkdir /drbd/

    2. We change into the folder
    $ cd /drbd/

    3. We download the packages
    wget http://……drbd-utils-8.3.7-1.i386.rpm
    wget http://……drbd-km-2.6.18_128.1.6.el5.xs5.5.0.505.1024xen-8.3.7-12.i386.rpm

    4. We install the packages
    $ rpm -ivh drbd-utils-8.3.7-1.i386.rpm
    $ rpm -ivh drbd-km-2.6.18_128.1.6.el5.xs5.5.0.505.1024xen-8.3.7-12.i386.rpm

    5. We find the necessary disk
    [root@xen1 ~]# fdisk -l
    Disk /dev/sda: 250.0 GB, 250059350016 bytes
    255 heads, 63 sectors/track, 30401 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes
       Device Boot   Start     End      Blocks   Id  System
    /dev/sda1   *        1     499     4008186   83  Linux
    /dev/sda2          500     998     4008217+  83  Linux
    /dev/sda3          999   30401   236179597+  83  Linux
    Disk /dev/sdb: 500.1 GB, 500106780160 bytes
    255 heads, 63 sectors/track, 60801 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes
    Disk /dev/sdb doesn’t contain a valid partition table

    The necessary disk is /dev/sdb

    6. We create a partition on the disk: /dev/sdb1
    $ fdisk /dev/sdb
    If there are partitions:
    d
    w
    $ fdisk /dev/sdb
    n
    p
    1
    w

    7. We format the partition
    $ mkfs -t ext3 /dev/sdb1

    8. We delete the data on the partition
    $ dd if=/dev/zero of=/dev/sdb1 bs=4096 count=1000

    9. DRBD configuration (both servers)
    $ vi /etc/drbd.conf
    # --------------------
    global {
      usage-count yes;
    }
    common {
      syncer { rate 50M; }
    }
    resource drbd-sr1 {
      protocol C;
      startup {
        degr-wfc-timeout 120; # 2 minutes.
        outdated-wfc-timeout 2; # 2 seconds.
        become-primary-on both;
      }
      disk {
        on-io-error detach;
        max-bio-bvecs 1;
      }
      net {
        allow-two-primaries;
        cram-hmac-alg "sha1";
        shared-secret "password";
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-1pri consensus;
        after-sb-2pri disconnect;
      }
      syncer {
        rate 50M;
      }
      on xen1.abis.kiev.ua {
        device /dev/drbd0;
        disk /dev/sdb1;
        address 192.168.1.91:7789;
        meta-disk internal;
      }
      on xen2.abis.kiev.ua {
        device /dev/drbd0;
        disk /dev/sdb1;
        address 192.168.1.83:7789;
        meta-disk internal;
      }
    }
    # --------------------

    10. Configure Notifications in case of Split brain. (Both Servers)
    $ vi /usr/lib/drbd/notify.sh
    # --------------------
    # Add to the bottom of the file
    esac
    #echo "$BODY" | mail -s "$SUBJECT" $RECIPIENT
    HOST_UUID=`xe host-list --minimal`
    xe message-create body="$BODY" host-uuid=$HOST_UUID name=DRBD_ATERT priority=10
    #I also changed the "BODY" message because it was received empty.
    case "$0" in
    *split-brain.sh)
    SUBJECT="DRBD split brain on resource $DRBD_RESOURCE"
    BODY=" Split brain detected, Manual split brain recovery is necessary! "
    # BODY="
    #DRBD has detected split brain on resource $DRBD_RESOURCE
    #between $(hostname) and $DRBD_PEER.
    #Please rectify this immediately.
    #Please see http://www.drbd.org/users-guide/s-resolve-split-brain.html for details on doing so."
    ;;
    # --------------------

    11. Allow port 7788 through Iptables (Both Servers)
    $ iptables -I INPUT 1 -i drbd1 -p tcp --dport 7788:7799 -j ACCEPT
    $ /etc/init.d/iptables save

    $ service iptables stop
    $ chkconfig iptables off

    12. Configure LVM to see your Drive (Both Servers)
    $ vi /etc/lvm/lvm.conf
    # By default we accept every block device:
    filter = [ "r|/dev/xvd.|", "r|/dev/sdb*.*|" ]
    #Set the write cache state to 0 default is 1
    write_cache_state = 0

    13. Create Xenserver pool using XenCenter
    We compare the date

    14. Enable DRBD Storage and Replicate
    On both XenServers execute the following
    $ service drbd stop
    $ drbdadm create-md drbd-sr1
    $ modprobe drbd
    $ drbdadm up drbd-sr1

    Now ONLY on the XenServer that will be your primary
    $ drbdadm -- --overwrite-data-of-peer primary drbd-sr1
    $ watch cat /proc/drbd

    After full sync, make DRBD on the pool secondary primary as well:
    $ drbdadm primary drbd-sr1
    $ cat /proc/drbd will now show Primary:Primary

    15. Create Storage Repository
    $ xe sr-create shared=true device-config:device="/dev/drbd0" name-label="DRBD-SR1" type=lvm
    $ chkconfig drbd on (on both XenServers)

  31. Looks ok Vasko. If you are still getting

    Error code: SR_BACKEND_FAILURE_181
    Error parameters: , Error in Metadata volume operation for SR. [opterr=Error introducing Metadata Volume],

    first check:
    $ xe sr-create shared=true device-config:device="/dev/drbd0" name-label="DRBD-SR1" type=lvm
    Should that be /dev/drbd1? If it’s a simple typo, great. If not, then there is something in the partition causing the issue. Are the servers in a pool?

    You shouldn’t format the partition ("mkfs -t ext3 /dev/sdb1").

    Try the following:
    $ drbdadm down all
    $ fdisk /dev/sdb
    d
    1
    wq
    Does this sync the disk or tell you to reboot before use?

    If the sync worked ok:
    $ fdisk /dev/sdb
    n
    p
    1
    choose defaults
    wq
    $ fsck -f -y /dev/sdb

    After the fsck is finished, reboot, then:
    $ service drbd stop
    $ drbdadm create-md drbd-sr1
    $ modprobe drbd
    $ drbdadm up drbd-sr1

    Primary only:
    $ drbdadm -- --overwrite-data-of-peer primary drbd-sr1
    Wait till it syncs up.

    On Secondary:
    $ drbdadm primary drbd-sr1

    Then on Primary:
    xe sr-create shared=true device-config:device="/dev/drbd1" name-label="DRBD-SR1" type=lvm

    Let me know if that works.
    Cheers, Joe

  32. Hi Vasko, I’m afraid I don’t have the time to troubleshoot with you. If you can send me ssh access or similar, I will take a look. Cheers, Joe

  33. The server is installed on the first disk and we create the storage on the second disk; that works. Everything on one disk doesn’t work for me.
    http://www.translate.ru/Default.aspx/Text

  34. Hi Joe, I’m afraid it’s not. I’ve tested it with the beta of 5.6 and I can install and configure it, but there’s a problem when I try starting the service.

  35. A note about the firewall.

    You need to verify what interface you’re adding the allow rule for. The example above is for a device named "drbd1". Look at the output of "ifconfig" to verify the correct device alias. For my "fresh" install of XenServer 5.5.0, the device name was "xenbr1". Once the correct rule was added, drbd started syncing.

    Look at http://www.drbd.org/users-guide/ch-admin.html#s-connection-states for the connection status.
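The interface check described above can be sketched as a small script. This is an illustration under assumptions: the helper names `iface_for_ip` and `drbd_fw_rule` are invented here, the parser expects the older net-tools `ifconfig` layout (`inet addr:...`), and the rule text is the one from the guide with only the interface swapped in. The rule is printed, not executed, so it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: print the interface that carries a given IPv4
# address, reading classic `ifconfig` output on stdin.
iface_for_ip() {
    awk -v ip="$1" '
        /^[^ \t]/ { iface = $1 }                      # unindented line starts a new interface block
        $1 == "inet" && $2 == "addr:" ip { print iface }'
}

# Hypothetical helper: print (do not run) the guide's allow rule for an interface.
drbd_fw_rule() {
    printf 'iptables -I INPUT 1 -i %s -p tcp --dport 7788:7799 -j ACCEPT\n' "$1"
}

# Example, using the replication address from the guide:
#   drbd_fw_rule "$(ifconfig | iface_for_ip 10.1.1.1)"
```

On a host where the replication address lives on `xenbr1`, this would print the rule with `-i xenbr1` instead of `-i drbd1`.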

  36. Also dug around and found the pvremove command to get rid of the pesky LVM metadata. Constantly removing the partition, rebooting, recreating the partition, rebooting, and writing zeros to the device failed to purge the metadata for me.
    $ pvremove -ff /dev/sda3

  37. Cheers Robert, I’m in the process of adding Heartbeat/Pacemaker. I’m crazy busy at the moment, so hopefully I will have it tested and working next month.

  38. I’ve seen you using some CLI interface xen commands to move and boot the VM. Are all of them taken from the XenServer Administrator’s Guide ?

  39. Hi Joe, Cool.

    I’m having issues with adding an extra disk source to the drbd.conf. Could you see if there is anything wrong in the script?

    # --------------------
    # You can find an example in /usr/share/doc/drbd…/drbd.conf.example
    #include "drbd.d/global_common.conf";
    #include "drbd.d/*.res";
    resource drbd-sr1 {
      protocol C;
      startup {
      }
      disk {
        max-bio-bvecs 1;
      }
      net {
        allow-two-primaries;
        cram-hmac-alg "sha1";
        shared-secret "password";
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-1pri consensus;
        after-sb-2pri disconnect;
      }
      syncer {
        rate 1G;
      }
      on STORAGE1 {
        device /dev/drbd1;
        disk /dev/sdd; # customise based on your drive layout see fdisk above
        device /dev/drbd2;
        disk /dev/sdc; # customise based on your drive layout see fdisk above
        address 10.0.2.1:7789;
        meta-disk internal;
      }
      on STORAGE2 {
        device /dev/drbd1;
        disk /dev/sdd; # customise based on your drive layout see fdisk above
        device /dev/drbd2;
        disk /dev/sdc; # customise based on your drive layout see fdisk above
        address 10.0.2.2:7789;
        meta-disk internal;
      }
    }
    # --------------------

  40. Nope, I found out the problem myself. I’ll paste it if anyone is interested. But my problem is now this:

    [root@STORAGE2 ~]# xe sr-create shared=true device-config:device="/dev/drbd1" name-label="DRBD-SR1" type=lvm
    The SR operation cannot be performed because a device underlying the SR is in use by the host.

    My drbd.conf:
    # --------------------
    # You can find an example in /usr/share/doc/drbd…/drbd.conf.example
    #include "drbd.d/global_common.conf";
    #include "drbd.d/*.res";
    resource drbd-sr1 {
      protocol C;
      startup {
      }
      disk {
        max-bio-bvecs 1;
      }
      net {
        allow-two-primaries;
        cram-hmac-alg "sha1";
        shared-secret "password";
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-1pri consensus;
        after-sb-2pri disconnect;
      }
      syncer {
        rate 1G;
      }
      on STORAGE1 {
        device /dev/drbd1;
        disk /dev/sdd; # customise based on your drive layout see fdisk above
        address 10.0.2.1:7789;
        meta-disk internal;
      }
      on STORAGE2 {
        device /dev/drbd1;
        disk /dev/sdd; # customise based on your drive layout see fdisk above
        address 10.0.2.2:7789;
        meta-disk internal;
      }
    }
    resource drbd-sr2 {
      protocol C;
      startup {
      }
      disk {
        max-bio-bvecs 1;
      }
      net {
        allow-two-primaries;
        cram-hmac-alg "sha1";
        shared-secret "password";
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-1pri consensus;
        after-sb-2pri disconnect;
      }
      syncer {
        rate 1G;
      }
      on STORAGE1 {
        device /dev/drbd2;
        disk /dev/sdc; # customise based on your drive layout see fdisk above
        address 10.0.2.1:7790;
        meta-disk internal;
      }
      on STORAGE2 {
        device /dev/drbd2;
        disk /dev/sdc; # customise based on your drive layout see fdisk above
        address 10.0.2.2:7790;
        meta-disk internal;
      }
    }
    # --------------------

  41. Hi Joe, what to do here?

    [root@STORAGE2 ~]# xe sr-create shared=true device-config:device="/dev/drbd1" name-label="DRBD-SR1" type=lvm
    The SR operation cannot be performed because a device underlying the SR is in use by the host.

  42. Are you trying to create a second SR using the DRBD-SR2 resource?

    xe sr-create shared=true device-config:device="/dev/drbd1" name-label="DRBD-SR1" type=lvm

    It should be /dev/drbd2.

  43. [root@xenserver-drbd2 ~]# xe sr-list params=all

    uuid ( RO) : 2d0b87bf-b33f-af65-c819-62c4dfd5cd92
    name-label ( RW): XenServer Tools
    name-description ( RW): XenServer Tools ISOs
    host ( RO): xenserver-drbd2
    allowed-operations (SRO): forget; plug; destroy; scan; VDI.clone; unplug
    current-operations (SRO):
    VDIs (SRO): 21314484-e70c-44c6-bc68-c680dc3288c6; 57e9c116-ab93-4c30-9cae-2182ed506a68
    PBDs (SRO): 829f5f52-9860-6e23-c8ca-19207207fa33
    virtual-allocation ( RO): 0
    physical-utilisation ( RO): -1
    physical-size ( RO): -1
    type ( RO): iso
    content-type ( RO): iso
    shared ( RW): true
    other-config (MRW): xensource_internal: true; xenserver_tools_sr: true; i18n-key: xenserver-tools; i18n-original-value-name_label: XenServer Tools; i18n-original-value-name_description: XenServer Tools ISOs
    sm-config (MRO):
    blobs ( RO):

    uuid ( RO) : 8f565374-158c-3515-c507-554c9abc66ca
    name-label ( RW): Local storage
    name-description ( RW):
    host ( RO): xenserver-drbd2
    allowed-operations (SRO): forget; VDI.create; VDI.snapshot; plug; update; destroy; VDI.destroy; scan; VDI.clone; VDI.resize; unplug
    current-operations (SRO):
    VDIs (SRO): 0fc6af92-a604-4447-964b-f468669571be
    PBDs (SRO): 447c9155-b7c2-76ad-f498-37769c24299d
    virtual-allocation ( RO): 16777216
    physical-utilisation ( RO): 20971520
    physical-size ( RO): 71420608512
    type ( RO): lvm
    content-type ( RO): user
    shared ( RW): false
    other-config (MRW): i18n-original-value-name_label: Local storage; i18n-key: local-storage
    sm-config (MRO): allocation: thick; use_vhd: true; devserial: scsi-SATA_INTEL_SSDSA2M08CVPO94330043080BGN_
    blobs ( RO):

    uuid ( RO) : f47b9a7f-11e6-d3aa-95ef-2ba4b1412788
    name-label ( RW): Removable storage
    name-description ( RW):
    host ( RO): xenserver-drbd2
    allowed-operations (SRO): forget; VDI.introduce; plug; update; destroy; scan; VDI.clone; unplug
    current-operations (SRO):
    VDIs (SRO):
    PBDs (SRO): 0db74d8f-a531-a6a0-c6bf-4c7551919ae1
    virtual-allocation ( RO): 0
    physical-utilisation ( RO): 0
    physical-size ( RO): 0
    type ( RO): udev
    content-type ( RO): disk
    shared ( RW): false
    other-config (MRW): i18n-original-value-name_label: Removable storage; i18n-key: local-hotplug-disk
    sm-config (MRO): type: block
    blobs ( RO):

    uuid ( RO) : 989d61cf-9f40-2391-7655-e9a28106d25a
    name-label ( RW): DVD drives
    name-description ( RW): Physical DVD drives
    host ( RO): xenserver-drbd2
    allowed-operations (SRO): forget; VDI.introduce; plug; update; destroy; scan; VDI.clone; unplug
    current-operations (SRO):
    VDIs (SRO):
    PBDs (SRO): 8e6486fc-02f5-8223-c645-53ff852ee217
    virtual-allocation ( RO): 0
    physical-utilisation ( RO): 0
    physical-size ( RO): 0
    type ( RO): udev
    content-type ( RO): iso
    shared ( RW): false
    other-config (MRW): i18n-original-value-name_description: Physical DVD drives; i18n-original-value-name_label: DVD drives; i18n-key: local-hotplug-cd
    sm-config (MRO): type: cd
    blobs ( RO):

  44. [root@xenserver-drbd2 ~]# xe sr-list

    uuid ( RO) : 2d0b87bf-b33f-af65-c819-62c4dfd5cd92
    name-label ( RW): XenServer Tools
    name-description ( RW): XenServer Tools ISOs
    host ( RO): xenserver-drbd2
    type ( RO): iso
    content-type ( RO): iso

    uuid ( RO) : 8f565374-158c-3515-c507-554c9abc66ca
    name-label ( RW): Local storage
    name-description ( RW):
    host ( RO): xenserver-drbd2
    type ( RO): lvm
    content-type ( RO): user

    uuid ( RO) : f47b9a7f-11e6-d3aa-95ef-2ba4b1412788
    name-label ( RW): Removable storage
    name-description ( RW):
    host ( RO): xenserver-drbd2
    type ( RO): udev
    content-type ( RO): disk

    uuid ( RO) : 989d61cf-9f40-2391-7655-e9a28106d25a
    name-label ( RW): DVD drives
    name-description ( RW): Physical DVD drives
    host ( RO): xenserver-drbd2
    type ( RO): udev
    content-type ( RO): iso

  45. fdisk -l

    Disk /dev/sda: 80.0 GB, 80026361856 bytes
    255 heads, 63 sectors/track, 9729 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes

       Device Boot   Start     End      Blocks   Id  System
    /dev/sda1   *        1     523     4194304   83  Linux
    Partition 1 does not end on cylinder boundary.
    /dev/sda2          523    1045     4194304   83  Linux
    /dev/sda3         1045    9729    69759553   8e  Linux LVM

    Disk /dev/sdb: 80.0 GB, 80026361856 bytes
    255 heads, 63 sectors/track, 9729 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes

    Disk /dev/sdb doesn’t contain a valid partition table

    WARNING: The size of this disk is 3.7 TB (3747348480000 bytes).
    DOS partition table format can not be used on drives for volumes
    larger than 2.2 TB (2199023255040 bytes). Use parted(1) and GUID
    partition table format (GPT).

    Disk /dev/sdc: 3747.3 GB, 3747348480000 bytes
    255 heads, 63 sectors/track, 455589 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes

       Device Boot   Start     End      Blocks   Id  System
    /dev/sdc1            1  267349  2147480811   8e  Linux LVM

    WARNING: The size of this disk is 10.0 TB (9985798963200 bytes).
    DOS partition table format can not be used on drives for volumes
    larger than 2.2 TB (2199023255040 bytes). Use parted(1) and GUID
    partition table format (GPT).

    Disk /dev/sdd: 9985.7 GB, 9985798963200 bytes
    255 heads, 63 sectors/track, 1214037 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes

       Device Boot   Start     End      Blocks   Id  System
    /dev/sdd1            1  267349  2147480811   8e  Linux LVM

  46. Hi Joe, I think I found some of the problem, in the LVM part of the guide:

    # --------------------
    # By default we accept every block device:
    filter = [ "r|/dev/xvd.|", "r|/dev/sda.*|" ]
    #Set the write cache state to 0 default is 1
    write_cache_state = 0
    # --------------------

    I think I misunderstood the guide. Should I type in the filter like this?

    filter = [ "r|/dev/xvd.|", "r|/dev/sda.*|", "r|/dev/sdc.*|", "r|/dev/sdd.*|" ]

    Does that remove the drives from LVM, or does it include them?
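On the remove-or-include question: in an lvm.conf `filter`, a pattern starting with `r` rejects matching devices and one starting with `a` accepts them; the first pattern that matches decides, and a device that matches nothing is accepted. The sketch below mimics that rule so a filter can be tried against device names before editing lvm.conf. It is an approximation under assumptions: `lvm_filter_verdict` is a made-up helper, and it uses `grep -E` in place of LVM's own regex engine, so unusual patterns may behave differently.

```shell
#!/bin/sh
# Hypothetical helper: report how an lvm.conf-style filter would treat a device.
# Usage: lvm_filter_verdict <device> <pattern>...
# Each pattern looks like 'r|/dev/sda.*|' (action letter, delimiter, regex, delimiter).
lvm_filter_verdict() {
    dev=$1
    shift
    for pat in "$@"; do
        rest=${pat#?}              # drop the action letter
        action=${pat%"$rest"}      # keep only the action letter (a or r)
        regex=${rest#?}            # drop the opening delimiter
        regex=${regex%?}           # drop the closing delimiter
        if printf '%s\n' "$dev" | grep -Eq -- "$regex"; then
            case $action in
                a) echo accept ;;
                r) echo reject ;;
                *) echo "unknown action: $action" >&2; return 1 ;;
            esac
            return 0               # first matching pattern wins
        fi
    done
    echo accept                    # nothing matched: accepted by default
}

# Example with the longer filter from the comment above:
#   lvm_filter_verdict /dev/sdc1  'r|/dev/xvd.|' 'r|/dev/sda.*|' 'r|/dev/sdc.*|' 'r|/dev/sdd.*|'
#   lvm_filter_verdict /dev/drbd1 'r|/dev/xvd.|' 'r|/dev/sda.*|' 'r|/dev/sdc.*|' 'r|/dev/sdd.*|'
```

With that filter the backing disks are rejected while `/dev/drbd1` is still accepted, which appears to be the intent of the guide's LVM step: LVM should find the volume through the DRBD device, not through the disk underneath it.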

  47. I discovered another reason for butch’s problem (only one node can start VMs, no migration): in my case pvscan showed /dev/drbd1 as the PV while vgdisplay -v showed /dev/sda3 as the PV. I solved this by editing /etc/lvm/cache/.cache: I removed all references to sda3 and added one for drbd1, and now all works fine! Shouldn’t this file be wiped and re-created on a pvscan? In my case it was created during the XenServer install and never updated afterwards. A bug?

  48. Hello Joe, I am also getting

    Error code: SR_BACKEND_FAILURE_181
    Error parameters: , Error in Metadata volume operation for SR. [opterr=Error introducing Metadata Volume],

    I tried running fsck -f -y /dev/sdb on each server after fdisk, as you instructed Vasko on April 25th. I get "Bad Magic Number in super-block while trying to open /dev/sdb" on both servers.

    I did make the mistake of entering the commands you instructed to be used After Full Sync before the sync had finished when first setting this up. I did not realize it was still syncing. I do not know if this caused the problem or not. Is there a fix for this you can help with?

  49. In reference to my above post, this is what I tried on both servers. I get identical results on both servers:

    $ drbdadm down all
    $ fdisk /dev/sdb
    d
    1
    wq
    Does this sync the disk or tell you to reboot before use?
    If sync worked ok
    $ fdisk /dev/sdb
    n
    p
    1
    choose defaults
    wq
    $ fsck -f -y /dev/sdb

    I get the Bad Magic Number in Super Block error here. I tried e2fsck -b 8193 after that and got another error similar to the first, suggesting that the file system is either not an ext2 file system or this super block is corrupt.

  50. Hi Paul & Tim,

    Try this guide and add primary/primary after you have tested that it works. This guide works for me and I don’t get the SR_BACKEND error anymore :-)
    http://www.xen.co.il/index.php/en/xenserver-drbd

  51. Was in London last week so apologies for the late reply. I see a few people having issues creating the SR. I am rebuilding my test environment with 5.6 this week, so I’ll do a video to explain the exact steps and how to resolve these errors. Morten, that guide is great; just be careful when adding primary:primary once it works, because the SR isn’t shared, so it may cause issues with live migration.

  52. I tried PF4’s suggestion, deleting /etc/lvm/cache/.cache on both servers. I did not resync after that. I ran on Secondary
    $ drbdadm primary drbd-sr1
    Then on Primary
    xe sr-create shared=true device-config:device="/dev/drbd1" name-label="DRBD-SR1" type=lvm

    I got the same ERROR: SR_BACKEND_FAILURE_181. I notice that /etc/lvm/cache/.cache was recreated on the primary server when I ran
    xe sr-create shared=true device-config:device="/dev/drbd1" name-label="DRBD-SR1" type=lvm
    It was not recreated on the secondary server at that time.

    Running cat /proc/drbd on the primary shows
    cs:connected ro:primary/primary ds:uptodate/uptodate C r----
    Identical results (at least as far as the first line returned) come from running cat /proc/drbd on the secondary. The second line of text returned differs between primary and secondary.

    Do I need to start from clean disks to try this again? How can I be sure the disks are empty and ready for another try? Any other suggestions?

  53. a little hint for those of you who got it working: we encountered terrible write speed with the defaults in drbd.conf (25MB/s max on 2node, 2cpu,16core,adaptecraid5 3 disks,1gbit drbdlink). After adding rcvbuf-size=0 and sndbuf-size=0 speed went up to 90MB/s max…. thanx to Christian Balzer for this hint!

  54. Hi Joe, yes, I’ve done it with shared=true, and it is working. As you can see in my conf I have two DRBD devices (drbd0 and drbd1). drbd1 is now synced, but the other one, drbd0, is in StandAlone mode and is not syncing. Is that because DRBD doesn’t support two disks doing two DRBD replications?

  55. Wiped both sdb disks and reinstalled both Xen servers (5.6). Everything in the setup went perfectly, no more errors. Installing an OS in my first VM right now. Question: will this setup fail over in both directions? That is, can I run 2 VMs on xen1 and 2 VMs on xen2 and then have either fail over to the other if one goes down? I realize my storage space will all be duplicated across both boxes, but this would allow me to use the processor and RAM in both boxes under normal conditions and only have one box try to run all 4 VMs under failure conditions. Is this possible and does it make sense?

  56. Hi Paul, good to hear, I haven’t tested that scenario but should be ok in theory. I’ll see if I can test it out and report back.

  57. Hi Joe, have you tried pulling the network cable from the "heartbeat/copy link"? If I do it in my setup both servers restart and end up in a StandAlone scenario, where I have to discard data on one of the boxes. That is not good 🙁

  58. Sure np :-)

    drbd.conf:
    # You can find an example in /usr/share/doc/drbd…/drbd.conf.example
    global {
      usage-count yes;
    }
    common {
      syncer { rate 1G; }
    }
    resource drbd0 {
      protocol C;
      disk { max-bio-bvecs 1; }
      net {
        allow-two-primaries;
        cram-hmac-alg "sha1";
        shared-secret "password";
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-1pri consensus;
        after-sb-2pri disconnect;
        sndbuf-size 0;
        rcvbuf-size 0;
      }
      startup { }
      handlers {
        split-brain "/usr/lib/drbd/notify-split-brain.sh root"; # I’ve changed the notify.sh to receive alerts to XenCenter
      } # you can see the change below
      on xenserver-drbd1 {
        device /dev/drbd0;
        disk /dev/sdd;
        address 10.0.2.1:7788;
        meta-disk internal;
      }
      on xenserver-drbd2 {
        device /dev/drbd0;
        disk /dev/sdd;
        address 10.0.2.2:7788;
        meta-disk internal;
      }
    }
    resource drbd1 {
      protocol C;
      disk {
        max-bio-bvecs 1; # you need this option for drbd to work correctly with Xen (thank to Lars Ellenberg for this)
      }
      net {
        allow-two-primaries;
        cram-hmac-alg "sha1";
        shared-secret "password";
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-1pri consensus;
        after-sb-2pri disconnect;
        sndbuf-size 0;
        rcvbuf-size 0;
      }
      startup { }
      handlers {
        split-brain "/usr/lib/drbd/notify-split-brain.sh root"; # I’ve changed the notify.sh to receive alerts to XenCenter
      } # you can see the change below
      on xenserver-drbd1 {
        device /dev/drbd1;
        disk /dev/sdc;
        address 10.0.2.1:7789;
        meta-disk internal;
      }
      on xenserver-drbd2 {
        device /dev/drbd1;
        disk /dev/sdc;
        address 10.0.2.2:7789;
        meta-disk internal;
      }
    }

  59. Hi Joe, have you tried pulling the replication link? Is it only the primary server or servers that restart? Very confusing failover 😀

  60. Hi Joe, here is the kernel log just before the crash:

    Jun 24 10:24:15 xenserver-drbd2 kernel: block drbd1: PingAck did not arrive in time.
    Jun 24 10:24:15 xenserver-drbd2 kernel: block drbd1: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
    Jun 24 10:24:15 xenserver-drbd2 kernel: block drbd1: asender terminated
    Jun 24 10:24:15 xenserver-drbd2 kernel: block drbd1: Terminating asender thread
    Jun 24 10:24:15 xenserver-drbd2 kernel: block drbd1: Creating new current UUID
    Jun 24 10:24:15 xenserver-drbd2 kernel: block drbd1: short read expecting header on sock: r=-512
    Jun 24 10:24:15 xenserver-drbd2 kernel: block drbd1: Connection closed
    Jun 24 10:24:15 xenserver-drbd2 kernel: block drbd1: conn( NetworkFailure -> Unconnected )
    Jun 24 10:24:15 xenserver-drbd2 kernel: block drbd1: receiver terminated
    Jun 24 10:24:15 xenserver-drbd2 kernel: block drbd1: Restarting receiver thread
    Jun 24 10:24:15 xenserver-drbd2 kernel: block drbd1: receiver (re)started
    Jun 24 10:24:15 xenserver-drbd2 kernel: block drbd1: conn( Unconnected -> WFConnection )
    Jun 24 10:24:17 xenserver-drbd2 kernel: block drbd0: PingAck did not arrive in time.
    Jun 24 10:24:17 xenserver-drbd2 kernel: block drbd0: peer( Secondary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
    Jun 24 10:24:17 xenserver-drbd2 kernel: block drbd0: asender terminated
    Jun 24 10:24:17 xenserver-drbd2 kernel: block drbd0: Terminating asender thread
    Jun 24 10:24:17 xenserver-drbd2 kernel: block drbd0: Creating new current UUID
    Jun 24 10:24:17 xenserver-drbd2 kernel: block drbd0: short read expecting header on sock: r=-512
    Jun 24 10:24:17 xenserver-drbd2 kernel: block drbd0: Connection closed
    Jun 24 10:24:17 xenserver-drbd2 kernel: block drbd0: conn( NetworkFailure -> Unconnected )
    Jun 24 10:24:17 xenserver-drbd2 kernel: block drbd0: receiver terminated
    Jun 24 10:24:17 xenserver-drbd2 kernel: block drbd0: Restarting receiver thread
    Jun 24 10:24:17 xenserver-drbd2 kernel: block drbd0: receiver (re)started
    Jun 24 10:24:17 xenserver-drbd2 kernel: block drbd0: conn( Unconnected -> WFConnection )
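When reading a log like the one above, it can help to strip it down to the connection-state transitions per device. The filter below is a small sketch for that; the function name `drbd_conn_transitions` is invented here, and the pattern is based only on the `conn( A -> B )` form visible in the log lines quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print
# "<device> <old state> -> <new state>" for every conn( ... ) transition.
drbd_conn_transitions() {
    sed -n 's/^.*block \(drbd[0-9][0-9]*\): .*conn( \([A-Za-z]*\) -> \([A-Za-z]*\) ).*$/\1 \2 -> \3/p'
}

# Example (the log file path is an assumption and may differ per system):
#   drbd_conn_transitions < /var/log/messages
```

For the log above this reduces each device to the sequence Connected -> NetworkFailure, NetworkFailure -> Unconnected, Unconnected -> WFConnection, which shows both resources losing the link about two seconds apart and then waiting for the peer.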

  61. Mar 20, 2010 Karsten said…
    "I tested with lvm cache on/off. With lvm-cache on, i have good performance, but Linbit recommends to turn it off."

    What do you mean by "lvm-cache on"? What parameter? Which config?

  62. Hi Tim,

    $ nano /etc/lvm/lvm.conf
    #Set the write cache state to 0 default is 1
    write_cache_state = 0

    Cheers, Joe

  63. Hi Joe,

    First of all, thank you for this procedure. I tried to set up this solution with 2 servers running XenServer 5.6, but I have a problem with the creation of the storage: the storage is created, but the second server is unplugged whereas the first is connected. Do you know where the problem could come from and especially how to solve it? Thank you in advance.

    Anthony.

  64. Hi Joe,

    It’s a very good description, and I have a question about this solution: are you using this in a production environment? I ask because I’ll have to implement this solution in a production environment.

  65. This is a nice tutorial. Two questions arose quickly.

    Can a XenServer (balance-slb) bond be used on the crossover interface for better throughput, or is this useless? Should we keep "balance-slb" or use something else?

    Since Dom0 uses only one CPU, should we increase it to two for better performance? My cpu0 is always around 60-70%.

    Again, great article.
    Martin

  66. Hi Joe,

    Silly question, but have you had any problems with the 5.6 RPMs? Mine have installed but no kernel module exists.

    [root@xenserver-drdb2 tmp]# rpm -ivh drbd-km-2.6.27.42_0.1.1.xs5.6.0.44.111158xen-8.3.7-12.i386.rpm
    Preparing… ########################################### [100%]
    package drbd-km-2.6.27.42_0.1.1.xs5.6.0.44.111158xen-8.3.7-12.i386 is already installed
    [root@xenserver-drdb2 tmp]# rpm -qa|grep -i drdb
    [root@xenserver-drdb2 tmp]#

    Thoughts as to why?

  67. All, I am using Adaptec RAID controllers (3405) and I’ve found some interesting behavior that may assist others here with this LVM issue (I too am finding the _181 message a challenge).

    I’ve found that upon installation of XenServer, partitions have been created in /dev/sda2, sda3, etc., in line with my underlying RAID configuration. Yet /dev/sdb and sdc are available but not configured. I had to manually delete the two other partitions in fdisk (fdisk /dev/sda, d 3, d 2, w) and then configure the other two disks.

  68. Hi Morten,

    I found your issue with "The SR operation cannot be performed because a device underlying the SR is in use by the host". You have storage configured on one or both servers at the Xen level. (Assuming here all SR data can be deleted.)

    1. List your Xen SRs
    $ xe sr-list

    2. Remove the Xen SR UUID that you found in the SR list
    $ xe sr-forget uuid=<your-UUID-found-from-above>

    3. Do this on both servers.

    Then go back to the xe sr-create step.
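Picking the right UUID out of a long `xe sr-list` listing by eye is error-prone, so the lookup in step 1 can be sketched as a small filter. This is an illustration only: `sr_uuid_by_label` is a made-up helper that assumes the default listing format shown earlier in the comments (a `uuid ( RO) :` line followed by a `name-label ( RW):` line for each SR).

```shell
#!/bin/sh
# Hypothetical helper: read `xe sr-list` output on stdin and print the
# uuid of every SR whose name-label equals the first argument.
sr_uuid_by_label() {
    awk -v want="$1" '
        /^uuid \(/       { sub(/^[^:]*: */, ""); uuid = $0 }          # remember the current SR uuid
        /name-label \(/  { sub(/^[^:]*: */, ""); if ($0 == want) print uuid }'
}

# Example (destructive, only when all data on that SR can be deleted):
#   uuid=$(xe sr-list | sr_uuid_by_label "DRBD-SR1")
#   [ -n "$uuid" ] && xe sr-forget uuid="$uuid"
```

xe may also be able to do this filtering itself with a `name-label=` argument together with `--minimal`; that is worth checking against the XenServer CLI reference for your version before relying on it.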

  69. Haven’t tested it yet. Here is the changelog:

    8.3.8 (api:88/proto:86-94)
    --------
    * Do not expose failed local READs to upper layers, regression introduced in 8.3.3
    * Fixed support for devices with 4k hard sector size (again)
    * Fixed a potential Oops in the disconnect code
    * Fixed a race condition that could cause DRBD to consider the peers disk as Inconstent after resync instead of UpToDate (Bugz 271)
    * Fixed a reace condition that could cause DRBD to consider the peers disk as Outdated instead of Inconsistent during resync (Bugz 277)
    * Disallow to start a resync with invalidate / invalidate-remote when the source disk is not UpToDate
    * Forcing primary works now also for Consistent, not only for Outdated and Inconsistent (Bugz 266)
    * Improved robustness against corrupt or malicous sector addresses when receiving data
    * Added the initial-split-brain, it gets called also if the split-brain gets automatically resolved
    * Added the --assume-clean option for the resize command, it causes drbd to not resync the new storage after an online grow operation
    * drbdadm: Do not segfault if stacked-on-top-of refers to an undefined res
    * drbdadm: Do not consider configs with invalid after statements as invalid
    * drbdadm: Do not segfault if the peer’s proxy section is missing
    * drbdadm: Allow nullglob in include statement
    * drbdadm: Fixed the use of waitpid
    * init script: fix insserv headers (Debian 576901)
    * Gave the receiving code the ability to use multiple BIOs for writing a single data packet; now DRBD works with BIOs up to 32kByte also on LVM devices; from now on the use_bmbv config option does nothing
    * New command check-resize, that allows DRBD to detect offline resizing and to move internal meta-data accordingly
    * Added a control loop, that allows DRBD to find auto tune the resync speed, on connections with large queues (drbd-proxy)
    * --dry-run option for connect; disconnects after sync handshake
    * --overwrite-data-of-peer got an alias named --force
    * Improvements to crm-fence-peer
    * Fixed option parsing and stacking in snapshot-resync-target-lvm.sh
    * Compiles on 2.6.33 and 2.6.34

  70. Hi,

    I’m testing this version and it’s working ok, but I have a different problem with a XenServer network interface. I created dedicated interfaces for DRBD, but if I reboot XenServer and run ifconfig xenbr2, I don’t see an IP address, although I do see it in XenCenter. Any suggestion why that is?

  71. Hi Martin,Did you ever work out how to add additional CPU capacity to dom-0?I’ve been maxed out now for over 30 minutes whilst installing Win Svr 2008 x64.

  72. Finally I’m making my first attempt to follow this guide. Using XenServer 5.6 and running its DDK VM, I’m compiling the DRBD RPMs. I have the RPMs now, but the following sequence

    $ ./configure --enable-spec --with-km
    $ make tgz

    doesn’t work. I had to do this:

    $ ./configure --enable-spec --with-km
    $ ./configure
    $ make tgz

    Running a second standalone ./configure worked to create the drbd-8.3.8.tar.gz. The next steps worked fine and now I’ve got a lot of DRBD RPMs in /usr/src/redhat/RPMS/i386/

    Is this correct, or do I now have some buggy RPMs?

    Thank you,
    Robert

  73. Hi, thank you for your tutorial. I have two questions to ask you.

    The first one is about the drive you use for the DRBD repository. In step 3 you say to fdisk the drive, but without saying which partition type, and in steps 6 and 7 you talk about LVM. Does it mean that at step 3 you fdisk the whole drive with a single LVM partition? If yes, what about the VMs? Let’s say I have 3 VMs, each configured with 20GB of storage. Are you going to replicate the whole VM repository (sda LVM drive/partition) with DRBD? And, if yes, is it implicit that in case of failure I’ll be able to migrate all of them?

    The second one is about the DRBD config. I understand that your project is a "fail over" setup, so I’d like to understand the point of the two-primary setup. Is it to have less console work to do on the secondary in case of VM migration, or is it a must for the system to work? In other words, could a DRBD primary-secondary setup be used instead, to prevent split-brain scenarios?

    Thank you,
    IC

  74. Hi Joe,

    Have you seen this when you try to pbd-plug a storage on a reinstalled XenServer?

    There was an SR backend failure.
    status: non-zero exit
    stdout:
    stderr: Traceback (most recent call last):
      File "/opt/xensource/sm/LVMSR", line 1449, in ?
        SRCommand.run(LVHDSR, DRIVER_INFO)
      File "/opt/xensource/sm/SRCommand.py", line 161, in run
        ret = cmd.run(sr)
      File "/opt/xensource/sm/SRCommand.py", line 138, in run
        return sr.attach(self.params)
      File "/opt/xensource/sm/LVMSR", line 71, in wrapper
        ret = op(self, *args)
      File "/opt/xensource/sm/LVMSR", line 327, in attach
        self._checkMetadataVolume(self.sm_config)
      File "/opt/xensource/sm/LVMSR", line 216, in _checkMetadataVolume
        self._synchMetaData(map)
      File "/opt/xensource/sm/LVMSR", line 226, in _synchMetaData
        xml = metadata.retrieveXMLfromFile(self.mdpath)
      File "/opt/xensource/sm/metadata.py", line 103, in retrieveXMLfromFile
        _testHdr(hdr)
      File "/opt/xensource/sm/metadata.py", line 40, in _testHdr
        assert(hdr[0] == HDR_STRING)
    AssertionError

  75. Sorry, maybe this is a silly question, but how do you read the root mail on XenServer? A split brain happened and I’m wondering whether the notification mail was generated.
    Thank you,
    R.

  76. Hi, I have two servers with XenServer 5.6 and I’m using DRBD 8.3.8. Below is my config:
    global { usage-count yes; }
    common {
      protocol C;
      net {
        allow-two-primaries;
        cram-hmac-alg "sha1";
        shared-secret "XenDrbdCitrix";
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
        max-buffers 8192;
        max-epoch-size 8192;
        sndbuf-size 1024;
        # rcvbuf-size 0;
      }
      disk {
        max-bio-bvecs 1;
        no-md-flushes;
        no-disk-flushes;
        no-disk-barrier;
      }
      startup {
        become-primary-on both;
      }
      syncer {
        rate 1G;
      }
    }
    resource sr0 {
      on xenB1 {
        device /dev/drbd0;
        disk /dev/sdb1;
        address 1.1.1.1:7789;
        meta-disk internal;
      }
      on xenB2 {
        device /dev/drbd0;
        disk /dev/sdb1;
        address 1.1.1.2:7789;
        meta-disk internal;
      }
    }
    I have a problem with this configuration: if I shut down or reboot a Xen machine, my DRBD storage is broken every time and I must repair it. Any suggestion where the problem is?

  77. XenMotion: have you ever tried dragging a VM between pool servers, like they do here?
    http://community.citrix.com/display/ocb/2008/02/10/Everything+You+Always+Wanted+to+Know+about+XenMotion

  78. Hi Martin, that is my question too. Bonding in XenServer uses a customized mode, and setting the other modes is not officially supported. I’d also like to know if somebody has used two "XenServer bonded" NICs for DRBD, mainly for redundancy (since 110MB/sec, the theoretical bandwidth of one 1000Mbit card, would be enough as sync speed).
    Thanks to all who share their personal experience,
    R

  79. DRBD between SATA II drives. Sync speed is good at 80MB/sec, but…
    I apologise in advance because this question is a bit off topic. I’m testing DRBD with Citrix XenServer. My test environment has 2 SATA II disks in each quad-core PC. One drive is for the XenServer installation, the second is the VM storage. The first DRBD sync showed an average speed of 80MB/sec, which is fine.
    I installed a Windows 7 32-bit guest and used this free tool, portable edition (no installation needed): Disk Throughput Tester 2.00 R11 (build 208), developer website and download at http://www.objectso.nl/downloads.html
    I measured the Windows write-to-disk performance with a 512MB test file. I ran the test 3 times, with 64K, 128K and 256K block sizes. Write performance is really poor:
    64K – 10MB/sec
    128K – 16MB/sec
    256K – 18MB/sec
    It looks like this is a common issue in VM guests, both Windows and Linux, but most of the time it is Windows related. Do you have any experience that explains this issue? If you had it and fixed it in some way, would you kindly share the solution? If you have never measured Windows guest write-to-disk performance, would you run the same test and share your results? This is obviously meant for SATA II drives.
    Thank you for any ideas,
    Robert

  80. Hi Robert, I’m experiencing bad write performance too. I’ve tried almost everything now (I think), but I can’t get past 85-90MB/s in writes with a 10Gbit Ethernet link.

  81. There are a couple of comments about maxing out at 100MB/s. Apologies for the delay, I’m not ignoring you. My test setup is all based on standard SATA drives, so I can’t push above 60-70MB/s. I’m hoping to have a 6x 300GB 10K SAS system in the coming days, so I will have a chance to test then.
    Tks, Joe

  82. Hi Joe, the DRBD benchmark is one thing: with SATA drives and a dedicated gigabit NIC for DRBD, sync speed averages 80MB/sec. The virtual machines are another story. Have you ever benchmarked Windows VM disk write speed, with and without the Citrix tools installed? Even though DRBD sync speed is 80MB/sec, Windows VM disk write speed is 20MB/sec without the tools and only 10MB/sec with the tools installed. Do you have any system tuning tips (for both Linux and DRBD) to improve this?
    Thank you,
    Robert

  83. P.S. For Windows VM disk read speed it is the opposite: without the tools installed, read speed averages 50MB/sec; installing the tools makes it jump to between 80 and 120MB/sec.
    R.

  84. Hi Robert,
    Using
    #> /usr/bin/time dd if=/dev/zero of=zeroes bs=1024k count=1024
    I’m seeing similar performance in Linux + Xentools: 23MB/s average on a single SATA disk. From dom0 I’m hitting 83MB/s.
    But on a server without DRBD and an 8x 300GB 10K disk RAID 50:
    From dom0 = 300MB/s
    Linux + Xentools = 85MB/s
    Roughly the same overhead.
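The "roughly the same overhead" remark can be checked with a little arithmetic on the four figures quoted in comment 84. This is only a sketch: `guest_share` is a made-up helper name, and the inputs are the single-run numbers reported above, not new measurements.

```shell
#!/bin/sh
# guest_share is a hypothetical helper: it prints the guest write
# throughput as a percentage of the dom0 throughput.
# $1 = guest MB/s, $2 = dom0 MB/s
guest_share() {
    awk -v g="$1" -v d="$2" 'BEGIN { printf "%.1f\n", 100 * g / d }'
}

# Single SATA disk with DRBD: guest 23 MB/s, dom0 83 MB/s
guest_share 23 83

# RAID 50 without DRBD: guest 85 MB/s, dom0 300 MB/s
guest_share 85 300
```

Both cases come out at a little under 30% of dom0 throughput (27.7 and 28.3), which suggests the guest-versus-dom0 gap in these two runs is about the same with or without DRBD underneath.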

  85. Out there, people suggest playing with al-extents in drbd.conf and, additionally, placing the DRBD metadata on another physical disk (not recommended). About your tests, it looks like virtualized machine I/O runs at a bit less than 1/3 of the non-virtualized speed. On top of that, DRBD reduces I/O speed by another 66% and a bit more. Wow, not so good (at least when it is time to save data on the VM). What is your opinion on this?
    (And, a bit off topic: about messages from split brain or any other services that should email root, what do you use as the mail agent? SSMTP? Or have you installed a classic Postfix or something else?)
    R.

  86. The tests I ran on XenServer 5.6 Windows VM guest disk I/O performance were all made with this Dutch freeware utility: Disk Throughput Tester 2.00 R11 (build 208), http://www.objectso.nl/downloads.html, using a file size of 512MB and a block size of 512K. The rest of the settings were left at their defaults.
    R.

  87. Beware of the protocol setting. With protocol C, each disk write waits for the previous write to be completed on the secondary node. This is the safest approach on systems without a battery-backed write cache that preserves data in case of power failure. At the same time, this also means that your DRBD system’s speed depends entirely on the secondary node’s write speed. Again: you’d better forget about the other two protocols if your hardware is not properly equipped.
    Ciao,
    Robert
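As a rough way to picture the point in comment 87: under protocol C a write is only acknowledged once it has reached both the local and the remote disk, so sustained write throughput cannot exceed the slowest of local disk, replication link and remote disk. The sketch below just takes that minimum; `protocol_c_bound` is a made-up helper and the three figures in the example call are illustrative assumptions, not measurements from this thread.

```shell
#!/bin/sh
# protocol_c_bound is a hypothetical helper: it prints the smallest of
# three throughput figures (all in MB/s), i.e. the upper bound on
# sustained write speed when every write must land on both nodes.
# $1 = local disk, $2 = replication link, $3 = remote disk
protocol_c_bound() {
    printf '%s\n' "$1" "$2" "$3" | sort -n | head -n 1
}

# Assumed example values: 80 MB/s local disk, 110 MB/s gigabit link,
# 60 MB/s disk on the secondary node.
protocol_c_bound 80 110 60
```

With those assumed numbers the bound is set by the secondary’s disk. Real throughput will be lower still because of latency and protocol overhead, which this sketch ignores.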

  88. Hi Joe & Robert, I’ve tested it with protocols A, B and C and it makes little or no difference 🙁 I’ve also experimented with performance tuning (max-buffers, al-extents, the disk performance settings in drbd.conf). I think there’s not much to do here. I can see that the XenServer dom0 CPU is at 100% when I do write tests.

  89. @Morten: I’m missing your disk system configuration. It would be better to know the disk configuration of both servers. Is it RAID? Is it a single disk?

  90. Hi Robert, it’s a RAID system: 10 2TB SATA disks in RAID 10. It runs pretty darn fast without DRBD. I also have a RAID 5 with 750GB disks.

  91. Hi all, I’ve found the speed limit issue. If you have enough VMs running for replication, DRBD of course uses a lot more CPU. The problem is that XenServer is designed to give dom0, which is where DRBD runs, only one CPU. So it will constantly sit at 100% and limit the processor power available to the replication service. I almost bought everRun, which is a paid solution for XenServer. The performance was better, but it has the same issue with dom0. I hope that Citrix will release multicore support for dom0 very soon, otherwise it’s time to convert to VMware 🙂

  92. Hello! Congratulations on your article. I’m using something similar, but with open-source Xen. Reading this material raised a question that maybe you can clarify. Why has the LVM cache been disabled? What could happen if that is not done? Do you not use a cluster file system, only DRBD?
    Thanks

  93. Hi Joe, I have had this setup running for some time, but I am having trouble with the DRBD crossover link going down and one of the servers rebooting. DRBD is configured to reboot if there is a NIC hardware failure. I thought it could be a bnx2 driver issue, but I have compiled and installed the latest driver and it still happens. I notice it only happens when you put the disk I/O under load in a domU; this is not the case in dom0. Any ideas where to start looking?

  94. Hi Morten, I have it, but there is a kernel bug in XenServer 5.6 FP1 that stops DRBD from operating. There are a couple of threads on the Citrix forums about it. Hopefully Citrix will release the fix (they have one internally).

  95. Thank you Joe and everyone who participates in this project. It is great and very helpful. I’ve been testing the latest XenServer release, 5.6 FP1, but every time I try to create the DRBD shared storage the master XenServer restarts by itself. Is this due to the kernel bug in XenServer 5.6 FP1?
    -Sam

  96. First of all, be aware that XenServer 5.6 FP1 is a mess by itself; read the posts on the Citrix forum.
    Next, I suggest you also read the list.
    My overview of this leads me to conclude that the stable and tested 5.6 with 8.3.1 is the safer choice (Hi Joe, do you have any tip or enhancement that would justify upgrading from 8.3.1?), and I’ll postpone any test until 5.6 FP2 is released.
    Robert

  97. Hello, is it possible to use the DRBD device with Xen HA? I have two XenServer Platinum hosts running with a shared DRBD device, but I am not able to configure the built-in HA.
    xe sr-list:
    [root@xenserver1 home]# xe sr-list
    uuid ( RO) : 5c749558-ff42-a290-e4d6-7e1f57292ff4
    name-label ( RW): DRBD-SR1
    name-description ( RW):
    host ( RO): <shared>
    type ( RO): lvm
    content-type ( RO):
    Thanks for your answer 😉

  98. Read the posts above. Forget about FP1 and its own issues.
    Better to wait for FP2.
    R.

  99. Hi Morten, compiling again now using the DDK, it’s looking good so far. Two additions to the source:
    WARN_ON_RATELIMIT(sioc, &sioc->rs); --> mm/page_alloc.c
    WARN_ON_ONCE(bs->bio_pool->curr_nr == 0); --> fs/bio.c

  100. Hi Joseph, I see you are using the DDK to compile the drivers, but it seems Linbit has already created RPMs for different XenServer versions: http://www.linbit.com/support/drbd-8.3.10/
    Have you or anyone else tried these yet? It seems a lot easier to use them instead of compiling everything yourself. It should even be possible to add them to a yum repo so that upgrades for newer kernels work flawlessly.

  101. Great, but I strongly suggest reading the Citrix forums carefully for reports from people concerned about FP1’s own issues.
    R.

  102. It is only a hardware matter.
    With the same CPU family and RAM availability, it should theoretically work. I studied it a bit a while ago, and it should work.
    See http://www.fastec.eu/clusterbiz/
    It is in Italian, but it should give you an idea.

  103. Working here! Although we are using slightly different hardware, we can at least use XenMotion with paravirtualized (PV) Linux domUs. Windows (HVM) guests crash while live migrating, though, and we encounter issues with the time source on live-migrated machines. Using NTP is a workaround for this issue.

  104. Hello, yes, the drbd-utils are not in the package, so I can’t test it 🙁 Can you please upload the drbd-utils? Thanks for the nice tutorial 🙂

  105. Hi, thanks for the download links! We have now tested XenServer 5.6.1 FP1 + DRBD on 2 IBM x3550 M2 nodes. Everything works fine: live migration, XenMotion, converting/importing VMs with XenConvert, etc. The problem in my tests is this: when I disconnect node 1 from the network to simulate a split brain, how can I get the VMs working on node 2? I can’t connect to the pool or to the IP of node 2 with XenCenter anymore. Does anyone have an idea?

  106. Try logging in using ssh and executing ‘xe pool-emergency-transition-to-master’ on node 2. It then becomes master, and you should be able to connect to it using XenCenter and, of course, start your VMs on this node. Just disconnecting your network doesn’t stop the VMs on the other node, so you might have a problem resyncing your DRBD if you do it this way.

  107. Hi, thanks for the answer. I tried it out and it works. Resyncing DRBD is no problem, since these are my 2 test nodes. I only want to test everything so I’m prepared in case of an emergency 🙂

  108. Hi Joe, yesterday one of my fibre switches had a physical problem and communication between both servers was broken. After communication was restored, both servers were in split-brain state! 🙁 So at this moment (5 hours after the problem) I have both servers working, and both servers have VMs running. Is the only solution for this problem the following?
    – make a backup of all VMs on one server
    – set the DRBD node on this server as secondary
    – discard the data on the secondary
    – force the primary to overwrite the secondary
    – restore the backup
    🙁 Or is there another procedure to reconnect DRBD in XenServer after a split brain? Thanks.
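For reference, the "set secondary, discard, reconnect" part of the list in comment 108 corresponds to the manual split-brain recovery described in the DRBD 8.3 user’s guide. The sketch below only prints the commands rather than running them; `split_brain_plan` is a made-up helper, and `sr0` is an assumed resource name (substitute the one from your drbd.conf). The syntax shown is the 8.3 form; 8.4 spells the option differently.

```shell
#!/bin/sh
# split_brain_plan is a hypothetical dry-run helper: it prints the DRBD 8.3
# manual split-brain recovery commands for one resource, nothing is executed.
# $1 = resource name as defined in drbd.conf
split_brain_plan() {
    res="$1"
    echo "# On the node whose changes will be thrown away (the victim):"
    echo "drbdadm secondary $res"
    echo "drbdadm -- --discard-my-data connect $res"
    echo "# On the node whose data is kept (the survivor), if it is StandAlone:"
    echo "drbdadm connect $res"
}

# sr0 is an assumed resource name used for illustration only.
split_brain_plan sr0
```

Everything written on the victim since the split is lost, so exporting or backing up the VMs that ran there first, as the comment proposes, is still needed. Note also that `drbdadm secondary` fails while the device is open, so on XenServer the VMs on the victim would presumably have to be shut down and the SR unplugged on that host before demoting it.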

  109. Hi Joe, XS 5.6 SP2 was released two days ago. Do you know if DRBD works with that version?
    Regards,
    loop

  110. Hello there, I’m also very interested in whether XS 5.6 SP2 works with DRBD or whether I have to compile a new version with the XenServer DDK VM.
    Regards.

  111. Hi Chris, Loop. Unfortunately 5.6 SP2 uses kernel 2.6.32.12-0.7.1, which doesn’t include DRBD by default. I will compile another RPM tomorrow and post it.
    Cheers, Joe

  112. That sounds great. Thank you! Do you think there is a way to upgrade from XS 5.6 FP1 with DRBD and the latest hotfixes to SP2? I’ve read on the Citrix forums that the upgrade from 5.6 FP1 to SP2 does not work even without DRBD.
    Best regards.

  113. Hi Joe. I’m reproducing your setup in my environment, and we had a power outage last night. In order to recover my node (I’m adding a secondary soon), I had to run
    $ drbdadm primary <resource>
    Do you think it’s safe to add
    resource resource startup { become-primary-on both; } …}
    from http://www.drbd.org/users-guide/s-enable-dual-primary.html ?
    I’ll have 2 DRBD SRs, one for the VMs on each server in the pool. Maybe it would be safer to run primary/secondary. Thoughts?

  114. I’m thinking that, as I have a DRBD SR for each server, it makes more sense to allow 2 primaries, but to have my servers boot as primary/secondary and secondary/primary and only to invoke dual primaries if I need to run maintenance on a server or recover from a hardware failure. To me this seems like a more stable environment, no?

  115. Hello Joe. Very good manual. Sorry for my English, I’m from Argentina. I have the following problem: when I start installing VMs, the DRBD status changes from UpToDate/UpToDate to UpToDate/Unknown. I reconfigure it, but when I install the VM the same thing happens. My configuration is: 1) 2 hosts with XenServer 5.6, 2) DRBD 8.3.7. What am I missing? Greetings and thank you very much.

  116. Hi Joe. I compiled the RPMs on the latest XenServer 6 (2.6.32.12-0.7.1.xs6.0.0.531.170662xen). Running rpmbuild -bb drbd.spec works fine, but running rpmbuild -bb drbd-km.spec gives this error:
    + test -d /lib/modules/2.6.32.12-0.7.1.xs6.0.0.531.170662xen/build/.
    ++ KDIR=/lib/modules/2.6.32.12-0.7.1.xs6.0.0.531.170662xen/build
    ++ scripts/get_uts_release.sh
    + test '' = 2.6.32.12-0.7.1.xs6.0.0.531.170662xen
    error: Bad exit status from /var/tmp/rpm-tmp.38646 (%prep)
    Can you help me?

  117. Hi, I cannot get my cluster to primary/primary. Every time I run drbdadm primary drbd-sr1 I get this error:
    Command ‘drbdsetup primary 1’ did not terminate within 121 seconds
    Any idea? Thanks

  118. It works fine in primary/secondary mode but not in primary/primary. When I run the command drbdadm primary drbd-sr1 on the secondary node I get primary/unknown. Any idea? Thanks

  119. My problem with primary/unknown was fixed by using drbd-8-4-1-1 with XenServer 6.1. It seems to be an issue with drbd-8-4-2-2, which did not work with XenServer 6.1. Thanks

  120. I encountered a problem with XenServer 6.1 and DRBD 8.4.3:
    # drbdadm wait-connect all
    unable to join drbd events multicast group
    #
    The same applies to drbdadm wait-con-int, which is used in the startup script, so DRBD does not wait for the second node on startup. There seem to be other problems on initial connect in dual-primary scenarios as well, which always lead to a split-brain situation. When connected manually, DRBD seems to work correctly anyway. Has anyone seen (and solved) this problem?
    cu. Tim

  121. Hi Tim, this issue sounds more like something directly involving DRBD itself. You’d better subscribe to the DRBD mailing list or search through its archives. If I remember correctly, I’ve already seen someone else asking on the list about the same issue.

  122. @SAM I have the same issue, but it also drops my whole network connection! Can someone provide RPMs for DRBD 8.4.1 and XenServer 6.1?
    Thanks,
    Martin

  123. Don’t even try 8.4.3 on XenServer 6.1! 8.4.1 works perfectly. Maybe you can update your blog on this 🙂 Sharing the RPM files for the community:
    drbd-km-2.6.32.43_0.4.1.xs1.6.10.741.170752xen-8.4.1-1.i386.rpm
    drbd-utils-8.4.1-1.i386.rpm
    http://www65.zippyshare.com/v/31108023/file.html
    http://www54.zippyshare.com/v/97353055/file.html

  124. Hi Martin, can you kindly specify what you mean by "works perfectly"? I mean, have you tested it in a production environment? Which kind of VMs are you running on it? Thank you for sharing your server’s real-life experience.
    Robert

  125. Awesome guide Joe, however you might want to add:
    sm-config:allocation=thin
    at the end of the command that adds the SR, as right now it is thick 🙂
    PS: 8.3.12 with XenServer 6.2 works fine. Don’t try 8.4.x, as it will split-brain like crazy, so stay away from that!

  126. Hello guys, I’m using XenServer 6.2, with no updates or hotfixes installed, and DRBD 8.4.3 successfully synced in dual-primary mode. No split-brains at all, unlike what Stuat said. But I get a “VDI is not available” error when I try to start a VM on, or live migrate a VM to, my secondary host. All my VMs start without a problem on the primary, but nothing starts on the secondary. DRBD shows as up-to-date on both nodes. Does anybody have a clue about this?

  127. Hi All,

    I got these errors:

    configure: WARNING: you should use –build, –host, –target
    configure: WARNING: invalid host type: –enable-spec
    configure: WARNING: you should use –build, –host, –target
    configure: WARNING: invalid host type: –with-km
    checking for –enable-spec-gcc… no
    checking for gcc… no
    checking for –enable-spec-cc… no
    checking for cc… no
    checking for –enable-spec-cl.exe… no
    checking for cl.exe… no
    configure: error: in `/drbd/drbd-8.3.7':
    configure: error: no acceptable C compiler found in $PATH
    See `config.log' for more details.

    Anyway, I am running these servers as virtual machines.
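The "invalid host type: –enable-spec" warnings in comment 127 are what configure prints when an argument does not start with an ASCII hyphen: the guide’s web page has turned each `--` into a single en-dash, so copy-pasting the command passes `–enable-spec` as a host name rather than an option. A minimal sketch of undoing that before running the command (`fix_dashes` is a made-up helper name):

```shell
#!/bin/sh
# fix_dashes is a hypothetical helper: it replaces every en-dash (U+2013)
# in its argument with two ASCII hyphens, the form configure expects.
fix_dashes() {
    printf '%s\n' "$1" | sed 's/–/--/g'
}

# The command as it appears when copied from the guide's web page:
fix_dashes './configure –enable-spec –with-km'
```

The second error, "no acceptable C compiler found in $PATH", is a separate problem: it most likely means the command was run somewhere without gcc, such as the XenServer host itself, rather than inside the DDK VM that the guide uses for building.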
