Openfiler DRBD + HA

The scenario I will try to illustrate is two Openfiler systems in a linux-ha environment. For simplicity we shall refer to these systems as B1 and B2. B2 is configured as a hot standby, constantly checking whether B1 is up. If anything happens to B1, B2 comes up and takes over the services of B1. From the outside world, only one system is visible at any point in time. This is done via a virtual IP address that is always assigned to exactly one server. If anything happens to B1 and B2 decides to take over, it sends an ARP broadcast on the network saying, in effect, "hey, B1's MAC just changed, I am B1". More about this can be read at http://www.linux-ha.org

These are the steps I have outlined in my notes to get things going (these are bare instructions, please read the fine manuals):

1.) Prepare your hardware. My systems have 3 NICs: one 10/100 for the WAN, one Gigabit card for DRBD sync, and one 10/100 for running daily backups of the system files. DRBD is what will replicate your volumes. I used a crossover cable between the two Gigabit cards, but you can always use a switch.

2.) Install Openfiler on both systems. Do not do any LVM setup in Disk Druid; just leave the free space as is. You will come back to it later and set it up manually.

2a.) Set up the IP networks, naturally a separate one on each NIC.

2b.) Most of you will have your WAN turned off, but if for any reason you need your WAN NICs on, make sure your iptables rules are set.

Once the whole system is set up and you can communicate on all interfaces, you can go on to setting up DRBD.

3.) DRBD is set up from one conf file, /etc/drbd.conf. Here is an example:

##############
resource r0 {
  protocol C;
  incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";

  startup {
    wfc-timeout 10;
    degr-wfc-timeout 120;
  }

  disk {
    on-io-error detach;
  }

  net {
    timeout 60;
    connect-int 10;
    ping-int 10;
    #max-buffers 2048;
    max-epoch-size 2048;
  }

  syncer {
    rate 100M;
    group 1;
    #al-extents 257;
  }

  on B1 {
    device /dev/drbd0;
    disk /dev/hda4;
    address 192.168.30.30:7788;
    meta-disk internal;
  }

  on B2 {
    device /dev/drbd0;
    disk /dev/hda4;
    address 192.168.30.31:7788;
    meta-disk internal;
  }
}
#########################

All of these config options are described in detail in the original install file. Please also pay attention to the version of DRBD you have installed (rpm -qa | grep drbd). Configs have changed since 0.6.

Once you have successfully configured both sides, start DRBD on each of them via "/etc/rc.d/init.d/drbd start". You can watch the status of your sync via "cat /proc/drbd". Most likely you will be in Secondary/Secondary mode, which means that neither B1 nor B2 is able to access the share. Remember, only one side can be primary at a time.

drbdadm is the tool that controls the state of your DRBD. Issuing "drbdadm primary r0" on B1 or B2 brings that server to primary mode. Once the state is primary, you can start accessing the device from that server. You should practice switching your share between primary and secondary on both sides. Issue the following commands:

  B1: drbdadm secondary r0
  B2: drbdadm primary r0
  B2: cat /proc/drbd
  B2: drbdadm secondary r0
  B2: cat /proc/drbd
  B1: cat /proc/drbd
  B1: drbdadm primary r0
  B1: cat /proc/drbd

If you have been successful, you should see the state change from one side to the other. Now, let's go on to creating LVM volumes and groups.

4.) Bring both B1 and B2 to state Sec/Sec (DRBD). LVM HOWTO: http://www.tldp.org/HOWTO/LVM-HOWTO/index.html

4a.) Bring B1 into primary.

4b.) pvcreate /dev/drbd0
     If you say /dev/hda* here you are shooting yourself in the foot: LVM has to sit on top of the DRBD device, not on the raw partition underneath it.

4c.) vgcreate openfiler /dev/drbd0
     Activate via: vgchange -a y openfiler
     Now you should be able to visit http://blabla:446 (substitute your own host), log in, click Volumes, and voilà, you should see your volume. Do not create anything at this time.

4d.) Put B1 in secondary. Deactivate LVM first: vgchange -a n openfiler

4e.) Put B2 in primary.

4f.) pvcreate /dev/drbd0

4g.) vgcreate openfiler /dev/drbd0
     Activate LVM: vgchange -a y openfiler
     (Since /dev/drbd0 is replicated, the LVM metadata written on B1 may already be present on B2. If pvcreate or vgcreate complains that the volume already exists, it should be enough to just activate the volume group.)
     You should be able to see your volume on B2 now (via the web interface).

Now here is what I put into my .bashrc file on both systems:

  alias prim="drbdadm primary r0 && vgchange -a y openfiler"
  alias sec="vgchange -a n openfiler && drbdadm secondary r0"
  alias disp="cat /proc/drbd"

This way you can just type "disp" to see the status. The logic behind failing over is this: when you execute "sec", your volume group is deactivated first and DRBD goes secondary only after that, to ensure nothing is still writing. On the other side, when you issue "prim", the DRBD device comes up as primary first, then the volume group gets activated.

Now, once you create a volume in Openfiler, you will see that you can no longer "go secondary". It will say something like:

  vgchange -- can't deactivate volume group "openfiler" with 1 open logical volume

I have put the question to the LVM folks and will update you guys further. I also still need to figure out how to set up heartbeat, but I hope the above cleared up some gray areas some of you may have had. I am terrible at writing manuals, so if you want help, come to the IRC channel #openfiler on FreeNode and I should be able to help!
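The prim/sec/disp aliases can also be written as shell functions with a small safety check, which might help while practicing manual failover. The following is only a sketch and not part of my original notes: it assumes the resource r0 and volume group openfiler used above, and it assumes /proc/drbd carries a field of the form "st:Local/Peer" (that is how the DRBD version I describe prints it; other releases may label the field differently). The helper names drbd_local_role and drbd_peer_role and the DRBD_* variables are made up for illustration.

```shell
#!/bin/sh
# Sketch only: function versions of the prim/sec/disp aliases.
# Assumptions (not verified against every DRBD release):
#   - resource "r0" and volume group "openfiler", as in the text
#   - /proc/drbd has a field "st:<local>/<peer>", e.g. st:Primary/Secondary

DRBD_RES="${DRBD_RES:-r0}"
DRBD_VG="${DRBD_VG:-openfiler}"
DRBD_PROC="${DRBD_PROC:-/proc/drbd}"

# Print the local role (e.g. Primary or Secondary) from the st: field.
drbd_local_role() {
    sed -n 's|.*st:\([A-Za-z]*\)/[A-Za-z]*.*|\1|p' "$DRBD_PROC" | head -n 1
}

# Print the peer's role from the same field.
drbd_peer_role() {
    sed -n 's|.*st:[A-Za-z]*/\([A-Za-z]*\).*|\1|p' "$DRBD_PROC" | head -n 1
}

# Promote this node: DRBD goes primary first, then the volume group
# is activated. Refuses to run while the peer still reports Primary.
prim() {
    if [ "$(drbd_peer_role)" = "Primary" ]; then
        echo "peer is still Primary; run sec on the other node first" >&2
        return 1
    fi
    drbdadm primary "$DRBD_RES" && vgchange -a y "$DRBD_VG"
}

# Demote this node: deactivate the volume group first so nothing is
# writing, then put DRBD into secondary.
sec() {
    vgchange -a n "$DRBD_VG" && drbdadm secondary "$DRBD_RES"
}

# Show the raw DRBD status.
disp() {
    cat "$DRBD_PROC"
}
```

Put this in .bashrc in place of the three aliases (or source it from there). The ordering inside prim and sec is the same as in the aliases; the only addition is that prim bails out if the other side still claims to be primary.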

Notes