Zero-Downtime Limited-Space Backup

Description

Overview

This is yet another backup script. We use it internally, but we thought other organizations might find it useful. It is built on the Python API and command-line LVM snapshots.

We wrote this because other methods didn't quite meet our needs:

1) No downtime - Doing exports requires a shutdown.

2) Limited space - Built-in snapshots of VMs were not feasible for us. As far as we have found, there is currently no way to exclude disks from a snapshot. A snapshot takes about double the currently used space for a disk on an SR, and this space cannot be reclaimed until the snapshot is deleted and the machine is shut down so the disks can be coalesced. One of our VMs has about 8 TB of user drive space, with no extra space on the SRs where its disks are allocated. We don't have enough room, and we have no need to snapshot the user data, since it is already backed up with NetBackup. The script lets us take no-downtime snapshots of the system disks while requiring only a small, temporary amount of extra space on the SRs.

How it Works

The Python API is used to gather metadata about the VM, its disks, and its network interfaces. The metadata is written to plain text files. The disk data is imaged by running dd on the LVM volumes that correspond to the disks' VDIs.
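As a rough illustration of the imaging step, the sketch below writes a metadata record to a plain text file and builds the dd command for one disk. It is not the script's actual code: the function names, the key=value file layout, and the /dev/VG_XenStorage-<SR uuid>/VHD-<VDI uuid> naming scheme are all assumptions made for the example, and the LVM snapshot step is left out.

```python
import subprocess


def lvm_device_path(sr_uuid, vdi_uuid, lv_prefix="VHD-"):
    """Build the LVM device path for a VDI (the naming scheme is an assumption)."""
    return "/dev/VG_XenStorage-{}/{}{}".format(sr_uuid, lv_prefix, vdi_uuid)


def write_metadata(path, record):
    """Write a flat dict as sorted key=value lines (hypothetical file layout)."""
    with open(path, "w") as handle:
        for key in sorted(record):
            handle.write("{}={}\n".format(key, record[key]))


def dd_command(source, target, block_size="1M"):
    """Return the dd argument list that copies source to target."""
    return ["dd", "if=" + source, "of=" + target, "bs=" + block_size]


def image_disk(sr_uuid, vdi_uuid, image_path, dry_run=True):
    """Image one VDI's volume to a file; with dry_run, only return the command."""
    command = dd_command(lvm_device_path(sr_uuid, vdi_uuid), image_path)
    if not dry_run:
        subprocess.check_call(command)  # needs root on the host
    return command
```

Calling image_disk with dry_run=True returns the command without touching any device, which makes it easy to check the paths before running anything as root.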

To restore, a new VM is created and given the memory and CPU settings stored in the metadata. Then the VIF and disks are restored, with the stored images written to the new LVM volumes.
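The restore direction can be sketched the same way. The example below assumes the metadata was stored as key=value lines with the hypothetical keys memory_bytes and vcpus; the real script's file format and key names may differ. It only produces a plan (the settings plus a dd command) and does not create a VM.

```python
def read_metadata(text):
    """Parse key=value lines into a dict, skipping blank and comment lines."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        record[key.strip()] = value.strip()
    return record


def restore_plan(metadata, image_path, target_device):
    """Collect the VM settings and the dd command that writes the image back."""
    return {
        "memory_bytes": int(metadata["memory_bytes"]),
        "vcpus": int(metadata["vcpus"]),
        # Source and target are swapped relative to the backup direction.
        "dd": ["dd", "if=" + image_path, "of=" + target_device, "bs=1M"],
    }
```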

Work in Progress

The script is still a work in progress and, while it fits our needs, does not have all of the features it could (Comments welcome).

For instance, it does not currently work for HVM guests. We use an ugly (but quick) kludge: for a new VM that we are restoring, we just copy an existing VM that we know has booted before but that has no disks. This allows us to avoid the VM create command (just look at it).

Setup

1) Create a config file based on the example given, adjusted for your environment.

2) (Kludge) Take a working Linux VM you have now, copy it, and remove its disks. Give the copy a unique name and put this name into NAUrestore.py under copy_vm.

Download

NAUBackup

Interesting Snapshot method.

Resetting the QSECOFR service tools user ID and password

Notes - If you know the password for the QSECOFR user profile, use this password to reset the password for the IBM-supplied service tools user ID that has service tools security privilege (QSECOFR) to the IBM-supplied default value.

Complete the following steps to reset the QSECOFR service tools user ID and password:

  1. Ensure that the system is in normal operating mode, not DST.
  2. Sign on at a workstation using the QSECOFR user profile.
  3. On a command line, enter CHGDSTPWD (Change IBM Service Tools Password). Then press F4 (Do not press Enter). You see the Change IBM Service Tools Password (CHGDSTPWD) display.
  4. Type *DEFAULT and press the Enter key. This sets the IBM-supplied service tools user ID that has service tools security privilege and its password to QSECOFR.

Attention: Do not leave the QSECOFR service tools user ID and password set to the default value. This is a security exposure because this is the value included in every system and is commonly known.

Why is DRBD write performance so low?

While setting up a new system I noticed that write performance is very low, only about 35 MB/s. This is with kernel 2.6.29.1 and DRBD 8.3.1. The disk device is an md device (software RAID), and when I run the following test: dd if=/dev/zero of=/dev/md4 bs=64k count=100000 it gives me approx. 160 MB/s. If I do this against /dev/drbd0, the maximum I get is 70 MB/s when I disable DRBD on the secondary. I can also reach 70 MB/s with the secondary enabled when I play with sndbuf-size, max-buffers, unplug-watermark and al-extents. But why is it limited to 70 MB/s when the secondary is disabled? The drbd.conf looks as follows:

Interesting post on DRBD performance.
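
To put the figures from the question in perspective, here is a small arithmetic sketch of how much data that dd test writes and how long it would take at each quoted rate. The helper names are made up for the example; it assumes dd's usual conventions, where bs=64k means 64 * 1024 bytes and the reported MB/s uses 10^6 bytes.

```python
def dd_total_bytes(block_size_bytes, count):
    """Total bytes written by dd for a given block size and block count."""
    return block_size_bytes * count


def seconds_at(total_bytes, mb_per_s):
    """Time to write total_bytes at a rate given in decimal MB/s."""
    return total_bytes / (mb_per_s * 1000000.0)


total = dd_total_bytes(64 * 1024, 100000)  # 6,553,600,000 bytes, about 6.55 GB
for rate in (160, 70, 35):
    print("{} MB/s: {:.1f} s".format(rate, seconds_at(total, rate)))
```

The test takes roughly 41 seconds on the raw md device at 160 MB/s, about 94 seconds at 70 MB/s, and about 187 seconds at 35 MB/s.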