Off-site Backup with DRBD over OpenVPN on Ubuntu Hardy
A local machine (server) is to be provided with off-site disaster recovery (backup). The off-site location is accessible via the public Internet with reasonable bandwidth (100 Mbit/s).
DRBD is a distributed block-level remote replication solution. Initially it was designed for machines sitting side by side with a dedicated network link between them, but given proper tuning it works well for a remote setup like this one too.
OpenVPN is 'the standard' VPN system for Linux; it provides compressed VPN connections with minimal setup requirements.
DRBD does not provide data encryption, and it needs a dedicated port for each replicated block device. Using something like stunnel or even ssh is conceivable, but rather cumbersome if several devices are to be mirrored. So I recommend setting up a dedicated VPN link.
Let's set up the server first:
server# apt-get install openvpn
server# openvpn --genkey --secret /etc/openvpn/static.key
server# cat <<CFG_END >/etc/openvpn/server.conf
dev tun0
# 10.89.33.1 is server-vpn
# 10.89.33.2 is backup-vpn
ifconfig 10.89.33.1 10.89.33.2
# Our pre-shared static key
secret static.key
# LZO compression
comp-lzo
# 3 -- medium output, good for normal operation.
verb 3
CFG_END
server# /etc/init.d/openvpn start
Make sure that server port 1194 is reachable for UDP packets from the outside; you may have to adjust the firewall.
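With plain iptables that could look like the line below; this is just a sketch, and the interface name eth0 is an assumption about your setup:
server# iptables -A INPUT -i eth0 -p udp --dport 1194 -j ACCEPT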
The backup server setup is almost the same:
backup# apt-get install openvpn
backup# scp root@server.ip.address:/etc/openvpn/static.key /etc/openvpn/static.key
backup# cat <<CFG_END >/etc/openvpn/client.conf
dev tun0
remote server.ip.address
# 10.89.33.1 is server-vpn
# 10.89.33.2 is backup-vpn
ifconfig 10.89.33.2 10.89.33.1
# Our pre-shared static key
secret static.key
# LZO compression
comp-lzo
# 3 -- medium output, good for normal operation.
verb 3
CFG_END
backup# /etc/init.d/openvpn start
Also, depending on your NAT situation, you may want to turn the backup machine into the VPN server. The difference between server and client setup is just the presence of the 'remote' keyword, which determines who initiates the connection.
DRBD can work happily on top of pre-existing filesystems. It sort of slides in between the original device and the filesystem. All it needs on top of this is an additional block device to keep a bitmap for tracking which blocks have been modified and need to be synced to the destination host.
Let's set up a filesystem on the server as a starting point. This could be a pre-existing filesystem!
server# lvcreate -L 1G -n test_a local
server# mkfs.ext3 -m1 -b 4096 \
-E resize=10T,stride=16,stripe-width=80 \
-O dir_index,filetype,has_journal,sparse_super \
-L test_a /dev/local/test_a
A word on the filesystem creation parameters: the stride of 16 derives from our RAID chunk size (64k) divided by the ext3 block size (4k). The stripe-width of 80 is the number of data-bearing disks (for RAID6 this is the number of disks in the array minus 2) multiplied by the stride. Our RAID6 here is made up of 7 disks, so 5 data disks times a stride of 16 gives 80.
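If your array looks different, the two numbers are quick to recompute in the shell; the values below simply restate this example's 64k chunks and 7-disk RAID6:
server# echo $(( 64 / 4 ))           # stride: chunk size / block size = 16
server# echo $(( (7 - 2) * 64 / 4 )) # stripe-width: data disks * stride = 80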
Note that the DRBD configuration on 'server' and 'backup' is the same for the most part; what follows has to be done on both boxes.
First you need to get a copy of DRBD 8.3.x onto your box. It is not included in Hardy, but you can get it from the Ubuntu HA PPA.
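Hardy has no add-apt-repository, so the PPA goes into the apt sources by hand. The repository line below is an assumption about the Ubuntu HA team's PPA layout; check its Launchpad page for the exact path and the signing key:
host# cat <<SRC_END >/etc/apt/sources.list.d/ubuntu-ha.list
deb http://ppa.launchpad.net/ubuntu-ha/ppa/ubuntu hardy main
SRC_END
host# apt-get update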
host# apt-get install drbd8 drbd8-utils
Each DRBD device requires space to keep tally of the blocks it has to sync. This space can be internal if you start from scratch, but for existing filesystems it is much handier to keep it outside. With external metadata, each block device needs another 128MB. Since normal partitions are way bigger than 1G, this works out quite well. There is a way to have a proportional bitmap size, but then you have to keep track of the block device size, which makes resizing and such cumbersome to say the least.
host# lvcreate -L 128M -n drbd_meta local
If you want to keep meta-data for a second DRBD device just enlarge the drbd_meta device:
host# lvresize -L +128M /dev/local/drbd_meta
Now for the configuration of DRBD. Again, the configuration of server and backup is exactly the same. There is a default config in /etc/drbd.d/global_common.conf which needs some tweaking:
common {
        protocol A; # do not wait for the data to reach the remote host
        handlers {
                ... # keep the defaults
        }
        disk {
                # since we are on a RAID with battery backed
                # cache we can disable the flushing of disk and md
                no-disk-flushes;
                no-md-flushes;
        }
        net {
                # authenticate the peer with a challenge-response
                # handshake based on a shared secret
                cram-hmac-alg sha256;
                shared-secret "some-shared-secret-of-your-choice";
                # keep a send buffer (10M is the max)
                sndbuf-size 10M;
        }
        syncer {
                # do not overwhelm our link ... this is 1 MByte/s
                rate 1M;
                # when re-syncing, only send blocks whose checksums differ
                csum-alg md5;
        }
}
With the default settings in place, create a resource configuration for each resource you want to remote-backup.
host# cat <<CFG_END >/etc/drbd.d/0_test_a.res
resource test_a {
        device    /dev/drbd0;
        disk      /dev/local/test_a;
        meta-disk /dev/local/drbd_meta[0];
        on server {
                address 10.89.33.1:7789;
        }
        on backup {
                address 10.89.33.2:7789;
        }
        # if you have multiple resources, use this to tell
        # the syncer to re-sync one after the other
        syncer {
                # after another_resource;
        }
}
CFG_END
The connection between the two resources runs over the OpenVPN link established above. I am naming the DRBD resource the same as the underlying LVM volume; this helps to keep order when the number of replicated devices increases. The only tricky bit is that you have to keep a running count of all resources: this example is about resource 0. If there were a second resource, it would get number 1 for the device number and the meta-disk index, and you would also have to grow the meta-disk by another 128MB.
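Purely as an illustration, a hypothetical second resource test_b (the name, its backing devices and the port are made up for this sketch) would look like this:
host# cat <<CFG_END >/etc/drbd.d/1_test_b.res
resource test_b {
        device    /dev/drbd1;
        disk      /dev/local/test_b;
        meta-disk /dev/local/drbd_meta[1];
        on server {
                address 10.89.33.1:7790;
        }
        on backup {
                address 10.89.33.2:7790;
        }
        syncer {
                after test_a;
        }
}
CFG_END
Note the new port, the meta-disk index 1 and the 'after' keyword, and remember to grow drbd_meta by another 128MB (as shown above) before running create-md for the new resource.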
On the server everything is now in place and you can activate the setup:
server# modprobe drbd
server# drbdadm create-md test_a
server# drbdadm attach test_a
server# drbdadm syncer test_a
server# drbdadm connect test_a
Now you can mount the DRBD device and start accessing it on the server again:
server# mount /dev/drbd/by-disk/local/test_a /mnt/test_a
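If you want the mount in /etc/fstab, mark it noauto; the device is only accessible while this host is primary, so an automatic mount at boot would fail on the secondary. A hypothetical line for the example above:
/dev/drbd/by-disk/local/test_a /mnt/test_a ext3 noauto 0 0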
On the backup host, set up the disk space to receive the copy of the test_a device:
backup# lvcreate -L 1G -n test_a local
And then activate as you did on the server:
backup# modprobe drbd
backup# drbdadm create-md test_a
backup# drbdadm attach test_a
backup# drbdadm syncer test_a
backup# drbdadm connect test_a
At any time you can 'cat /proc/drbd' to see how things stand. By now both the server and the backup should be ready, but undecided about who is going to be the primary and who the secondary. As soon as this is decided, the primary will copy its state over to the secondary host. If you are working with 'real' data, this is something you had better be careful about. So in this example, do:
server# drbdadm -- --overwrite-data-of-peer primary test_a
You can watch /proc/drbd to see how the syncing progresses. I also find the iftop tool a nice way to keep tabs on the network traffic this generates.
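Something along these lines keeps both on one screen; the interval is just a suggestion, and iftop has to be installed separately (apt-get install iftop):
server# watch -n5 cat /proc/drbd
server# iftop -i tun0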
If you ever run out of space on your test_a filesystem, it really pays off to be using LVM. All it takes to add space is:
server# lvresize -L +10G /dev/local/test_a
backup# lvresize -L +10G /dev/local/test_a
server# drbdadm resize test_a
server# resize2fs /dev/drbd/by-disk/local/test_a
all on-the-fly ... awesome!
With this setup in place you can start doing some testing. We found in all our tests that the system behaves gracefully: it may go out of sync temporarily (as evidenced in /proc/drbd), but as soon as the link is re-established or the server activity lessens, the backup catches up and all is well again.
The only situation where this whole setup will fail is if the rate of change on the server is constantly higher than 1 MB/s or whatever sync rate you defined.
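A rough way to check whether you stay within that budget is to keep an eye on the out-of-sync counter in /proc/drbd during normal operation; if oos keeps growing, the link or the sync rate is too small. For example:
server# watch -n5 "grep oos: /proc/drbd"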
You can easily simulate a disaster recovery scenario. Just down the OpenVPN tun0 interface.
backup# ifconfig tun0 down
The two hosts will become separated:
server# cat /proc/drbd
version: 8.3.7 (api:88/proto:86-91)
GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by root@server, 2010-02-09 18:41:00
0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown A r----
ns:184 nr:0 dw:2265076 dr:11114954 al:708 bm:983 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
backup# cat /proc/drbd
version: 8.3.7 (api:88/proto:86-91)
GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by root@server, 2010-02-09 18:41:00
0: cs:WFConnection ro:Secondary/Unknown ds:UpToDate/DUnknown A r----
ns:0 nr:0 dw:12438428 dr:314233 al:11 bm:814 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
Both resources report their data as UpToDate and are waiting for a connection (WFConnection). If nothing else happens, the server will continue to work just fine and will resynchronize as soon as tun0 comes back up.
backup# ifconfig tun0 up
Note that if you stop OpenVPN completely (/etc/init.d/openvpn stop) then the tun0 interface will vanish and DRBD will go into standalone mode and stop waiting for a connection:
backup# /etc/init.d/openvpn stop
* Stopping virtual private network daemon. [ OK ]
backup# cat /proc/drbd
version: 8.3.7 (api:88/proto:86-91)
GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by root@server, 2010-02-09 18:41:00
0: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown r----
ns:0 nr:0 dw:12438428 dr:314233 al:11 bm:814 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
Even when you start OpenVPN again, DRBD will not reconnect automatically. You have to nudge it back into action.
backup# /etc/init.d/openvpn start
* Starting virtual private network daemon.
* client (OK) [ OK ]
backup# drbdadm connect all
backup# cat /proc/drbd
version: 8.3.7 (api:88/proto:86-91)
GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by root@server, 2010-02-09 18:41:00
0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate A r----
ns:0 nr:0 dw:12438428 dr:314233 al:11 bm:814 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
In a disaster case the backup host would be all that is left. To get access to the data on the backup host, the test_a device has to be promoted to primary and potentially fsck'ed.
backup# drbdadm primary test_a
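For the ext3 filesystem from this example, a forced check could look like this; run it before mounting:
backup# fsck.ext3 -f /dev/drbd0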
When doing recovery you may want to mount read-only just to make sure no one puts any real data there, but depending on the situation the copy can also be used to continue normal operations.
backup# mount -o ro /dev/drbd/by-disk/local/test_a /mnt
The real trouble starts when things are back to normal. In the simplest case no data has been modified on the backup host and the server comes back up. At first both the server and the backup host will take no notice; you can force them to by running:
server# drbdadm connect test_a
backup# drbdadm connect test_a
But they will refuse to reconnect since they have diverged, so now you need to recover manually. If the backup host has only been mounted read-only, this is a simple case of demoting the backup to secondary and reconnecting:
backup# drbdadm secondary test_a
backup# drbdadm -- --discard-my-data connect test_a
server# drbdadm connect test_a
Syncing will be very fast and by the time you look at /proc/drbd it will in all likelihood already be over.
If changes have happened at both ends, you have to use a filesystem level sync tool like rsync to copy changes from 'backup' to 'server'.
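Assuming the backup's copy is still mounted on /mnt (as above) and the server's filesystem on /mnt/test_a, the copy could look roughly like this; paths and flags are only suggestions, so do a dry run first:
backup# rsync -avn /mnt/ root@server.ip.address:/mnt/test_a/
backup# rsync -av /mnt/ root@server.ip.address:/mnt/test_a/
After that, discard the backup's DRBD data and reconnect as shown above.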
If the server has no changes of its own, another option is to 'demote' the server to secondary and use DRBD to sync the backup's changes over. Once the two are in sync you can flip the roles back and continue on the server.
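A sketch of that role swap, using the same commands introduced above; make sure nothing is mounted on the side being demoted:
server# umount /mnt/test_a
server# drbdadm secondary test_a
server# drbdadm -- --discard-my-data connect test_a
backup# drbdadm connect test_a
Wait until /proc/drbd shows UpToDate/UpToDate, then flip the roles back:
backup# umount /mnt
backup# drbdadm secondary test_a
server# drbdadm primary test_a
server# mount /dev/drbd/by-disk/local/test_a /mnt/test_a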