
Wednesday, March 13, 2019

Using LTE for Out of Band

It's generally good practice to make sure that any network you're responsible for maintaining can be reached in the event of a failure of either the main Internet (transit) connection, or failure (or misconfiguration) of the routing equipment. Sometimes it's not feasible to have a second transit connection or redundant networking hardware, and so you need to get creative.

One of my clients is a not-for-profit with constrained financial resources. We wanted to have a way in to the network in the event of a failure of the main router, or in case someone (likely me) fat-fingers something and breaks the router config. And, while having a second transit connection would be nice, it's just not something we can fit in the budget at the moment.

So, we had to get creative.

Before I came on board, they had purchased a consumer-grade LTE modem with the intention of using that as the backup access into the network, but hadn't actually set it up yet. This blog post covers the steps I took to get it working.

Overview

The data centre in question is in the US, so we're using a simple T-Mobile pay-as-you-go data service. This service is designed for outgoing connections, and doesn't provide a publicly-reachable IP address that I could ssh to from outside the LTE network, so I need to set up some sort of tunnel to give me an endpoint on the Internet I can connect to that leads inside the client's network. ssh itself is the obvious choice for setting up that tunnel.

I've set the tunnel up to provide access to one of the client's administrative hosts, which has serial access to about half the network equipment (including the main router). From that vantage point I should be able to fix most configuration issues that would prevent me from accessing the network through the normal transit connection, and can troubleshoot upstream transit problems as if I were standing there in the data centre.

The modem can be put into bridge mode, but can still have an IP address to manage its configuration. The LTE network wants to use DHCP to give our server an address. So, we'll have the slightly unusual configuration of having both a static and DHCP address on the server interface that the modem is connected to. The server has other duties though, so we'll have to make sure that things like the default route and DNS configuration aren't overwritten; that requires some extra changes to the DHCP client config.

And finally, for the tunnel to work we need a host somewhere out on the Internet that we can still reach when the 'home' network goes down. In the rest of this post I'm going to refer to our local administrative host as HOST_A and the remote host we're using for a tunnel endpoint as HOST_B. We'll need some static routes on HOST_A that send all traffic for HOST_B through the LTE network, and then we can construct the ssh tunnel which we'll use to proxy back into HOST_A.

Setting up the Modem

The modem we're using is a Netgear LB2120 LTE Modem with an external antenna, to get around any potential interference from the cabinet itself, or the computer equipment and wiring inside. We have pretty good reception (4-5 bars) from just placing the antenna on top of the cabinet.

The modem's LAN port is connected directly to an ethernet port on HOST_A. We could also have run that connection through a VLAN on our switches, but since the modem and the server are in the same cabinet, that would only increase the number of ways this could fail while providing no benefit. The main point here is that the modem is going to provide its own network, so it's best not to have it on the same physical network (or VLAN) as other traffic.

This modem is designed to be able to take over in the event of the failure of a terrestrial network, which is what the WAN port is used for. But we don't want to use that here, so that port is left empty.

Connect to the modem's web interface (for this model, the default IP address and password are printed on the back).

In the Settings:Mobile tab, take a look at the APN details. This probably defaults to IPv4 only, so if you want to try to get IPv6 working (more on that later) you'll have to update the PDP and PDP Roaming configuration here. In the Advanced tab, you want to put the modem into Bridge mode (which will also disable the DHCP server), and you may want to give it a different static address. The modem's default network overlaps with private address space we already use, so I'm going to use 172.16.0.0/30 as an example point-to-point network to communicate with the modem. For that, you'd set the modem's IP address to 172.16.0.1 and its netmask to 255.255.255.252. Once you submit the configuration changes, the modem should restart.

Setting up the Server

The server needs to have a static IP address on the point-to-point network for configuring the modem as well as a DHCP address assigned by the LTE network. Because we may want to bring these up and down separately, I suggest putting the DHCP address on a virtual interface. You also need to configure a static route on the DHCP-assigned interface that points to HOST_B, so that any outbound traffic from HOST_A to HOST_B goes across the LTE network instead of using your normal Internet links. On a Debian host, /etc/network/interfaces.d/LTE.conf might look something like this:

auto eth3
iface eth3 inet static
 address 172.16.0.2/30

auto eth3:0
iface eth3:0 inet dhcp
 post-up ip route add 192.0.2.1/32 dev eth3:0
 post-down ip route del 192.0.2.1/32 dev eth3:0

You'll also need to modify /etc/dhcp/dhclient.conf to disable some of the changes that it normally makes to the system. The default request sent by the Debian dhclient includes the following options:

request subnet-mask, broadcast-address, time-offset, routers,
   domain-name, domain-name-servers, domain-search, host-name,
   dhcp6.name-servers, dhcp6.domain-search, dhcp6.fqdn, dhcp6.sntp-servers,
   netbios-name-servers, netbios-scope, interface-mtu,
   rfc3442-classless-static-routes, ntp-servers;

I've modified ours to remove the routers, domain-name, domain-name-servers, domain-search, dhcp6.name-servers, dhcp6.domain-search, dhcp6.fqdn, and dhcp6.sntp-servers options. You also need to block changes to /etc/resolv.conf. Even though you've told dhclient not to request those options, the server may still supply them and dhclient will happily apply them unless you explicitly tell it not to.

request subnet-mask, broadcast-address, time-offset, host-name,
    netbios-name-servers, netbios-scope, interface-mtu,
    rfc3442-classless-static-routes;

supersede domain-name "example.net";
supersede domain-name-servers 198.51.100.1, 198.51.100.2;

Setting up the Tunnel

For this, you want to create an unprivileged user that doesn't have access to anything sensitive. For the purposes of this post I'll call the user 'workhorse'. Set up the workhorse user on both hosts; generate an SSH key without a passphrase for that user on HOST_A, and put the public half in the workhorse user's authorized_keys file on HOST_B.

We're going to use SSH to set up the tunnel, but we need something to maintain the tunnel in the event it drops for some reason. There is a handy programme called autossh which does the job well. In addition to setting up the tunnel we need for access to HOST_A, it will also set up an additional tunnel that it uses to echo data back and forth between HOST_A and HOST_B to monitor its own connectivity, and restart the tunnel if necessary. We can combine that monitor with SSH's own ServerAliveInterval and ServerAliveCountMax settings to be pretty sure that the tunnel will be up unless there's a serious problem with the LTE network or modem.

I've chosen to run autossh from cron on every reboot, so I created an /etc/cron.d/ssh-tunnel file on HOST_A that looks like this:

@reboot workhorse autossh -f -M 20000 -qN4 -o "ServerAliveInterval 60" -o "ServerAliveCountMax 3" -R '*:20022:localhost:22' HOST_B

The -f option backgrounds autossh. -M 20000 sets up a listening port at HOST_B:20000 which sends data back to HOST_A:20001 for autossh to use to monitor the connection; you can explicitly specify the HOST_A port as well, if you prefer. The remaining options are standard ssh options which autossh passes on. Note that in my case HOST_B has an IPv6 address, but I haven't configured the tunnel interface for IPv6, so I'm forcing ssh to use IPv4 with the -4 flag.

You may need to modify the sshd_config on HOST_B to set GatewayPorts yes, depending on the default configuration. Otherwise you won't get a remotely accessible port on HOST_B.
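On a stock Debian sshd that's a one-line change. Since the workhorse key has no passphrase, it's also worth confining what that account can do on HOST_B; the Match block below is my own suggested hardening, not something the tunnel strictly requires:

```
# /etc/ssh/sshd_config on HOST_B
GatewayPorts yes

# optional: the tunnel user only ever forwards ports
Match User workhorse
    AllowTcpForwarding yes
    X11Forwarding no
    PermitTTY no
```

Remember to reload sshd after making the change.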

Instead of using cron, you could also use something like supervisord or systemd to start (and re-start if necessary) the autossh process.
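If you go the systemd route, a minimal unit might look something like the following. The unit name and binary path are my own assumptions (check yours with `which autossh`), and note that the -f flag is dropped, since systemd wants the process to stay in the foreground:

```
# /etc/systemd/system/lte-tunnel.service
[Unit]
Description=autossh reverse tunnel to HOST_B
After=network-online.target

[Service]
User=workhorse
# retry even if the very first connection attempt fails
Environment=AUTOSSH_GATETIME=0
ExecStart=/usr/bin/autossh -M 20000 -qN4 -o "ServerAliveInterval 60" -o "ServerAliveCountMax 3" -R '*:20022:localhost:22' HOST_B
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

Then enable it with systemctl enable --now lte-tunnel.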

Using the Setup

Once this is all put together, you should be able to ssh to port 20022 on HOST_B, and wind up with a shell on HOST_A.

matt@chani.conundrum.com:~
16:05:58 (3130) % ssh -p 20022 HOST_B
The authenticity of host '[HOST_B]:20022 ([192.0.2.1]:20022)' can't be established.
ECDSA key fingerprint is SHA256:4v+NbLg2QYqe43WFR9QKXaVwCpcc71u5jJmxJdZVITQ.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '[HOST_B]:20022,[192.0.2.1]:20022' (ECDSA) to the list of known hosts.
Linux HOST_A 4.9.0-6-amd64 x86_64 GNU/Linux

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Wed Mar 13 18:02:09 2019 from 216.235.10.40

matt@HOST_A:~
20:05:59 (618) %

Why no IPv6?

T-Mobile supports IPv6 on their LTE network, so I have the APN for our modem set to the IPV4V6 PDP type. The server configuration has been a problem, however.

As with IPv4, we don't want to get a default route for our LTE network because that would interfere with the normal traffic of the server. It seems like disabling the acceptance of Router Advertisement (RA) messages should be all that's necessary, but for some reason that entirely disables SLAAC address assignment.

% cat /etc/network/interfaces.d/LTE.conf

auto eth3
iface eth3 inet static
 address 172.16.0.2/30

auto eth3:0
iface eth3:0 inet dhcp
 post-up ip route add 192.0.2.1/32 dev eth3:0
 post-down ip route del 192.0.2.1/32 dev eth3:0

iface eth3:0 inet6 auto
 pre-up /sbin/sysctl -w net.ipv6.conf.eth3.accept_ra=0
 post-up ip route add 2001:db8::1/128 dev eth3:0
 post-down ip route del 2001:db8::1/128 dev eth3:0

I have also tried using DHCPv6 (iface eth3:0 inet6 dhcp, above), but that fails to get the configuration I want as well, and it causes ifup to return a failure when configuring the interface. At least the SLAAC problem above has the virtue of failing silently, so I can leave the configuration in place without causing problems with interface management.

Perhaps you can find the right combination of options to make it work!  I invite you to follow up, if you do.

Good luck!

Saturday, February 27, 2016

Installing FreeNAS 9.3 Over the Network

As users of new Skylake (LGA1151) systems are discovering, Intel has completely removed EHCI support from the new architecture. XHCI (USB 3.0) is supposed to be completely backwards compatible to USB 2.0, but the lack of EHCI support has some less than pleasant effects on trying to boot from USB using any OS that is expecting USB 2.0 support. Specifically, this means that GRUB 2 cannot currently boot an OS on XHCI-only systems, which makes installing FreeNAS a bit of a pain.

The symptom of this problem is that on XHCI systems the boot process will proceed up to the point where it tries to mount the root filesystem, and then it will die with an "error 19".
Trying to mount root from cd9660:/dev/iso9660/FreeNAS_INSTALL []...
mountroot: waiting for device /dev/iso9660/FreeNAS_INSTALL ...
Mounting from cd9660:/dev/iso9660/FreeNAS_INSTALL failed with error 19.

This is actually a problem that affects all XHCI systems, but if your system supports both EHCI and XHCI, you can disable XHCI in the BIOS to make USB booting work. Skylake systems, however, have no EHCI support at all, not even on the USB 2.0 motherboard headers, so this workaround isn't available.

Some people have found success with PCI cards that add EHCI USB ports, but you have to use caution with this approach since many (most?) PCI USB cards don't provide bootable USB ports. I didn't want to have to go pick up extra hardware just to install the OS, so I've opted for another approach: load the installer over the network via PXE.

The FreeNAS developers use PXE booting when testing new builds, and there is a guide for doing this with FreeNAS 9.2. However, the guide is two years old and I found it to be missing several steps when trying to apply it to a current version of FreeNAS. It's even worse when trying to use current versions of the FreeNAS developers' tools, as they're completely missing large sections of setup instruction (they're clearly not intended for use outside the project).

So, I'm publishing an update to the guide here. Eventually this will be out of date too, but hopefully it will save someone time down the road.

If you want to follow this guide you will need:
  1. a FreeBSD server which will be your PXE and DHCP server
  2. a machine you want to install FreeNAS on (presumably you already have this, since you're reading this guide)

Set up the BIOS

You'll want to modify your system BIOS boot order on the NAS host to make sure that PXE (or Network) boot is enabled, and will be attempted before any other valid boot option (e.g. if there's an OS on any disk in your system, that disk should be ordered after the PXE boot). Exactly how you do this is going to be specific to your BIOS.

Setting up the DHCP Server

Install the isc-dhcp43-server package, and use a config file that looks mostly like the following.  Update it for the subnet you use on your network:  "next-server" should be the IP address of your PXE server.
subnet 192.168.57.0 netmask 255.255.255.0 {
    range 192.168.57.100 192.168.57.200;
    option subnet-mask 255.255.255.0;
    option routers 192.168.57.1;
    option broadcast-address 192.168.57.255;
    option domain-name-servers 192.168.57.1;
    option domain-name "mydomain.com";

    next-server 192.168.57.10;
    filename "boot/pxeboot";
    option root-path "/tftpboot/installer/";
}
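The file above only configures dhcpd; the daemon still has to be enabled and started. On FreeBSD that looks roughly like this (the em0 interface name is an assumption, and the rc script name can differ between package versions, so check /usr/local/etc/rc.d/ if the service command fails):

```
echo 'dhcpd_enable="YES"' >> /etc/rc.conf
echo 'dhcpd_ifaces="em0"' >> /etc/rc.conf
service isc-dhcpd start
```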

Prepare the Installer

You need a copy of the FreeNAS installer ISO copied out onto the PXE server's filesystem. The following pair of commands will fetch and unpack the version I'm currently using:
mkdir -p /tftpboot/installer
fetch -o - http://download.freenas.org/9.3.1/latest/x64/FreeNAS-9.3-STABLE-201602031011.iso | bsdtar -x -f - -C /tftpboot/installer/

Set up NFS

First, permit the installer you just set up to be exported, and start up NFS.
echo '/tftpboot -ro -alldirs' >> /etc/exports
echo 'nfs_server_enable="YES"' >> /etc/rc.conf
service nfsd start
Next, instruct the installer to mount its root filesystem from the NFS export you just set up. Be sure to set the hostname (or IP address) of your PXE server correctly in the fstab entry.
mkdir /tftpboot/installer/etc
echo 'pxeserver:/tftpboot/installer   /   nfs   ro   0 0' >> /tftpboot/installer/etc/fstab

Setting up TFTP

Modify the tftp lines in /etc/inetd.conf to look like the following:
tftp  dgram  udp   wait  root  /usr/libexec/tftpd  tftpd -l -s /tftpboot/installer
tftp  dgram  udp6  wait  root  /usr/libexec/tftpd  tftpd -l -s /tftpboot/installer
Finally, enable inetd and test your tftp server:
echo 'inetd_enable="YES"' >> /etc/rc.conf
service inetd start
tftp localhost
tftp> get boot/pxeboot
Received 231424 bytes during 0.0 seconds in 454 blocks

Boot!

That's it. You should now be able to boot the installer over the network, and install FreeNAS on a disk installed in your NAS server.  Don't forget to consult the FreeBSD handbook section on diskless booting if you need help troubleshooting anything.  After installing, you may need to alter the boot order again to ensure that your freshly installed OS is booted before PXE.

Good luck!


Wednesday, July 3, 2013

Using Subversion With the Kobold2D Game Engine and Xcode

I've been messing about with some basic MacOS and iOS game development lately, and at the moment I'm working with the Kobold2D game engine, which is (mostly) a refinement of cocos2d.  I've found however that in Kobold's quest to make initial setup of a project easier, it sidesteps some of the normal setup that Xcode does when you add a project or file.  Some of this, such as Project Summary info like the Application Category and Bundle Identifier is easily fixed after the fact.  Version control setup, on the other hand, is marginally more complicated than normal (at least with Subversion).

With a bit of trial and error I think I've got a working procedure to get a new Kobold project to play nicely with Subversion.  Here are my assumptions for these instructions; the more you deviate from these the less this will be relevant, and I'll leave variations as an exercise for the reader:
  1. You're running Xcode 4.6 (I'm testing with 4.6.3)
  2. You've got Kobold2D 2.1.0
  3. You already have a repository set up and waiting at version 0 (zero)
  4. We're creating a pong clone called  -- oddly enough -- "pong"
Any text here in fixed width is intended to be cut and pasted directly into your Terminal if you desire.  However, I won't be held responsible if anything goes awry... I'm trusting that you're using your head and paying attention before you run any of these commands.

Create a Kobold2D Project

Run the Kobold2D Project Starter app.  Select the appropriate template (I'm going with Physics-Box2D) and set the project name to 'pong'.  You can also set your own Workspace name here if you want.  Make sure you uncheck "Auto-Open Workspace" because we don't want to have that open quite yet.  Click on the "Create Project from Template" button.

Import the Project into Subversion

In Terminal, set ~/Kobold2D/Kobold2D-2.1.0 as your current directory
cd ~/Kobold2D/Kobold2D-2.1.0
Make a new directory structure with the usual 'trunk', 'branches', 'tags' directory structure in it
mkdir -p pong-import/{trunk,branches,tags}
Move your new 'pong' project into the new trunk directory
mv pong pong-import/trunk/
Change directory into pong-import and import the project and directory structure into your repository
cd pong-import; svn import . https://svn.mydomain.com/pong/ -m "Initial import"
Now delete this directory structure
cd ..; rm -Rf pong-import
That's it for the Terminal.

Add The Repository to Xcode

This is the only step that's exactly as it would usually be.  Go to the Xcode Organizer (menu Window -> Organizer) and select the Repositories tab.   Click on the + in the bottom left corner of the window and select Add Repository.   Follow the prompts to name the repository, give it the URI to the repository, add your authentication credentials, etc..  For the purposes of the example, let's say the URI for your repository is "https://svn.mydomain.com/pong/".

Check Out a Working Copy

While still in the Xcode Organizer Repositories tab, click on the expander arrow to the left of your 'pong' repository.  It should show four folders:  'Root' in purple, and your 'Trunk', 'Branches' and 'Tags' directories in yellow.  Select 'Root' and then click on "Checkout" in the button bar across the bottom of the Organizer.

This will open a standard Save dialogue.  Browse your way to ~/Kobold2D/Kobold2D-2.1.0/, type 'pong' into the Save As field, and click on Checkout.

Clean Up Your Workspace

Return to your Kobold2D-2.1.0 folder in the Finder.  Open the "Kobold2D.xcworkspace" workspace, or your custom workspace if you created one.

You'll see your pong project listed, but it'll be in red.  That's because the files aren't where the automatically-created workspace expects to find them.   Right click on that and select Delete.

Then, right-click again and select Add Files to "Kobold2D" (or whatever the name of your workspace is).  Browse to ~/Kobold2D/Kobold2D-2.1.0/pong/trunk/pong, select 'pong.xcodeproj' and click on Add.

You're Done!

You should now have a functioning Kobold2D project with all of the usual Xcode internal Subversion support available.  You should be able to pick a random file from your 'pong' project files, right click it and go to Source Control -> Update Selected Files and cause Xcode to check if there are updates available for that file.  

Good luck, and good gaming.

Monday, February 15, 2010

Losing My Memories

Back at the beginning of January a horrible thing happened. It was something that a lot of people fear in this day and age, but which few really believe will happen to them. It happened to me though, and I had to find a way to recover from it. Yes, I lost all of my digital photographs.

The complete details of how it happened are not terribly germane to this post, but the short story is that while I was moving to a new computer, during the brief period when a lot of this data existed only on my backup disk, a Windows installer decided it would like to reformat that backup disk for me.

Recovering data from a reformatted drive can be tricky. Without the original filesystem information you need some special tools to even find old files, let alone reassemble them into something recognizable. But, with a bit of work I managed to get all of my images back, and this is the story of how I did that.

The whole recovery story started off with a stroke of luck. I happened to mention the demise of all of my photographs to a friend of mine, and he just happened to know of an incredibly useful tool for recovering my data. He pointed me toward TestDisk by Christophe Grenier. TestDisk is rather badly named I think, because testing is the least of what it can do. One of the key features that made my life far, far easier is its ability to do file type recognition when recovering files.

When the filesystem information from a disk is lost, even if you're able to recover files, you can't always recover the file names; often that information is lost forever. That means that recovered files will typically wind up with some sort of coded file name (usually just a number generated by the recovery program). If you're recovering a very large disk, you can wind up with literally millions of files with completely nondescript file names. It would be completely impractical to try and sort through an entire disk worth of files that way trying to find the pictures.

Fortunately, TestDisk's ability to recognize file types based on the data in the file, rather than the file name, meant that I could tell it to only recover the JPEG images from the disk. This way I wound up with a set of files where I definitely knew the type of each and every file. And it just so happens that all of the digital cameras I've owned work in JPEG.

I knew I was still going to have a problem, though. This was my backup disk, which contained not only my Aperture database but also all of my Time Machine data (MacOS's backup tool). That meant a search for all JPEG images would recover not just the pictures in my Aperture database, but also my entire web browser cache and every other little JPEG stored on my disk as part of various applications. When the recovery ran, I ended up with a bunch of folders holding a little under 35,000 pictures. Now what?

Well, the first thing I did was try to eliminate any duplicate images. Even though that would be a fairly simple script to write, I always google for these sorts of tools before I try to write them myself. Usually, someone else has already written and posted the thing I need, and often it's better than what I would have written on the first try. This was just such a case, and I found a great little perl script that would search for and remove all the duplicate images.
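The core idea is simple enough to sketch if you can't find such a script: hash every file's contents, keep the first file seen with each digest, and unlink the rest. Here's a minimal sketch of my own (not the script I actually used):

```perl
#!/usr/bin/perl
# Content-based duplicate removal: keep the first file seen with a
# given digest, delete the rest. A sketch of the idea only.
use strict;
use warnings;

use Digest::SHA qw(sha256_hex);
use File::Find;

sub dedupe {
    my( $dir ) = @_;
    my( %seen );

    my( $wanted ) = sub {
        return unless -f $_;
        open( my $fh, '<:raw', $_ ) or die "can't read $_: $!";
        my( $digest ) = sha256_hex( do { local $/; <$fh> } );
        close( $fh );

        if( exists $seen{$digest} ) {
            print "duplicate of $seen{$digest}: $File::Find::name\n";
            unlink( $_ ) or die "can't unlink $_: $!";
        } else {
            $seen{$digest} = $File::Find::name;
        }
    };

    find( $wanted, $dir );
}

dedupe( $ARGV[0] ) if @ARGV;
```

Hashing the contents rather than comparing names means two identical recovered files are caught even though the recovery tool gave them unrelated numeric names.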

That got me down to a little over 20,000 images. Still a lot, but far fewer than I had before.

The next step was to try and separate out the original files downloaded from my cameras from all of the other random images. For that, I did write my own script. I scanned through all of the images to extract the original image date/time from the Exif data, reorganizing the images into directories by the day the picture was taken. If an image had no original date in its Exif data, or no Exif data at all, then I assumed the file was not a photograph (or not one of my photographs) and put it off in a separate directory to be sorted through manually later.

Here's the script I used:

#!/usr/bin/perl

use strict;
use diagnostics;
use warnings;

use Date::Parse;
use File::Find;
use Image::ExifTool qw(:Public);
use POSIX qw(strftime);

my( $source_d      ) = '/Users/matt/Desktop/Recovery/jpg/';
my( $base_dest_d   ) = '/Users/matt/Desktop/Recovery/jpg-sorted/';

my( $dir_date_format  ) = '%F';
my( $file_date_format ) = '%Y%m%d-%H%M%S';

my( $nodate_i, $nodate_d ) = (0, 00);

if( ! -d $base_dest_d )           { mkdir $base_dest_d; }
if( ! -d $base_dest_d.'NoDate/' ) { mkdir $base_dest_d.'NoDate/'; }

sub wanted {
    my( $source_file ) = $File::Find::name;
    my( $source_date, $dest_d, $target_f );
    
    unless( -f $source_file ) { return; }
    unless( $source_file =~ /\.jpg$/ ) { return; }
    
    
    my( $info ) = ImageInfo($source_file);
    if( $info->{DateTimeOriginal} ) {
        $source_date = str2time($info->{DateTimeOriginal});

        $dest_d = $base_dest_d .
            strftime($dir_date_format, localtime($source_date));

        $target_f = strftime($file_date_format, localtime($source_date));

        # in addition to naming the file by date, give the image an index
        # number that advances if there is more than one image with the same 
        # date+time
        my( $target_i ) = 0;
        while( length($target_i)<2 ) { $target_i = '0'.$target_i; }
        while( -f $dest_d.'/'.$target_f.'-'.$target_i ) {
            $target_i++;
            while( length($target_i)<2 ) { $target_i = '0'.$target_i; }
        }

        $target_f = $target_f.'-'.$target_i;

    } else {
        # images with no date/time get put into subdirs, 100 images per
        # directory to keep the directory from getting too large
        $nodate_i++;
        if( $nodate_i > 100 ) { 
            $nodate_i = 1;
            $nodate_d++;
        }
        while( length($nodate_d)<3 ) { $nodate_d = '0'.$nodate_d; }
        $dest_d = $base_dest_d . 'NoDate/' . $nodate_d;

        $target_f = $_;

    }

    if( ! -d $dest_d ) {
        mkdir $dest_d or die "failed to create dest dir $dest_d: $!";
    }

    my( $final_file ) = sprintf( "%s/%s", $dest_d, $target_f );
    printf "%s: %s: %s\n",
        $_, $info->{DateTimeOriginal} || 'NoDate', $final_file;

    link( $_, $final_file ) or die "failed to link files $_:$final_file: $!";
}

find(\&wanted, $source_d );

This has left me with about 7,300 images sorted out into directories by the date the picture was taken, and about 13,600 in directories of images with no known shoot date.  This is far more manageable!  I'll probably still wind up doing a bunch of manual sorting of the images that are left, but now the task is much more approachable than it was in the beginning.   It's also possible I could find some other useful piece of Exif data to sort them by.

Sunday, February 22, 2009

Load Balancing DNS Using Cisco's IP SLA Feature

It's generally accepted that using any sort of stateful load-balancer in front of a set of DNS servers is a bad idea. There are several reasons for this, but my favourites are that:
  1. it means adding an unnecessary potential point of failure
  2. the state engines in load-balancers aren't scaled for DNS, and will be the first thing to fail under heavy load or a DoS attack
The issue with the state engine in a load-balancer is that it is scaled for average Internet traffic, which DNS is not. Each DNS connection (each flow) is typically one UDP packet each way, and well under 1KB of data. By contrast, with most other common protocols (smtp, http, etc.) each flow is hundreds to millions of packets, and probably tens, hundreds or even millions of KB of data. The key metric here is the flows:bandwidth ratio. Devices are built so that when they reach their maximum bandwidth capability, there's room in memory to track all of the flows. The problem is, they're typically scaled for average traffic. Since the flows:bandwidth ratio for DNS is so very much higher than other types of traffic, you can expect that a load-balancer in front of busy DNS servers will exhaust their memory in trying to track all the flows long before the maximum bandwidth of the device is reached. To put it another way, by putting a load-balancer scaled for 1Gb of traffic in front of DNS servers scaled for the same amount of traffic, you actually drastically reduce the amount of DNS traffic those servers can handle.
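The arithmetic behind that claim is easy to make concrete. The per-flow byte counts below are rough assumptions for illustration, not measurements:

```perl
#!/usr/bin/perl
# Rough arithmetic for the flows:bandwidth argument. Per-flow byte
# counts are illustrative assumptions only.
use strict;
use warnings;

my( $link_Bps )  = 1_000_000_000 / 8;  # a 1 Gb/s link, in bytes/sec
my( $dns_flow )  = 1_000;              # ~1 KB: one UDP query + response
my( $http_flow ) = 500_000;            # ~500 KB: a modest web fetch

# new flows per second needed to saturate the link with each traffic type
my( $dns_fps )  = $link_Bps / $dns_flow;
my( $http_fps ) = $link_Bps / $http_flow;

printf "DNS:  %d flows/sec at line rate\n", $dns_fps;
printf "HTTP: %d flows/sec at line rate\n", $http_fps;
printf "DNS creates %dx more flow-table entries per second\n",
    $dns_fps / $http_fps;
```

With those assumptions, DNS at line rate creates a few hundred times more flow entries per second than web traffic, which is why the state table dies long before the interfaces do.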

There are better ways.

ISC, the maker of BIND, has an excellent technote which describes using OSPF Equal Cost Multi-Path (ECMP) routing to distribute load between a set of DNS servers. In effect, it's a scheme for doing anycast on a LAN scale, rather than WAN. Put simply, it involves using Quagga or some other software routing daemon on each DNS server to announce a route to the DNS service address. A wrapper script around the DNS process adds a route just before the process starts, and removes it just after the process exits. The approach works quite well as long as the local router can handle OSPF ECMP, and as long as it uses a route hashing algorithm to maintain a consistent route choice for each source address without needing a state table. For example, the Cisco Express Forwarding (CEF) algorithm uses a hash of source address, destination address, and number of available routes to produce a route selection.
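The route-hashing idea is easy to illustrate: if the chosen path is a pure function of (source, destination, number of paths), every packet from a given client deterministically reaches the same server, with no per-flow state at all. A toy sketch of the shape of the idea (this is not Cisco's actual CEF hash):

```perl
#!/usr/bin/perl
# Toy stateless path selection: hash (src, dst) and reduce modulo
# the number of equal-cost routes. Illustrative only -- not CEF.
use strict;
use warnings;

use Digest::SHA qw(sha256);

sub pick_path {
    my( $src, $dst, $n_paths ) = @_;
    # take the first 32 bits of the digest as an unsigned integer
    my( $h ) = unpack( 'N', sha256( "$src|$dst" ) );
    return $h % $n_paths;
}

# a given client always hashes to the same server, so a resolver's
# retries stay on one box without any state table
for my $src ( qw( 198.51.100.7 203.0.113.9 192.0.2.200 ) ) {
    printf "%-13s -> server %d\n", $src, pick_path( $src, '192.0.2.253', 3 );
}
```

The important property is that the selection is repeatable per client yet spreads different clients across the equal-cost paths.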

The down sides to the ISC method are that it adds a small amount of complexity to the management of the DNS server itself (for example, you can no longer use your OS's standard start/stop mechanisms for the DNS software), and that a failure may occur which causes the DNS software to stop answering queries without exiting. If the latter occurs, the route to that server will not be removed. This is pretty safe with BIND, as it's designed to exit on any critical error, however that's not necessarily the case with all DNS server applications.

There's another method available (that I'm going to describe here) which, while being very similar to the ISC methodology, does not have these particular flaws. I should point out here that the method I'm about to describe is not my invention. It was pieced together from the ISC technote and some suggestions that came from Tony Kapella while chatting about this stuff in the hallway at a NANOG meeting a while back. After confirming how easy it is to get this method to work I've been singing its praises to anyone who will listen.

At a high level it's quite similar to the OSPF method. The DNS service address is bound to a clone of the loopback interface on each server, and ECMP routing is used, but rather than populating the routes with OSPF and running routing protocols on the DNS servers, route management is done with static routes on the local router linked to service checks which verify the functionality of the DNS service.

Setting It All Up

In this example, we'll use the RFC 3330 TEST-NET. The service address for the DNS service will be 192.0.2.253. This is the address that would be associated with a name server in a delegation for authoritative DNS service, or would be listed as the local recursive DNS server in a DHCP configuration or desktop network config. The network between the local router and the DNS servers will be numbered out of 192.0.2.0/28 (192.0.2.0 through 192.0.2.15). The server-facing side of the router will be 192.0.2.1, and that will be the default route for each of the DNS servers, which will be 192.0.2.10, 192.0.2.11 and 192.0.2.12. This network carries the administrative interfaces for the DNS servers.

Once the servers are reachable via their administrative addresses, make a clone of the loopback interface on all three servers. Configure the second loopback interface with the DNS service address.

On FreeBSD, the rc.conf entries for the network should look something like this:
defaultrouter="192.0.2.1"
cloned_interfaces="lo1"
ifconfig_em0="192.0.2.10 netmask 255.255.255.240"
ifconfig_lo1="192.0.2.253 netmask 255.255.255.255"
It's a little more difficult to represent the configuration under Linux since it's spread across several config files, but the above should give you a pretty good idea of where to start.
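On a Linux box the same addressing can be applied at runtime with iproute2; a rough sketch (the interface name eth0 is an assumption, and you'd make this persistent via your distribution's own network configuration):

```shell
# Administrative address and default route (equivalent of em0 above)
ip addr add 192.0.2.10/28 dev eth0
ip route add default via 192.0.2.1

# Service address as a /32 directly on the loopback; Linux doesn't
# need a cloned interface for this.
ip addr add 192.0.2.253/32 dev lo
```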

Once the network setup is finished, configure your DNS server software to listen to both the administrative address and the service address. So, on the first DNS server, it should listen to 192.0.2.10 and 192.0.2.253.
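With BIND, for example, the listening addresses on the first server could be set in named.conf along these lines (a sketch; merge with your existing options block):

```
options {
    // Answer on the administrative address and the shared service address
    listen-on { 192.0.2.10; 192.0.2.253; };
};
```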

That's all that needs to be done on the servers. Note that doing this was far simpler than configuring the servers to run OSPF and automatically add and remove routes as the DNS service is started or stopped.

The last few steps need to be taken on the local router. The first of these is to configure the router to check up on the DNS service on each of the three servers and make sure it's running; this is where Cisco's IP SLA feature comes into play. Configure three service monitors, and then set up three "tracks" which will provide the link to the service monitors.
ip sla monitor 1
type dns target-addr www.example.ca name-server 192.0.2.10
timeout 500
frequency 1
ip sla monitor schedule 1 life forever start-time now
!
ip sla monitor 2
type dns target-addr www.example.ca name-server 192.0.2.11
timeout 500
frequency 1
ip sla monitor schedule 2 life forever start-time now
!
ip sla monitor 3
type dns target-addr www.example.ca name-server 192.0.2.12
timeout 500
frequency 1
ip sla monitor schedule 3 life forever start-time now
!
track 1 rtr 1
track 2 rtr 2
track 3 rtr 3
This sets up three IP SLA monitors which repeatedly query the administrative address on each server for the A record www.example.ca. The DNS server must respond with an A record for the QNAME you use; if it is unable to respond, or responds with a different record type, the monitor fails. In the example above the monitor attempts the lookup every second (frequency) and fails if it doesn't receive a valid A record within 500ms (timeout). You may need to experiment with the timeout value, depending on how responsive your DNS servers are. If you find individual servers appearing to go out of service while the daemon is still operating normally, the timeout value may be set too low.
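To get a feel for what each monitor is doing, here's a rough stdlib-Python equivalent of the check (a sketch, not Cisco's implementation; for brevity it only verifies that the reply carries at least one answer record rather than strictly validating the record type):

```python
import socket
import struct

def build_query(qname, qid=0x1234):
    """Build a minimal DNS query packet for an A record (RD bit set)."""
    header = struct.pack(">HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    question = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.split(".")
    ) + b"\x00"
    question += struct.pack(">HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question

def probe(server, qname="www.example.ca", timeout=0.5):
    """Pass/fail check in the spirit of the IP SLA monitor above:
    send one A query, fail unless an answered reply arrives in time."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        sock.sendto(build_query(qname), (server, 53))
        data, _ = sock.recvfrom(512)
        ancount = struct.unpack(">H", data[6:8])[0]  # answer record count
        return ancount >= 1
    except OSError:  # timeout, ICMP unreachable, etc.
        return False
    finally:
        sock.close()
```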

With the monitors in place, turn on CEF and then configure three static routes to the service address via each server's administrative address. The routes are linked to the service monitors using the track argument:
ip cef
!
ip route 192.0.2.253 255.255.255.255 192.0.2.10 track 1
ip route 192.0.2.253 255.255.255.255 192.0.2.11 track 2
ip route 192.0.2.253 255.255.255.255 192.0.2.12 track 3
And that should be it. DNS queries arriving at the external interface of the router bound for 192.0.2.253 should now be routed to one of the DNS servers behind it, with a fairly equal load distribution. Since the router is using a hashing algorithm to select routes the load distribution can't be perfect, but in practice I've found that it's remarkably even. The only likely reason to see an imbalance is if your DNS servers receive an unusually high percentage of their queries from just one or two source addresses.
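To see why the spread is per-flow rather than per-packet, here's an illustrative simulation (the hash function is a stand-in, not Cisco's actual CEF algorithm):

```python
import hashlib
import random

NEXT_HOPS = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]

def pick_next_hop(src, dst="192.0.2.253"):
    # Stand-in for CEF per-flow hashing: a given (src, dst) pair always
    # maps to the same next-hop, so a single flow never flaps between servers.
    digest = hashlib.md5(f"{src}-{dst}".encode()).digest()
    return NEXT_HOPS[digest[0] % len(NEXT_HOPS)]

# Many distinct sources spread close to evenly across the three servers.
random.seed(1)
counts = {hop: 0 for hop in NEXT_HOPS}
for _ in range(30000):
    src = ".".join(str(random.randrange(256)) for _ in range(4))
    counts[pick_next_hop(src)] += 1
```

A single busy resolver behind one source address, by contrast, lands all of its queries on one server, which is exactly the imbalance case mentioned above.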

It's important to point out that most of the cautions in the ISC technote, particularly in reference to zone transfers and TCP DNS, apply equally here. I highly recommend reviewing the ISC document before implementing this in production.

Of course, there is still one big downside to this particular method of load balancing: it's dependent on a single vendor. I have not yet found a way to reproduce this configuration using non-Cisco routers. If anyone is aware of a similar feature from another major routing vendor, please let me know and I'll add instructions for those routers here.