Showing posts with label internet.

Wednesday, March 13, 2019

Using LTE for Out of Band

It's generally good practice to make sure that any network you're responsible for maintaining can be reached in the event of a failure of either the main Internet (transit) connection, or failure (or misconfiguration) of the routing equipment. Sometimes it's not feasible to have a second transit connection or redundant networking hardware, and so you need to get creative.

One of my clients is a not-for-profit with constrained financial resources. We wanted to have a way in to the network in the event of a failure of the main router, or in case someone (likely me) fat-fingers something and breaks the router config. And, while having a second transit connection would be nice, it's just not something we can fit in the budget at the moment.

So, we had to get creative.

Before I came on board, they had purchased a consumer-grade LTE modem with the intention of using that as the backup access into the network, but hadn't actually set it up yet. This blog post covers the steps I took to get it working.

Overview

The data centre in question is in the US, so we're using a simple T-Mobile pay-as-you-go data service. This service is designed for outgoing connections, and doesn't provide a publicly-reachable IP address that I could ssh to from outside the LTE network, so I need to set up some sort of tunnel to give me an endpoint on the Internet I can connect to that leads inside the client's network. ssh itself is the obvious choice for setting up that tunnel.

I've set the tunnel up to provide access to one of the client's administrative hosts, which has serial access to about half the network equipment (including the main router). From that vantage point I should be able to fix most configuration issues that would prevent me from accessing the network through the normal transit connection, and can troubleshoot upstream transit problems as if I were standing there in the data centre.

The modem can be put into bridge mode, but can still have an IP address to manage its configuration. The LTE network wants to use DHCP to give our server an address. So, we'll have the slightly unusual configuration of having both a static and DHCP address on the server interface that the modem is connected to. The server has other duties though, so we'll have to make sure that things like the default route and DNS configuration aren't overwritten; that requires some extra changes to the DHCP client config.

And finally, for the tunnel to work we need a host somewhere out on the Internet that we can still reach when the 'home' network goes down. In the rest of this post I'm going to refer to our local administrative host as HOST_A and the remote host we're using for a tunnel endpoint as HOST_B. We'll need some static routes on HOST_A that send all traffic for HOST_B through the LTE network, and then we can construct the ssh tunnel which we'll use to proxy back into HOST_A.

Setting up the Modem

The modem we're using is a Netgear LB2120 LTE Modem with an external antenna, to get around any potential interference from the cabinet itself, or the computer equipment and wiring inside. We have pretty good reception (4-5 bars) from just placing the antenna on top of the cabinet.

The modem's LAN port is connected directly to an ethernet port on HOST_A. We could also have run that connection through a VLAN on our switches, but since the modem and the server are in the same cabinet that would only increase the possible ways this could fail, while providing no benefit. The main point here is that the modem is going to provide its own network, so it's best not to put it on the same physical network (or VLAN) as other traffic.

This modem is designed to be able to take over in the event of the failure of a terrestrial network, which is what the WAN port is used for. But we don't want to use that here, so that port is left empty.

Connect to the modem's web interface (for this model, the default IP address and password are printed on the back).

In the Settings:Mobile tab, take a look at the APN details. These probably default to IPv4 only, so if you want to try to get IPv6 working (more on that later) you'll have to update the PDP and PDP Roaming configuration here. In the Advanced tab, you want to put the modem into Bridge mode (which will also disable its DHCP server), and you may want to give it a different static address. The modem's default network overlaps with private address space we already use, so I'm going to use 172.16.0.0/30 as an example point-to-point network to communicate with the modem. For that, you'd set the modem's IP address to 172.16.0.1 and its netmask to 255.255.255.252. Once you submit the configuration changes, the modem should restart.

Setting up the Server

The server needs to have a static IP address on the point-to-point network for configuring the modem as well as a DHCP address assigned by the LTE network. Because we may want to bring these up and down separately, I suggest putting the DHCP address on a virtual interface. You also need to configure a static route on the DHCP-assigned interface that points to HOST_B, so that any outbound traffic from HOST_A to HOST_B goes across the LTE network instead of using your normal Internet links. On a Debian host, /etc/network/interfaces.d/LTE.conf might look something like this:

# Static address on the point-to-point network, used to manage the modem
auto eth3
iface eth3 inet static
 address 172.16.0.2/30

# DHCP address from the LTE network, on a virtual interface, plus a
# host route forcing traffic for HOST_B (192.0.2.1 here) over the LTE link
auto eth3:0
iface eth3:0 inet dhcp
 post-up ip route add 192.0.2.1/32 dev eth3:0
 post-down ip route del 192.0.2.1/32 dev eth3:0
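Once that file is in place, you can bring the interfaces up and confirm that both addresses and the host route took effect. (These commands assume the example interface and address names used above, and obviously need the modem and LTE service actually connected.)

```shell
# Bring up the static and DHCP interfaces defined in LTE.conf
ifup eth3 && ifup eth3:0

# The interface should now carry both the static management address
# and the lease assigned by the LTE network
ip addr show eth3

# Confirm the modem answers on the point-to-point network...
ping -c 3 172.16.0.1

# ...and that traffic for HOST_B will take the LTE link, not the default route
ip route get 192.0.2.1
```

If `ip route get` shows HOST_B going out your normal transit interface instead, the post-up route didn't apply and the tunnel will not use the LTE network.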

You'll also need to modify /etc/dhcp/dhclient.conf to disable some of the changes that it normally makes to the system. The default request sent by the Debian dhclient includes the following options:

request subnet-mask, broadcast-address, time-offset, routers,
   domain-name, domain-name-servers, domain-search, host-name,
   dhcp6.name-servers, dhcp6.domain-search, dhcp6.fqdn, dhcp6.sntp-servers,
   netbios-name-servers, netbios-scope, interface-mtu,
   rfc3442-classless-static-routes, ntp-servers;

I've modified ours to remove the routers, domain-name, domain-name-servers, domain-search, dhcp6.name-servers, dhcp6.domain-search, dhcp6.fqdn, and dhcp6.sntp-servers options. You also need to block changes to /etc/resolv.conf. Even though you've told dhclient not to request those options, the server may still supply them and dhclient will happily apply them unless you explicitly tell it not to.

request subnet-mask, broadcast-address, time-offset, host-name,
    netbios-name-servers, netbios-scope, interface-mtu,
    rfc3442-classless-static-routes;

supersede domain-name "example.net";
supersede domain-name-servers 198.51.100.1, 198.51.100.2;
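Belt and braces: even with the trimmed request list and the supersede lines, you can stop dhclient from touching /etc/resolv.conf entirely by overriding its make_resolv_conf shell function in an enter hook. A minimal hook file (the filename is arbitrary) might look like this:

```shell
# /etc/dhcp/dhclient-enter-hooks.d/no-resolvconf-update
# dhclient-script calls make_resolv_conf() to rewrite /etc/resolv.conf;
# redefining it as a no-op here means DNS settings offered by the LTE
# network are never applied, whatever the server sends.
make_resolv_conf() {
    :
}
```

Hook files in that directory are sourced by dhclient-script on Debian, so a function defined here shadows the built-in one.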

Setting up the Tunnel

For this, you want to create an unprivileged user that doesn't have access to anything sensitive. For the purposes of this post I'll call the user 'workhorse'. Set up the workhorse user on both hosts; generate an SSH key without a passphrase for that user on HOST_A, and put the public half in the workhorse user's authorized_keys file on HOST_B.
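The key setup might look like the following sketch. The key type and file paths are just one reasonable choice, and ssh-copy-id assumes the workhorse user can still authenticate to HOST_B some other way (e.g. by password) at setup time.

```shell
# On HOST_A, as the workhorse user: generate a key with no passphrase,
# since the tunnel has to come up unattended at boot
ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_ed25519

# Install the public half in workhorse's authorized_keys on HOST_B
ssh-copy-id -i ~/.ssh/id_ed25519.pub workhorse@HOST_B

# Verify that key-based login now works without any prompting;
# BatchMode makes ssh fail rather than ask for a password
ssh -o BatchMode=yes workhorse@HOST_B true && echo "key auth OK"
```

The BatchMode check matters: if ssh can still fall back to an interactive prompt, autossh will hang at boot instead of establishing the tunnel.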

We're going to use SSH to set up the tunnel, but we need something to maintain the tunnel in the event it drops for some reason. There is a handy programme called autossh which does the job well. In addition to setting up the tunnel we need for access to HOST_A, it will also set up an additional tunnel that it uses to echo data back and forth between HOST_A and HOST_B to monitor its own connectivity, and restart the tunnel if necessary. We can combine that monitor with SSH's own ServerAliveInterval and ServerAliveCountMax settings to be pretty sure that the tunnel will be up unless there's a serious problem with the LTE network or modem.

I've chosen to run autossh from cron on every reboot, so I created an /etc/cron.d/ssh-tunnel file on HOST_A that looks like this:

@reboot workhorse autossh -f -M 20000 -qN4 -o "ServerAliveInterval 60" -o "ServerAliveCountMax 3" -R '*:20022:localhost:22' HOST_B

The -f option backgrounds autossh. -M 20000 sets up a listening port at HOST_B:20000 which sends data back to HOST_A:20001, and autossh uses that loop to monitor the connection; you can explicitly specify the HOST_A port as well, if you prefer. The remaining options are standard ssh options which autossh passes on. Note that in my case HOST_B has an IPv6 address, but I haven't configured the tunnel interface for IPv6, so I'm forcing ssh to use IPv4.

You may need to modify the sshd_config on HOST_B to set GatewayPorts yes, depending on the default configuration. Otherwise you won't get a remotely accessible port on HOST_B.

Instead of using cron, you could also use something like supervisord or systemd to start (and re-start if necessary) the autossh process.
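For the systemd route, a minimal unit could look like the sketch below (the unit name and autossh path are illustrative). Note the differences from the cron version: no -f, because systemd wants the process in the foreground, and AUTOSSH_GATETIME=0 so autossh doesn't treat an early connection failure as fatal and leave the restarting to systemd.

```
# /etc/systemd/system/ssh-tunnel.service (illustrative name)
[Unit]
Description=Persistent reverse SSH tunnel over LTE
After=network-online.target
Wants=network-online.target

[Service]
User=workhorse
# Let systemd, rather than autossh, decide when to give up and retry
Environment=AUTOSSH_GATETIME=0
ExecStart=/usr/bin/autossh -M 20000 -qN4 \
    -o "ServerAliveInterval 60" -o "ServerAliveCountMax 3" \
    -R '*:20022:localhost:22' HOST_B
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

Enable it with `systemctl enable --now ssh-tunnel.service`, and you get restart-on-failure and journal logging of the tunnel's state for free.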

Using the Setup

Once this is all put together, you should be able to ssh to port 20022 on HOST_B, and wind up with a shell on HOST_A.

matt@chani.conundrum.com:~
16:05:58 (3130) % ssh -p 20022 HOST_B
The authenticity of host '[HOST_B]:20022 ([192.0.2.1]:20022)' can't be established.
ECDSA key fingerprint is SHA256:4v+NbLg2QYqe43WFR9QKXaVwCpcc71u5jJmxJdZVITQ.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '[HOST_B]:20022,[192.0.2.1]:20022' (ECDSA) to the list of known hosts.
Linux HOST_A 4.9.0-6-amd64 x86_64 GNU/Linux

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Wed Mar 13 18:02:09 2019 from 216.235.10.40

matt@HOST_A:~
20:05:59 (618) %

Why no IPv6?

T-Mobile support IPv6 on their LTE networks, so I have the APN for our modem set to IPV4V6 PDP. The server configuration has been a problem, however.

As with IPv4, we don't want to get a default route for our LTE network because that would interfere with the normal traffic of the server. It seems like disabling the acceptance of Router Advertisement (RA) messages should be all that's necessary, but for some reason that entirely disables SLAAC address assignment.

% cat /etc/network/interfaces.d/LTE.conf

auto eth3
iface eth3 inet static
 address 172.16.0.2/30

auto eth3:0
iface eth3:0 inet dhcp
 post-up ip route add 192.0.2.1/32 dev eth3:0
 post-down ip route del 192.0.2.1/32 dev eth3:0

iface eth3:0 inet6 auto
 pre-up /sbin/sysctl -w net.ipv6.conf.eth3.accept_ra=0
 post-up ip route add 2001:db8::1/128 dev eth3:0
 post-down ip route del 2001:db8::1/128 dev eth3:0

I have also tried using DHCPv6 (iface eth3:0 inet6 dhcp, above), but that likewise fails to get the configuration I want, and it causes ifup to return a failure when configuring the interface. At least the SLAAC problem above has the feature of failing silently, so I can leave the configuration in place without causing problems with interface management.

Perhaps you can find the right combination of options to make it work!  I invite you to follow up, if you do.

Good luck!

Monday, July 10, 2017

The .io Error: A Problem With Bad Optics, But Little Substance

EDIT:  There are several comments on this post (and sent to me privately) which correctly note that I've overlooked an important variation in the behaviour of recursive servers, that would affect the ability of this hijack to succeed.  I'm leaving the post up as-is because I think it demonstrates just how complicated the DNS is, and just how easy it is for anyone (even someone who knows it inside and out) to miss something important.

Original article below the jump...


Friday, October 11, 2013

Society's Bullies Hide Behind Secrecy

This week I had the privilege of being present at a discussion with Ladar Levison at a meeting of the North American Network Operators' Group (NANOG), his first public appearance since the court documents related to his fight with the FBI were made public.

For those not familiar with the case, Levison is the owner of Lavabit, a web-based email service designed to be secure against eavesdropping, even by himself. On August 8th this year he suddenly closed the service, posting an oblique message on the front page of the Lavabit website. The message explained only that he had closed the service because he had been left with a choice to "become complicit in crimes against the American people or walk away from nearly ten years of hard work by shutting down Lavabit."

There has been much speculation over the last couple of months that he had closed the service over a subpoena related to Edward Snowden's use of the service, and that an attached gag order similar to a National Security Letter (which were found to be unconstitutional in 2004) prevented him from speaking out about it.

Much of that speculation was confirmed last week when the courts unsealed the documents relating to Levison's appeal of a July 16th court order, which required him to turn over cryptographic keys that would allow the FBI to spy on all of the service's traffic, not just the information specific to Snowden's use of the service, which was specified in the original warrant. Wired Magazine published an article last week with most of the known details of the case, so I won't go into much more detail about that.

What I'd like to highlight is the danger to information security, consumer confidence, and the technological economy as a whole, should Levison lose his fight with the FBI.  The keys being requested by the FBI would allow them access not only to all of the information related to the individual targeted by their warrant, but also every other customer's data, and the data of the business itself.   This is highly reminiscent of recent revelations regarding the NSA and the scope of their data collection.  If that sort of wide net is supported by the courts, the fight for any kind of personal privacy will be lost, and consumers will never be able to trust any company with ties to the United States with any data at all.

This isn't just a problem in the United States.  Many of our online services eventually have dependencies on US companies.  In Canada, a huge percentage of our network traffic crosses into the US in order to cross the continent rather than remaining in Canada when moving between cities.  In other countries consumers rely on some of the more obvious US-based services (Facebook, Twitter, Google) but also many other services have less obvious dependencies, such as with services hosted by US-based data centres or on so-called "cloud" services with ties to the US.

As Andrew Sullivan comments during the Q&A, overreaching orders such as these are an attack on the network, as surely as any criminal trying to break a network's security.  Our personal privacy and the security technologies that guarantee it face attacks by those who want easy access to everyone's information, under the pretence of protecting the very people whose privacy is being violated.  It is vitally important that other business owners, like Levison, step up and fight orders such as these, so that a real public debate can happen over whether we still feel that personal privacy and personal freedoms trump the government's desire to have it easy.

At this point it is impossible to know whether any similar services have been compromised in the way the FBI has attempted with Lavabit.  I applaud the principled stance Levison is taking against this intrusion, and hope that, should I ever be in a similar position, I would have the strength to endure the long fight necessary to see it through.


Tuesday, February 9, 2010

So Easy, an Adult Can Understand!

The Internet, explained to your mom.

That's how I'd describe this video from EuroIX, an association of European Internet Exchange Points (IXPs). Don't worry if you don't know what an IXP is; the video explains.

I was all set to blog about this months ago, but then they took the video down in order to update and clarify a few things. It's back up now, and so you get to enjoy its informational goodness.

Tuesday, September 29, 2009

Naïveté and How It Can Break Your Internet

As of the end of last week, I was all set to put together a new post tonight about the latest load of BS from Nominum, but now that will have to wait a few days. There's something much more pressing (and probably more relevant to the five or six people reading this on the day that I post it) that I think I should talk about. This morning, the new issue of a weekly podcast that I listen to was posted.

Jesse Brown is the host of Search Engine, a weekly technology news podcast that I have been listening to for about six months, every week, no exceptions (well... actually... excepting weeks when there's no show). Jesse – I hope he doesn't mind if I call him Jesse – seems to me to concentrate on the social aspects and effects of technology (and he may have even stated this himself, on his show), and by and large I think he does it well. This week's issue, titled Are You Gay? (The Internet Wants To Know), was posted this morning, and the second interview (starting at about seven and a half minutes in) was about two subjects I know very well: the DNS, and CIRA. Unfortunately, what I heard on Jesse's show this morning was both shocking and disappointing. And I told him so. In retrospect I was perhaps overly harsh in some of my criticism, but only insofar as twitter is a terrible medium for conveying nuance or detail.

My problem with the interview is that both interviewer and interviewee were wholly unprepared. I'll get to the interviewee in a moment, because what he had to say is my main source of outrage. For the next paragraph or two I'll quickly sum up what I thought was wrong with the way Jesse approached the whole thing, and then get on with the meat of the matter.

The interviewee in this case clearly came to the table with an agenda, and it's this agenda that the whole interview was really about: the desire to make sweeping changes to the status quo. Jesse not only did not challenge the position that change is needed, he admits that he arrived at the interview without knowing anything about the subject at hand. He did no background research and didn't even look for dissenting opinions. This approach is full of fail. If the interviewer doesn't challenge the agenda, and isn't informed enough for critical analysis of the interviewee's answers, then it becomes the interviewer's equivalent of publishing a press release as a news item. Of course the person with the agenda is going to present the facts in such a way as to support their argument, and not a balanced view. Worse in this case, the "facts" weren't even really facts for the most part. One would like to believe that a director of an organization would be able to coherently discuss what that organization does, and why it does it that way, but one might be wrong. This whole thing will cause me to reassess Search Engine as a source of information. When covering those subjects that I really know nothing about, will I be able to trust that the expert on the air is really an expert? Certainly not as I have trusted in the past.

I first became aware that Jesse was working on something like this interview last Saturday when he asked a question on twitter that was right up my alley (I missed an earlier, more direct reference). I responded there and in email, offering to help fill in the blanks. I also happen to know that several other people made similar offers. Jesse didn't take me up on my offer, or anyone else's that I'm aware of, and unfortunately it's clear now the reason is that the interview had already been completed at that point, and was simply waiting to be aired. That is far too late to look for supporting information.

So what was this whole thing about?

Barry Shell would like you to elect him to the board of directors for the Canadian Internet Registration Authority (CIRA), the organization that manages the .ca Internet domain. This in itself is not news. What's newsworthy is why he would like you to elect him. He claims that CIRA is too big and expensive and that the organization should be run more like free online services like craigslist.org, or perhaps like the .ca domain was run back in the late 90's. The problem is that Barry's views are hopelessly naïve, are based on a simplistic understanding of what CIRA actually does and, if implemented, would not only threaten the stability of your Internet (well... the two or three Canadians reading this) but would also threaten the stability of your economy, and possibly even your life.

A bold claim, I know. I plan to back it up. But first, some background on me and where I'm coming from so that you can judge my agenda.

As anyone who has read the sidebar will know, until fairly recently I was the DNS Operations Manager for CIRA. I no longer work there, and I am not a part of, nor a candidate for the board of directors being elected right now. I am still a DNS specialist, and I work for a different (some would say competing) domain registry. So my only association with CIRA at this point is that I am a .ca domain-holder concerned about how the organization is run because it directly impacts the Internet that I use every day. Really, the only difference between me and most of the people likely reading this is that I happen to possess some very detailed information about how CIRA is run currently, about the environment in which it operates, and about the possible side effects of changing either of those things. My agenda is to try to share as much of that information as I can, and hopefully to convince you that the way in which CIRA is run is far more important than Barry Shell would have you believe.

Before I get into the finances, which believe me will be brief, I think it's important to correct the collection of misinformation, bad assumptions, and vague statements that permeated the interview, and which would lead the uninformed to incorrect assumptions.

To begin with, CIRA is not "a server." CIRA isn't even an organization that just runs "a server." CIRA is what is known as a domain name registry, but what is that? To explain, I'll step back a bit from CIRA and start with two other related groups.

First, there's the people who register .ca domains: the registrants. These registrants go to a web hosting company, or their ISP, or some other company to pay to register a new domain. This company, known as a registrar, will charge anywhere from $10 to around $50 to take care of the process, possibly also setting up some email or a web site or some other service to go along with the newly registered domain. In the background, this registrar submits the newly registered domain to CIRA, and pays them $8.50. What does CIRA do for that $8.50? Well, there are two core services that CIRA provides.

As a domain registry, CIRA is responsible for ensuring uniqueness. Just like the land registry, CIRA ensures that no two people or organizations think they've registered the right to use the same space; the difference is that CIRA deals in domains rather than plots of land. The second core service that CIRA provides is to include that registration in a global directory known as the Domain Name System, the DNS. Note that it's the Domain Name System, not Domain Name Server, as Barry says. The key here is that the DNS is a large interconnected database made up of at least hundreds of thousands, more likely millions, of servers. The directory is structured like a tree, with the root branching out to the top level domains, or TLDs, like .ca, .com, .net, .org, .info, and others. The TLDs branch out to registrants' domains like cbc.ca or craigslist.org, and so on.

What this directory system does is convert those memorable host names you type into a web browser (like www.tvo.ca) or into your mail client as part of an email address, into numeric addresses and other information that the computers of the Internet actually use to talk to each other. This is no simple task, but Ben Lucier has a great little layman's explanation of how part of it works.

CIRA's position in this directory is at the top of the .ca branch of this tree. It is responsible for making sure that any computer that looks up a .ca domain gets to the right place. Due to some shortcuts built into this system, the DNS servers at the top of the tree only see a tiny fraction of the total lookups that occur, but even that tiny fraction means that the servers responsible for the .ca domain answer about 13,500 of these lookups every second.

Every. Second.

Now, that's actually pretty easy work for a bunch of DNS servers, but the statistic starts to underline the importance of CIRA's position in making sure that all of those .ca domains continue to function. And that number doubles approximately every 18 months. When you take into account that most Internet businesses keep equipment in service for three to five years, that means that equipment CIRA is putting into service today to handle 13,500 DNS lookups every second must be able to handle over 100,000 per second by the time it is replaced. It's important that this DNS service that CIRA provides never be unavailable, or those lookups go unanswered.
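The capacity-planning arithmetic above is easy to check: doubling every 18 months over a three-to-five-year equipment lifetime is two to a bit more than three doublings of today's 13,500 queries per second.

```shell
# Project today's 13,500 qps forward, doubling every 18 months,
# over 3-year (36-month) and 5-year (60-month) equipment lifetimes
awk 'BEGIN {
    printf "3 years: %d qps\n", 13500 * 2^(36/18)
    printf "5 years: %d qps\n", 13500 * 2^(60/18)
}'
```

The five-year figure comes out well over 100,000 queries per second, which is where the "build for 100,000+" claim comes from.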

But what happens if CIRA's DNS servers are unable to answer those queries for a few seconds... or a few minutes, or hours, or days? Does it really matter that much?

Perhaps not, if you're just talking about someone's personal web site, as Barry seems to mostly be concerned with, or a blog, or the place you download a weekly podcast. And in 1998, before CIRA existed, when the .ca domain was run by a bunch of volunteers led by John Demco (not just the two or three people Barry says it was), perhaps it wouldn't have been important if these sites failed to work for some period of time. But of course, we don't live in 1998 anymore. And the model of running a domain registry with a handful of volunteers and some donated servers was replaced with CIRA precisely because the old way of doing things was no longer working.

Today these uses of the Internet are all important, and they're part of what has made the Internet such a fundamental part of our daily lives. But, one must also remember that because the Internet has become a fundamental part of our daily lives, it has also become a major engine in the world economy. According to a study commissioned by the Interactive Advertising Bureau, the Internet is responsible for $300 billion in economic activity in the US every year. I'm certainly no economist, but if one were to very simplistically scale that back to the size of the Canadian economy, that would mean the Internet injects $27 billion into our economy every year. That's not chump change. If the Internet is unreliable, what happens to that money?

But let's move beyond corporate uses of the 'net and the economy. What's this poppycock about a broken Internet threatening my life?

Well, the Internet is now a part of daily business. What most people forget, is that doing business online doesn't just mean shopping for gifts, or playing online games. People forget that most organizations now use the Internet for internal communication. Organizations like online stores and social media, sure, but also organizations like our governments, critical infrastructure like our water, power and gas distribution... and our emergency services.

You may hear claims from your ISP that they guarantee "five nines" of availability. It's a fairly common service guarantee on the Internet, and it means that they are up and running 99.999% of the time. Put another way, it means that they permit themselves about five minutes of down time per year. Domain registries like CIRA don't do "five nines". They can't afford five minutes of outage every year. The DNS at that level must be a 100% uptime proposition, or Bad Things happen.

When this is taken into account, a pretty high level of redundancy to ensure availability seems warranted. Not only does CIRA need to ensure that there's enough capacity to handle all of the DNS queries their servers receive, without the servers becoming overloaded, they must also ensure that servers can be taken offline for regular maintenance, and that unexpected failures like power loss, crashed computers, network failures at an ISP, or other breakages don't take down the whole system. And that doesn't even address the threat of deliberate vandalism.

You may have heard of a style of attack against Internet services known as "denial of service", or DoS. One form of this type of attack involves sending extremely large volumes of requests to a service in order to tie it up, and reduce the resources it has available for legitimate requests. It's becoming increasingly popular to direct these attacks at DNS services. Today, these attacks are carried out using what's known as a "botnet" which is tens of thousands to hundreds of thousands of computers on the Internet which have been taken over to be used for often illegal purposes. Remember how I mentioned that CIRA's servers have to be ready to handle 13,500 queries per second today, and over 100,000 in five years? Well, as it turns out, it's quite simple for a small botnet to dwarf those numbers.

If CIRA simply built to the expected normal load, and added a bit to handle broken servers, they would still be vulnerable to being taken out by a bored high school student. This is nothing compared to the resources available to organized crime, or other nation states. And if you think nations attacking each other over the Internet sounds like a bad spy flick then get out your popcorn, because it's already happening. In order to prepare for these potential attacks, some registries build out their DNS infrastructure to support well over 100 times the expected load.

Given all of this, the money Barry Shell seems to think that CIRA is wasting seems pretty well spent, to me. And I've really only scratched the surface of one part of CIRA's budget, which is available online as part of the annual report, by the way. The side of the registry which is responsible for actually taking registrations may not be quite as essential a service as the DNS is, but many Canadian businesses, the registrars I mentioned earlier, depend on that registry being available to take registrations or they start to lose money, so those systems need to be well built to a different level of tolerance to failure or attack. Then there's CIRA's customer service department, programmers to write the software, systems people like me to make it run, the back-office functions, required by any business, like finance and administration staff... it adds up pretty quickly. CIRA actually operates pretty modestly compared to most domain registries.

Now getting to those finances..

It's been suggested that CIRA should reduce the wholesale cost of registering a domain from $8.50 per year to something smaller. However, past price reductions have demonstrated that wholesale cuts don't get passed on to the general public as you might expect. $10 or $15 a year to register a domain isn't really an onerous sum for the average Internet user who wants their own domain, and a reduction of a few cents to a dollar in that cost really doesn't benefit the average Canadian all that much. The only people it does benefit are organizations that buy domains in very large numbers, like the domain registrars that sell to the general public, and another class of Internet user known as a "domainer". Domainers are people who own literally thousands to millions of domains each, and frequently use them to put up web pages that are nothing but advertising, hoping that when you mistype amazon.com and accidentally go to azamon.com you'll click on one of their ads and make them a few cents. CIRA takes its stewardship of the .ca domain – a national public resource – very seriously, and has no interest in supporting the interests of domainers over those of average Canadians who may want to register the domains that are currently just being used for ads.

It's true that most years CIRA operates with a budget surplus, in recent years as much as $2M. Where does that money go? Not-for-profit organizations are required by law to not have a profit. It's in the name. If a not-for-profit organization does find itself actually making profit, then the Canada Revenue Agency steps in for its cut. There are some pretty specific rules about when a not-for-profit is permitted to have a surplus, and what it can do with those surplus funds. What CIRA has done with its surpluses so far is to pay off a debt owed to UBC in exchange for all of the years UBC volunteers managed the service before CIRA existed, and to pay into a fund which is meant to support CIRA through any lean or financially disastrous years that may be yet to come. This is standard operating procedure for many companies, and is an especially important layer of insurance for an organization that operates a piece of critical national infrastructure.

Barry suggests that instead of these things, CIRA should be supporting research and other concerns of benefit to Canadians' use of the Internet, as if this were his own idea. In fact, CIRA staff have been lobbying the board to do just that for several years, and in the last year or two CIRA has already engaged in operational support and direct funding for several programmes, to the extent that it has been able to do so without stressing its rainy-day fund or regular budget.

For having been on CIRA's board for a year, Barry shows pretty intense ignorance of CIRA's business and the environment it operates in. It's one thing for a new candidate, without any prior experience on a board or in the domain sector of the Internet industry, to arrive fresh-faced with misconceptions about what CIRA does. It is essential for anyone who wishes to serve on a board of directors to inform themselves, to the best of their ability, about the business they're overseeing and the industry in which it operates. For someone who has been doing the job for a year to be as uninformed as Barry Shell requires almost willful ignorance. It's actually a shame that this interview aired when it did, because the election in which Barry is running ends tomorrow at noon, and I'd like everyone voting to listen to it: this week's Search Engine is the best argument there is for not voting for Barry Shell.

Tuesday, June 2, 2009

On Securing the DNS

I don't plan to make a habit of talking about things specific to my job here... in fact, I will almost never do that, simply because it's easier than always having to disclaim any relationship to the views of my employer, and so forth. Today, however, I can't help but toot our own horn.

A few hours ago, Public Interest Registry (PIR, the manager of the .ORG Internet domain name) announced that the .ORG zone has been secured with DNSSEC, the DNS Security Extensions. This makes .ORG the largest Top Level Domain signed to date, and the only open registry to implement DNSSEC (open in the sense that all of the other signed TLDs are run by registries with restricted registration policies: six national TLDs, and .GOV).

Sunday, February 22, 2009

Load Balancing DNS Using Cisco's IP SLA Feature

It's generally accepted that using any sort of stateful load-balancer in front of a set of DNS servers is a bad idea. There are several reasons for this, but my favourites are that:
  1. it means adding an unnecessary potential point of failure
  2. the state engines in load-balancers aren't scaled for DNS, and will be the first thing to fail under heavy load or a DoS attack
The issue with the state engine in a load-balancer is that it is scaled for average Internet traffic, which DNS is not. Each DNS connection (each flow) is typically one UDP packet each way, and well under 1KB of data. By contrast, with most other common protocols (smtp, http, etc.) each flow is hundreds to millions of packets, and probably tens, hundreds or even millions of KB of data. The key metric here is the flows:bandwidth ratio. Devices are built so that when they reach their maximum bandwidth capability, there's room in memory to track all of the flows, but they're typically scaled for average traffic. Since the flows:bandwidth ratio for DNS is so very much higher than for other types of traffic, you can expect that a load-balancer in front of busy DNS servers will exhaust its memory trying to track all the flows long before the maximum bandwidth of the device is reached. To put it another way, by putting a load-balancer scaled for 1Gb of traffic in front of DNS servers scaled for the same amount of traffic, you actually drastically reduce the amount of DNS traffic those servers can handle.
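To see just how lopsided that ratio gets, here's a quick back-of-envelope sketch. The per-flow byte counts (250 bytes for a DNS query plus response, roughly 500 KB for a modest HTTP flow) are illustrative assumptions, not measurements, but they're in the right ballpark:

```shell
# Flows per second needed to saturate a 1 Gb/s link, by protocol.
# Per-flow sizes are illustrative assumptions.
awk 'BEGIN {
  bps = 1e9 / 8                 # 1 Gb/s in bytes per second
  printf "DNS flows/s: %d\n",  bps / 250      # ~250 B per DNS transaction
  printf "HTTP flows/s: %d\n", bps / 500000   # ~500 KB per HTTP flow
}'
```

With these numbers the load-balancer would need to track on the order of two thousand times as many concurrent DNS flows as HTTP flows at the same line rate, which is exactly why its state table gives out first.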

There are better ways.

ISC, the maker of BIND, has an excellent technote which describes using OSPF Equal Cost Multi-Path (ECMP) routing to distribute load between a set of DNS servers. In effect, it's a scheme for doing anycast on a LAN scale, rather than WAN. Put simply, it involves using Quagga or some other software routing daemon on each DNS server to announce a route to the DNS service address. A wrapper script around the DNS process adds a route just before the process starts, and removes it just after the process exits. The approach works quite well as long as the local router can handle OSPF ECMP, and as long as it uses a route hashing algorithm to maintain a consistent route choice for each source address without needing a state table. For example, the Cisco Express Forwarding (CEF) algorithm uses a hash of source address, destination address, and number of available routes to produce a route selection.
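The wrapper script the technote describes might look something like this minimal sketch. The interface name, paths, and flags here are assumptions for illustration; the real setup pairs this with Quagga (or similar) configured to redistribute the route via OSPF:

```shell
#!/bin/sh
# Sketch of the ISC-technote wrapper: the service route exists only
# while the DNS daemon is actually running, so the routing daemon
# announces it only then.
SERVICE=192.0.2.253

ip route add ${SERVICE}/32 dev lo1    # route appears; OSPF announces it
named -f                              # run BIND in the foreground
ip route del ${SERVICE}/32 dev lo1    # named exited; withdraw the route
```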

The downsides to the ISC method are the small amount of complexity added to the management of the DNS server itself (for example, you can no longer use the standard application start/stop mechanisms of your OS for the DNS software) and the risk of a failure mode in which the DNS software stops answering queries but does not exit. If the latter occurs, the route to that server will not be removed. This is pretty safe with BIND, as it's designed to exit on any critical error; however, that's not necessarily the case with all DNS server applications.

There's another method available (that I'm going to describe here) which, while being very similar to the ISC methodology, does not have these particular flaws. I should point out here that the method I'm about to describe is not my invention. It was pieced together from the ISC technote and some suggestions that came from Tony Kapella while chatting about this stuff in the hallway at a NANOG meeting a while back. After confirming how easy it is to get this method to work I've been singing its praises to anyone who will listen.

At a high level it's quite similar to the OSPF method. The DNS service address is bound to a clone of the loopback interface on each server, and ECMP routing is used, but rather than populating the routes with OSPF and running routing protocols on the DNS servers, route management is done with static routes on the local router linked to service checks which verify the functionality of the DNS service.

Setting It All Up

In this example, we'll use the RFC 3330 TEST-NET. The service address for the DNS service will be 192.0.2.253. This is the address that would be associated with a name server in a delegation for authoritative DNS service, or would be listed as the local recursive DNS server in a DHCP configuration or desktop network config. The network between the local router and the DNS servers will be numbered out of 192.0.2.0/28 (or 192.0.2.0 through 192.0.2.15). The server-facing side of the router will be 192.0.2.1, and that will be the default route for each of the DNS servers, which will be 192.0.2.10, 192.0.2.11 and 192.0.2.12. This network will be the administrative interfaces for the DNS servers.

Once the servers are reachable via their administrative addresses, make a clone of the loopback interface on all three servers. Configure the second loopback interface with the DNS service address.

On FreeBSD, the rc.conf entries for the network should look something like this:
defaultrouter="192.0.2.1"
cloned_interfaces="lo1"
ifconfig_em0="192.0.2.10 netmask 255.255.255.240"
ifconfig_lo1="192.0.2.253 netmask 255.255.255.255"
It's a little more difficult to represent the configuration under Linux since it's spread across several config files, but the above should give you a pretty good idea of where to start.
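For reference, a hedged sketch of the equivalent setup using iproute2 commands on Linux (the `eth0` interface name is an assumption, and a dummy interface stands in for the cloned loopback):

```shell
# Administrative address and default route
ip addr add 192.0.2.10/28 dev eth0
ip route add default via 192.0.2.1

# Equivalent of the lo1 clone: a dummy interface holding the service address
ip link add lo1 type dummy
ip link set lo1 up
ip addr add 192.0.2.253/32 dev lo1
```

You'd then persist these in your distribution's network configuration files rather than running them by hand.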

Once the network setup is finished, configure your DNS server software to listen to both the administrative address and the service address. So, on the first DNS server, it should listen to 192.0.2.10 and 192.0.2.253.

That's all that needs to be done on the servers. Note that doing this was far simpler than configuring the servers to run OSPF and automatically add and remove routes as the DNS service is started or stopped.

The last few steps need to be taken on the local router. The first of these is to configure the router to check up on the DNS service on each of the three servers and make sure it's running; this is where Cisco's IP SLA feature comes into play. Configure three service monitors, and then set up three "tracks" which will provide the link to the service monitors.
ip sla monitor 1
type dns target-addr www.example.ca name-server 192.0.2.10
timeout 500
frequency 1
ip sla monitor schedule 1 life forever start-time now
!
ip sla monitor 2
type dns target-addr www.example.ca name-server 192.0.2.11
timeout 500
frequency 1
ip sla monitor schedule 2 life forever start-time now
!
ip sla monitor 3
type dns target-addr www.example.ca name-server 192.0.2.12
timeout 500
frequency 1
ip sla monitor schedule 3 life forever start-time now
!
track 1 rtr 1
track 2 rtr 2
track 3 rtr 3
This sets up three IP SLA monitors which repeatedly query the administrative address on each server for the A record www.example.ca. The DNS server must respond with an A record for the QNAME you use; if it is unable to respond, or responds with a different record type, the monitor fails. In the example above the monitor attempts the lookup every second (frequency) and fails if it doesn't receive a valid A record within 500ms (timeout). You may need to experiment with the timeout value, depending on how responsive your DNS servers are. If you find individual servers appear to be going out of service while the daemon is still operating fine, you might have the timeout value set too low.
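One way to sanity-check a timeout value before committing it on the router is to time the same lookup from a nearby host with dig (the addresses here are the example ones from above):

```shell
# Time the query each SLA monitor sends, against each server's
# administrative address. dig's reported "Query time" (in ms) gives a
# feel for whether a 500 ms router timeout leaves enough headroom.
for ns in 192.0.2.10 192.0.2.11 192.0.2.12; do
  dig @"$ns" www.example.ca A +tries=1 +time=1 | grep "Query time"
done
```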

With the monitors in place, turn on CEF and then configure three static routes to the service address via each server's administrative address. The routes are linked to the service monitors using the track argument:
ip cef
!
ip route 192.0.2.253 255.255.255.255 192.0.2.10 track 1
ip route 192.0.2.253 255.255.255.255 192.0.2.11 track 2
ip route 192.0.2.253 255.255.255.255 192.0.2.12 track 3
And that should be it. DNS queries arriving at the external interface of the router bound for 192.0.2.253 should now be routed to one of the DNS servers behind it, with a fairly equal load distribution. Since the router is using a hashing algorithm to select routes the distribution can't be perfect, but in practice I've found that it's remarkably even. The only likely reason to see an imbalance is if your DNS servers receive an unusually high percentage of their queries from just one or two source addresses.

It's important to point out that most of the cautions in the ISC technote, particularly in reference to zone transfers and TCP DNS, apply equally here. I highly recommend reviewing the ISC document before implementing this in production.

Of course, there is still one big downside to this particular method of load balancing: it's dependent on one particular vendor. I have not yet found a way to reproduce this configuration using non-Cisco routers. If anyone is aware of a similar feature available from other major routing vendors, please let me know and I'll integrate instructions for those routers here.

Friday, January 16, 2009

Slouching Toward The Cloud

In wandering the Ether this afternoon I rediscovered a friend's blog and his latest post, Cloud computing is a sea change. Mark Mayo has a lot to say about how cloud computing is going to change the career path of systems administrators in every corner of the industry, and he references Hosting Apocalypse by Trevor Orsztynowicz, another excellent article on the subject.

There's definitely a change coming. Cloud computing has the potential to put us all out of work, or at least severely hamper our employability, if we don't keep on top of the changes and keep our skills up to date... but that's been true of every shift in the industry since it came into being. Every time a new technology or shift in business practices comes along, individual sysadmins either adapt or restrict their pool of potential employers. The difference with cloud computing is that it promises a lot of change all at once, where in the past we've mostly dealt with at most two or three new things to think about in a year.

I think there are some potential drags on the takeover by cloud computing that will slow the changes Mark and Trevor warn of, however.

A few times now we've been warned that the desktop computer is doomed, and that it's all going to be replaced by thin clients like the Sun Ray from Sun Microsystems, or more recently mostly-web-based Application Service Providers like Google Apps. Despite years of availability of thin clients, and cheap, easy access to more recent offerings like Google's, this hasn't happened. Individual users, and even some small organizations may have embraced web apps as free alternatives to expensive packages like Microsoft Office, but I'm not aware of any significant corporations that have gone down this road. The main reason? I think it has to do with control of data. Most companies just don't want to hand all their data over to someone else. In many cases, simple reluctance can become a statutory hurdle, particularly when you're talking about customer data and there's a national border between the user and the data store. I think this same reasoning will come into play with even stronger force when companies start considering putting their entire data centre in the hands of another company. The difference in who has access to your data between co-location and cloud computing is significant.

Additionally, I think the geography problem will keep cloud computing out of some sectors entirely. As I noted in the comments to Mark's article, the current architecture of choice for cloud computing products is the monolithic data centre. Having only a handful of large data centres around the world will keep the cloud from consuming CDNs like Akamai and keep it out of other sectors entirely where wide topographic or geographic distribution is required, and a large number of small data centres are used, like root or TLD DNS infrastructures.

Mark correctly points out that the geography problem will be solved in some ways as cloud computing becomes more ubiquitous and the big players grow even larger, and in others as the cloud providers become the CDNs. But until a cloud provider can guarantee my data won't leave my local legal jurisdiction I'd never recommend the service to whoever my employer happens to be... and even once they can I'd still recommend the lawyers have a good hard look at the liability of handing over all the company's data to another party.

Mark's core point remains valid however: change is coming. Whether it's fast and sudden, or slow and gradual, sysadmins had better be prepared to learn to deal with the cloud computing APIs, and be comfortable working with virtual machines and networks, or they'll be left behind.

Tuesday, January 13, 2009

Not Your Best Interview, Mr. Tovar

I've been thinking for some time about starting to regularly post thoughts about random things somewhere visible — blogging, if you will. This afternoon I was thinking about it somewhat more earnestly, trying to choose between several topics for a first post, when a friend pointed out this interview. Seeing as the subject (the DNS) is what I spend most of my days doing, it seemed like an excellent place to start.

First let me say that I disagree with Tom Tovar's basic position on open source DNS software. As a general class of software, there is absolutely nothing wrong with using it for mission critical applications — not even for critical infrastructure. The various arguments about "security through obscurity" vs. open analysis and vetting have been made so many times that I won't bother with the details here. Suffice to say that, all other things being equal, I'd be far more inclined to trust the security of my DNS to open source software than closed, proprietary commercial software. Not that his position is that big a surprise. After all, he is the CEO of a company that sells some extremely expensive DNS software.

Mr. Tovar's early statements about DNS security, Kaminsky, and the current concerns of the DNS industry as a whole are hard to argue with. However, as one gets further into the interview, some serious problems with what Mr. Tovar has to say crop up, including one glaring error in a basic statement of fact. I'll approach those in turn as I work my way through the article, addressing flawed points in the order they come.

I'll approach one particular point here as a pair of answers, since there's tightly related information in both.
GCN: The fix that was issued for this vulnerability has been acknowledged as a stopgap. What are its strengths and weaknesses?

TOVAR: Even to say that it is pretty good is a scary proposition. There was a Russian security researcher who published an article within 24 hours of the release of the [User Datagram Protocol] source port randomization patch that was able to crack the fix in under 10 hours using two laptops. The strength of the patch is that it adds a probabilistic hurdle to an attack. The downside is it is a probabilistic defense and therefore a patient hacker with two laptops or a determined hacker with a data center can eventually overcome that defense. The danger of referring to it as a fix is that it allows administrators and owners of major networks to have a false sense of security.

GCN: Are there other problems in DNS that are as serious as this vulnerability?

TOVAR: I think there are a lot of others that are just as bad or worse. One of the challenges is that there is no notification mechanism in most DNS solutions, no gauntlet that the attacker has to run so that the administrator can see that some malicious code or individual is trying to access the server in an inappropriate way. If UDP source port randomization were overcome and the network owner or operator were running an open-source server, there would be no way to know that was happening. This has been a wake-up call for any network that is relying on open source for this function.
Mr Tovar's last point in the first question is right on the money: the patches released for dozens of DNS implementations this summer do not constitute "a fix." Collectively they are an improvement in the use of the protocol that makes a security problem tougher for the bad guys to crack, but does not make that problem go away.

As for the rest of what he has to say here, on the surface it seems perfectly reasonable, if you make a couple of assumptions.

First, you have to assume that commercial DNS software, or at least his commercial DNS software, has some special sauce which prevents (or at least identifies) Kaminsky-style attacks against a DNS server, which cannot be available to operators of open source DNS software. It may be reasonable to take the position that it is beneficial for DNS software to notice and identify when it is under attack; however, it is not reasonable to suggest that this is the only way to detect attacks against the software. When the details of Kaminsky's exploit were eventually leaked, almost immediately tools began to spring up to help operators detect poisoning attempts, such as the Snort rule discussed in this Sourcefire Vulnerability Research Team white paper [PDF].

The second assumption you must make to consider this point reasonable is that a network operator is going to fail to notice a concerted effort to attack one or more of her DNS servers. Overcoming source port randomization, the "fix" under discussion, requires such a concerted effort — so much bandwidth consumed for so long a period — that it is generally considered unlikely that a competent network operator is going to fail to notice a cache poisoning attempt in progress when it is directed at a patched server. It's a bit like suggesting that it takes 10 hours of someone banging on your front door with a sledgehammer to break through it, and that it's unlikely you'll notice this going on.

This difficulty is the only reason informed DNS operators will let anyone come anywhere near implying that the patch is a "fix" of some kind. In fact, the only currently available fix for this vulnerability in the protocol is securing the protocol itself, which is where DNSSEC comes in.

GCN: Is open-source DNS software inherently less secure than proprietary software?

TOVAR: The challenge of an open-source solution is that you cannot put anything other than a probabilistic defense mechanism in open source. If you put deterministic protections in, you are going to make them widely available because it is open source, so you essentially give the hacker a road map on how to obviate or avoid those layers of protection. The whole point of open source is that it is open and its code is widely available. It offers the promise of ease of deployment, but it is likely having a complex lock system on your house and then handing out the keys.
There actually is a widely available deterministic defense mechanism which is implemented in open source, as well as commercial software. It's known as DNSSEC, and it comes up later in the interview. Putting that aside though, I'd be curious to know what other deterministic defense mechanisms Tovar is implying are available in commercial software, but are not available in open source software. Obviously he doesn't say; he can't. His own point addresses why non-revealable deterministic countermeasures are insufficient for securing DNS software: should they be revealed (and let's face it, corporate espionage happens) then they are immediately invalidated.

GCN: BIND, which is the most widely used DNS server, is open source. How safe are the latest versions of it?

TOVAR: For a lot of environments, it is perfectly suitable. But in any mission-critical network in the government sector, any financial institution, anything that has the specter of identity theft or impact on national security, I think using open source is just folly.
He is, of course, welcome to this opinion. I do not share this opinion, and I would bet real money that neither do a majority of the other operators who manage critical DNS infrastructure. Refuting the basis for this opinion is what this post is all about, so I'll move on.

GCN: Why is BIND so widely used if it should not be used in critical areas?

TOVAR: The Internet is still relatively young. We don’t think poorly of open source.
The implication being... as the Internet ages and we gain more experience, we should think more poorly of open source?
In fact, many of our engineers wrote BIND versions 8 and 9, so we do believe there is a role for that software. But the proliferation of DNS has occurred in the background as the Internet has exploded. DNS commonly is thought of as just a translation protocol for browsing behavior, and that belies the complexity of the networks that DNS supports. Everything from e-mail to [voice over IP] to anything IP hits the DNS multiple times. Security applications access it, anti-spam applications access it, firewalls access it. When administrators are building out networks it is very easy to think of DNS as a background technology that you just throw in and then go on to think about the applications.
The rest of this statement is about how people deploy DNS, not about how the software is designed or works. Nothing here explains why he feels open source DNS software can't be used in mission critical situations. All he explains is how he thinks it can be deployed in sloppy ways. Commercial software can be deployed sloppily just as easily as open source software. Nothing inherent in the software prevents or aids the lazy administrator in being lazy.

GCN: Why is DNSsec not more widely deployed?

TOVAR: Not all DNS implementations support it today, and getting vendors to upgrade their systems is critical. Just signing one side of the equation, the authoritative side, is a good step, but it’s only half the battle. You need to have both sides of the DNS traffic signed. And there is no centralized authority for .com. It is a widely distributed database and you have to have every DNS server speaking the same version of DNSsec. So the obstacle to DNSsec deployment is fairly huge. It is going to take government intervention and wide-scale industry acknowledgment that this is needed.
And here, at the end, is where I think Mr Tovar shows his ignorance of DNS operations in general, and in particular the operation of critical infrastructure. There is, in fact, a central authority for the .COM zone. That authority is Verisign; they manage and publish the .COM zone from one central database. Mr Tovar is, perhaps, thinking of the .COM WHOIS database. The WHOIS database holds the details of who holds each registered domain, and is indeed distributed among the .COM registrars.

This sort of fundamental misunderstanding of the DNS infrastructure is a good indicator of just how much weight should be given to Mr. Tovar's opinions on how that infrastructure is managed.

UPDATE: Florian Weimer raises an interesting question, one that I'm embarrassed to say didn't occur to me when I was originally writing this. Even though the question itself is facetious, the point is valid. The data sheets [PDF] for the main product that Mr. Tovar's company markets say that it only runs on open source operating systems. How can he keep a straight face while claiming, on the one hand, that his product is suitable for mission-critical applications but open source isn't, when on the other hand his software only runs in open source environments? It's quite baffling.

UPDATE UPDATE: At some point the dns-operations archive index got regenerated and the link to Florian Weimer's email ended up pointing to the wrong post. Fixed 2025/06/03.