Networkshop

Online availability
and denial of service

Or: Today, anyone that's ever
typed in a password is a security expert.

Recent attacks on leading sites have made everyone, from the CIO to the casual investor, take notice of server availability. Like every consultant worth its salt, we've been in the thick of things. We wanted to share some of our findings and observations with you, our readers, so we've broken this month's newsletter into three parts: a handy analogy you can use to explain this to less technical folks, some musings on what'll happen next, and information from our labs.

(This third one's the kind of thing we usually charge for, and it'll be available in more detail in an upcoming Networkshop Briefing, but our lab has been busy all weekend checking out server resistance and we can't keep it to ourselves much longer.)

If you're like us, you've been asked to explain what happened to any number of people. DoS attacks take three forms: poisoned attacks, flood attacks, and state attacks. The first two are relatively simple.

A poisoned attack consists of sending the target something it can't easily digest because of the way it was designed. For example, if you wanted to break an airport conveyor belt, you could simply put an oversized package on it; the belt jams because it was never built to carry something that size.

A flood attack is simply a brute-force saturation of the target's network, and it relies on the attacker having more bandwidth than the receiver. Since hackers are seldom as well funded as large portal sites, this is generally done by gaining access to many big systems (such as university servers) and synchronizing their generation of traffic to overwhelm the target.

A state attack -- the best known being the TCP SYN attack -- relies on exploiting some of the rules by which networks operate. In geek-speak, the client generates a high number of embryonic TCP connection states on the target, and never responds to the SYN-ACKs the target sends back. The server waits a particular length of time for the attacker to respond, and the work of tracking all of these states is too much for the server's processor. There are other, less well-known state attacks, but hackers particularly like the SYN attack because it doesn't require a valid source address, thereby hiding the hacker's identity.
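For readers who'd rather see it than read it, here's a minimal sketch of how a table of embryonic connections fills up. It's a toy model, not a TCP stack: the function name and the evenly spaced arrivals are our assumptions, and every SYN is assumed to come from an attacker who never completes the handshake.

```python
from collections import deque

def simulate_syn_flood(backlog_size, timeout_s, syn_rate, duration_s):
    """Count SYNs refused because the half-open table was full.

    Toy model (an assumption, not a real stack): SYNs arrive evenly
    spaced at syn_rate per second and are never answered, so each one
    holds a table slot until its timer expires.
    """
    half_open = deque()  # expiry times of embryonic connections, oldest first
    refused = 0
    for i in range(int(syn_rate * duration_s)):
        now = i / syn_rate
        # Free the slots whose wait timer has run out.
        while half_open and half_open[0] <= now:
            half_open.popleft()
        if len(half_open) < backlog_size:
            half_open.append(now + timeout_s)
        else:
            refused += 1  # a legitimate caller would be turned away here too
    return refused

# 128 slots, a three-minute wait, and 1,000 SYNs in one second:
print(simulate_syn_flood(128, 180, 1000, 1))  # 872 of the 1,000 are refused
```

Once the first 128 slots are taken, nothing frees up for three minutes, so everything after that is turned away, including real customers.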

I was stuck in front of TV cameras trying to dumb this down for the evening news. It went well enough, but I mused over what I could have said for some time. So here's an analogy that's pretty accurate and extends fairly well in case you're faced with a similar dilemma.

Each of the companies that was targeted is pretty immune to poisoned attacks because their network staff knows what it's doing. They also have a huge amount of bandwidth, and lots of processing capacity. This is rather like a big company with thousands of phone lines and thousands of employees. Hackers have two challenges when they want to take down such a company: they don't have enough capacity on their own, and they need to remain anonymous.

Web connections function rather like corporate phone calls. Before you reach the person you're calling, you chat with the equivalent of a receptionist who asks who you are and then forwards your call. So how can a hacker with one phone line tie up a receptionist with a thousand? Essentially, he calls, and when the receptionist picks up the phone and asks for the caller's name, he hangs up. On computers, the "receptionist" has to wait for a very long time -- in some cases, several minutes -- before knowing that the call went away. So the receptionist puts the call on hold for three minutes.

Meanwhile, the hacker calls back. On the company's second phone line, he repeats the process. That line is put on hold too. This continues a thousand times -- and with the speed of networks, it's relatively easy to connect a thousand times in three minutes. Essentially, the company's phone lines are all tied up waiting for disappeared callers.

The analogy works because it suggests ways of fixing it:

First, the receptionist could take a number and call the person back before putting them on hold. In this way, the receptionist would know the caller's number; if the caller gave a bogus number, the receptionist would know they were fake. This is analogous to a SYN Cookie, a technique that works well on paper but in practice ties the receptionist up calling people back.
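Here's a rough sketch of the idea behind a SYN cookie: instead of remembering the caller, the server folds what it knows into the sequence number it sends back, and only a caller who really received that reply can return it. This is simplified for illustration (real implementations also pack details such as the MSS into the number), and the secret, the 64-second time slot, and the function names are all our own assumptions.

```python
import hashlib
import hmac
import struct

SECRET = b"example-secret"  # hypothetical per-server secret
SLOT_SECONDS = 64           # assumed lifetime of a cookie time slot

def _cookie(src_ip, src_port, dst_ip, dst_port, slot):
    """Derive a 32-bit value from the connection details and a time slot."""
    msg = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}|{slot}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return struct.unpack("!I", digest[:4])[0]  # fits a TCP sequence number

def make_cookie(src_ip, src_port, dst_ip, dst_port, now):
    """Sequence number to send in the SYN-ACK; nothing is stored."""
    return _cookie(src_ip, src_port, dst_ip, dst_port, int(now // SLOT_SECONDS))

def check_ack(src_ip, src_port, dst_ip, dst_port, ack_number, now):
    """Accept the final ACK only if it echoes a cookie we could have issued."""
    echoed = (ack_number - 1) % 2**32
    slot = int(now // SLOT_SECONDS)
    # Allow the current slot and the previous one, so slow callers still get in.
    return any(
        echoed == _cookie(src_ip, src_port, dst_ip, dst_port, s)
        for s in (slot, slot - 1)
    )
```

A caller with a spoofed address never sees the SYN-ACK, so it can't produce an acknowledgement that passes `check_ack`, and the server has spent no table space on it in the meantime.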

Second, the system could wait for an answer for a shorter amount of time -- say, three seconds. This might cause problems if legitimate callers took a long time to answer when the receptionist asked for their name. To some degree, such callers would call back. This is analogous to adjusting TCP timeout values.

Third, add more phone lines. While it may be impractical for a server to handle many pending connections, network equipment like load-balancers is designed for this kind of work. A basic Linux install handles a meagre 128 embryonic TCP states, and is ripe for tuning.
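The second and third fixes pull on the same lever, and a little back-of-the-envelope arithmetic shows why. In this sketch (the function name is ours, and it assumes the attacker's SYNs are the only thing holding slots), the rate needed to keep every slot busy is simply the number of slots divided by how long each one is held.

```python
def saturating_syn_rate(backlog_size, timeout_s):
    """Unanswered SYNs per second needed to keep every slot occupied.

    Assumes each bogus SYN holds one slot for the full timeout.
    """
    return backlog_size / timeout_s

# 128 slots held for three minutes: under one SYN per second does the job.
print(round(saturating_syn_rate(128, 180), 2))   # 0.71
# Cut the wait to three seconds and the attacker needs about 60 times more.
print(round(saturating_syn_rate(128, 3), 2))     # 42.67
# More "phone lines" raise the bar in the same proportion.
print(round(saturating_syn_rate(1024, 3), 2))    # 341.33
```

Neither change makes the attack impossible; each just raises the traffic the attacker has to generate.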

Fourth, have a two-tiered receptionist model in which the first receptionist checks to see if a caller is legitimate before sending them on to the second one. We know this as load-balancing equipment executing delayed binds.

Fifth, when all lines are busy, clean out the people that have been on hold the longest. This is called connection reaping, and it's a feature of many servers and load-balancers. It's especially useful when the load-balancer sends a TCP reset to the server to free up the state.
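As a minimal sketch of connection reaping, assuming a fixed-size table keyed by some connection identifier (the class and method names are hypothetical, not any vendor's API): when the table is full, the oldest pending entry is thrown out to make room for the newcomer.

```python
from collections import OrderedDict

class HalfOpenTable:
    """Fixed-size table of pending connections that reaps the oldest when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()  # key -> arrival time, oldest first

    def add(self, key, now):
        """Track a new pending connection; return the key that was reaped, if any."""
        if key in self._entries:
            return None  # a retransmitted SYN keeps its original place in line
        reaped = None
        if len(self._entries) >= self.capacity:
            reaped, _ = self._entries.popitem(last=False)  # evict the oldest
        self._entries[key] = now
        return reaped

    def complete(self, key):
        """The handshake finished; free the slot. Returns False if already reaped."""
        return self._entries.pop(key, None) is not None

    def __len__(self):
        return len(self._entries)
```

A load-balancer doing this on a server's behalf would, at the point where `add` returns a reaped key, also send the reset that frees the matching state on the server.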

Sixth, to get the attack to stop, the company has to call the telephone company and ask them to block certain calls. Since these calls cannot be traced, the phone company has to work its way back through the network, switch by switch, wire by wire, until it finds a sender. This takes a lot of work, in some cases overwhelming the phone company. This is analogous to asking your ISP to put its routers into a traffic analysis mode, logging packets and identifying the ingress port, then calling its upstream ISP.

Seventh, if you're a responsible phone company, you don't let people mess around with their caller ID. This is analogous to egress filtering on access routers to prevent spoofed IP addresses.
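The check behind egress filtering is simple enough to sketch in a few lines. This is an illustration of the rule, not router configuration: the function name is ours, and the prefixes are documentation addresses standing in for a customer's real allocation.

```python
import ipaddress

def allow_outbound(source_ip, customer_prefixes):
    """Forward a packet only if its source address belongs to the customer."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(prefix) for prefix in customer_prefixes)

# Hypothetical customer who was assigned 192.0.2.0/24:
prefixes = ["192.0.2.0/24"]
print(allow_outbound("192.0.2.17", prefixes))    # True: a genuine customer address
print(allow_outbound("198.51.100.5", prefixes))  # False: spoofed, drop it at the edge
```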

The problem with all of these is that they don't really help legitimate users. While they may protect the company, valid callers still have to contend with the busy receptionists.

What'll happen next
Of course, for analysts like us, it's not enough to explain things. Here are some predictions about how this furor will affect product roadmaps -- and vendor marketing -- in the weeks to come.

What's perhaps more interesting is the kinds of attacks and the work that went on beforehand. By all accounts, these attacks were targeted at specific weaknesses that each site showed -- which meant they'd been scanned and tested gently beforehand. Expect intrusion detection tools to ship improved offerings that watch for these kinds of scans. In our load-balancing tests, some vendors had excellent logging and attack detection methods that can help identify potential miscreants.

At the same time, TFN was effective because it enlisted the help of infected servers to act as attackers. This poses all sorts of liability issues: if company A was lax in its virus scanning efforts, and a hacker used company A's machines to attack company B, then can B sue A? You can bet that vendors of proxy- and mail-server scanning applications will try to convince B to do so. Watch for lawsuits about negligent computer practices that affect others.

Load-balancing equipment started out with a performance spin: "aggregate multiple servers and handle more traffic." Married to this was the availability spin: "eliminate single points of failure." These have recently been replaced with more advanced lines: "handle distributed content," "shape user traffic according to buyer preference," "optimize reverse proxy cache through content awareness," and "make maintenance easier without interrupting service." Now, watch for vendors to play up their security strengths and survivability. Certainly, having a vendor tune their stack parameters is easier than doing it yourself. Load-balancing vendors should partner with virus scanning vendors for scheduled upgrade programs that download the latest DoS "signatures" to their boxes and take remedial action.

ISPs have been pretty lazy about enforcing source addresses, partly because putting access list rules on their routers to ensure people only send traffic from their own machines consumes routing capacity, and partly because some nifty load-balancing features -- such as triangular routing paths and proximity detection -- take advantage of source address spoofing. Watch for egress filtering features to become more common and ISPs to get more picky about forwarding traffic whose source address isn't on their downstream network.

These attack viruses had to get their marching orders from somewhere. Consequently, watch for stateful firewall vendors to enhance their products to watch for TFN and Trin00 "trigger" messages -- and for hackers to hide these in SSL traffic, UDP fragments, and so on. Many of the teamed systems use unsolicited ICMP responses to give the "zombie" systems their marching orders. A really good stateful firewall could match up outbound pings with inbound responses to stifle these instructions, but then the hackers would find another way of telling zombie machines to start an attack. In other words, get ready to start subscribing to stateful firewall DoS protection services the way you license virus scanning software.
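Matching outbound pings with inbound responses can be sketched as a small lookup table. This is a toy, assuming the firewall sees each echo's identifier and sequence number; the class name, method names, and the ten-second window are our own inventions rather than any product's behavior.

```python
class EchoTracker:
    """Allow an inbound ICMP echo reply only if it answers a recent outbound ping."""

    def __init__(self, max_age_s=10.0):
        self.max_age_s = max_age_s  # assumed window for a reply to arrive
        self._pending = {}          # (inside, outside, ident, seq) -> time sent

    def outbound_request(self, inside, outside, ident, seq, now):
        """Record a ping leaving the network."""
        self._pending[(inside, outside, ident, seq)] = now

    def inbound_reply_allowed(self, outside, inside, ident, seq, now):
        """True only for the first timely reply to a ping we actually sent."""
        sent = self._pending.pop((inside, outside, ident, seq), None)
        return sent is not None and now - sent <= self.max_age_s

fw = EchoTracker()
fw.outbound_request("10.0.0.5", "203.0.113.9", ident=1, seq=1, now=0.0)
print(fw.inbound_reply_allowed("203.0.113.9", "10.0.0.5", 1, 1, now=0.2))   # True
print(fw.inbound_reply_allowed("203.0.113.66", "10.0.0.5", 7, 7, now=0.3))  # False: unsolicited
```

An unsolicited "reply" carrying a trigger message has no matching request in the table, so it gets dropped before it reaches the zombie.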

In our labs
We ran some quick tests in our lab this weekend, pointing our Litmus test suite at a couple of big machines, then nailing them with SYN attacks. On an Ethernet segment, we killed the machine within one second of the attack for a "naked" Linux configuration. With proper tuning of some TCP stacks, the machines were able to sustain an attack and degraded gradually as the number of SYNs per second increased.

We're working on more detailed information, such as the differences between server platforms and tuning parameters and the client latency that each tuning parameter will cost. As I wing West to the Intel Developer Conference (and it certainly seems like these get written when I'm on cross-country flights, despite the fundamental incompatibilities of Photoshop and a Toshiba notebook mouse...) John's team in our Montreal lab is benchmarking and testing so we can bring you the latest real-world knowledge on these attacks and how to beat them.

For now, suffice to say that in many cases, a load-balancer will offer a degree of protection -- and that monitoring is the only way to detect these attacks and intervene with tuning, devices, and upstream ISP investigation.

Next month
We had a busy Christmas, finishing off the load-balancing study and helping a number of companies with their e-business infrastructure, but ISPCon Canada and the upcoming Intel conference will be a source of plenty of new topics. We're flattered that readers noticed the absence of last month's issue, and we'll try not to do it again. Some of the coming months' topics include:

Wireless portability: Proximity, ergonomics, context-of-use analysis, extending our senses, and the effects of miniaturization on human behavior
Are we solving the have-not problem ourselves? Moore's Law lowers the bar and makes things obsolete, which may level the technology playing field.
Are expectations driving us? The bank machine teaches impatience.
Mohammed and the Mountain: With processors everywhere and applications centralized, we carry the GUI -- so is Handspring the ideal clip-on UI?

OTHER NEWS
The Networkshop report on load-balancing and high availability systems is complete. This is the most comprehensive, hands-on look at load-balancing systems we -- or our customers -- have ever heard of, and it covers new entrants (such as Phobos' IN-Balance) and architectures (such as Cisco's MNLB). Weighing in at over 250 pages, it's the result of six months' hard work in our labs. The report is available for sale on its own or as part of a consulting package that includes onsite availability auditing and training. For more information, contact us directly.

RECENT PRESENTATIONS ONLINE

Case studies in e-commerce, from this February's Internet World Canada/ISPCon session.

Infrastructure from online e-business systems, from the Intel Developer's Forum in Palm Springs.

Denial-of-service is certainly a hot topic this month. We look at some handy analogies, some of the consequences of the recent hacker furor, and real-world numbers for tuning a server to handle embryonic TCP states better.