Online availability and denial of service
Or: Today, anyone who's ever typed in a password is a security expert.
Recent attacks on leading sites have made everyone, from the CIO to
the casual investor, take notice of server availability. Like every
consultant worth its salt, we've been in the thick of things. We
wanted to share some of our findings and observations with you,
our readers, so we've broken this month's newsletter into three
categories: a handy analogy you can use to explain this to less
technical folks; some musings on what'll happen next; and information
from our labs.
(This third one's the kind of thing we usually charge for, and it'll be
available in more detail in an upcoming Networkshop Briefing, but
our lab has been busy all weekend checking out server resistance
and we can't keep it to ourselves much longer.)
If you're like us, you've been asked to explain what happened to any
number of people. DoS attacks take three forms: poisoned attacks,
flood attacks, and state attacks. The first two are relatively simple.
A poisoned attack consists of sending the target something it
can't easily digest because of the way it was designed. For example,
if you wanted to break an airport conveyor belt you could simply
put an oversized package on it, blocking the belt because of the
way it was designed.
A resource attack comes in two forms: flood attacks and state attacks.
A network flood attack is simply a brute-force saturation of the
target's network, and it relies on the attacker having more bandwidth
than the receiver.
Since hackers are seldom as well funded as large portal sites,
this is generally done by having access to many big systems (such
as university servers) and synchronizing their generation of traffic
to overwhelm the target.
A state attack -- the best known being the TCP SYN attack
-- relies on exploiting some of the rules by which networks operate.
In geek-speak, the client generates a high number of embryonic
TCP connection states on the target, and never responds to the
SYN-ACKs the target sends back. The server waits a particular length of time
for the attacker to respond, and the work of tracking all of these
states is too much for the server's processor. There are other,
less well-known state attacks, but hackers particularly like the
SYN attack because it doesn't require a valid source address,
thereby hiding the hacker's identity.
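For readers who prefer code to geek-speak, here is a toy model of the half-open connection table described above. It is a sketch under our own assumptions, not any real TCP stack: the `SynBacklog` class and its method names are made up for illustration, the capacity of 128 borrows the Linux default mentioned later in this issue, and the 180-second timeout borrows the "three minutes" from the analogy below.

```python
from dataclasses import dataclass, field


@dataclass
class SynBacklog:
    """Toy model of a server's table of half-open (embryonic) connections."""
    capacity: int    # maximum half-open entries the server will track
    timeout: float   # seconds an entry waits for the client's final ACK
    entries: dict = field(default_factory=dict)  # (ip, port) -> expiry time

    def expire(self, now: float) -> None:
        """Forget entries whose timeout has passed."""
        self.entries = {k: t for k, t in self.entries.items() if t > now}

    def on_syn(self, src: tuple, now: float) -> bool:
        """Handle an incoming SYN; return False if it had to be dropped."""
        self.expire(now)
        if len(self.entries) >= self.capacity:
            return False
        self.entries[src] = now + self.timeout
        return True

    def on_ack(self, src: tuple, now: float) -> bool:
        """Handle the final ACK; return True if it completed a handshake."""
        self.expire(now)
        return self.entries.pop(src, None) is not None


def simulate():
    backlog = SynBacklog(capacity=128, timeout=180.0)
    # Attacker: 200 SYNs from made-up source addresses, 10 ms apart,
    # none of which is ever followed by an ACK.
    accepted = sum(
        backlog.on_syn((f"10.0.{i // 256}.{i % 256}", 1024 + i), now=i * 0.01)
        for i in range(200)
    )
    # A legitimate client shows up two seconds in and finds the table full.
    legit_ok = backlog.on_syn(("192.0.2.7", 40000), now=2.0)
    return accepted, legit_ok
```

In this model the attacker's first 128 SYNs fill the table, the rest are dropped, and the legitimate client is turned away until the bogus entries time out.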
I was stuck in front of TV cameras trying to dumb this down for the evening
news. It went well enough, but I mused over what I could have said
for some time. So here's an analogy that's pretty accurate and extends
fairly well in case you're faced with a similar dilemma.
Each of the targeted companies is pretty immune to poisoned attacks
because its network staff knows what it's doing. Each also has
a huge amount of bandwidth and lots of processing capacity. This
is rather like a big company with thousands of phone lines and thousands
of employees. Hackers have two challenges when they want to take
down such a company: they don't have enough capacity on their own,
and they need to remain anonymous.
Web connections function rather like corporate phone calls. Before you
reach the person you're calling, you chat with the equivalent of
a receptionist who asks who you are and then forwards your call.
So how can a hacker with one phone line tie up a receptionist with
a thousand? Essentially, he calls, and when the receptionist picks
up the phone and asks for the caller's name, he hangs up. On computers,
the "receptionist" has to wait for a very long time --
in some cases, several minutes -- before knowing that the call went
away. So the receptionist puts the call on hold for three minutes.
Meanwhile, the hacker calls back. On a second phone line, he repeats the process.
The line is put on hold. This continues a thousand times -- and
with the speed of networks, it's relatively easy to connect a thousand
times in three minutes. Essentially, the company's phone lines are
all tied up waiting for disappeared callers.
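The arithmetic behind "a thousand times in three minutes" is worth a quick sketch. Each bogus call holds a line for the full hold time, so the caller only needs to place calls at a rate of lines divided by hold time to keep every line busy. The function name below is ours, and applying the three-minute hold to a 128-entry server table is an assumption made purely for comparison.

```python
def min_attack_rate(lines: int, hold_seconds: float) -> float:
    """Calls (or SYNs) per second needed to keep `lines` slots busy.

    Each bogus call occupies one slot for `hold_seconds`, so in steady
    state the number of busy slots is rate * hold_seconds.
    """
    if lines <= 0 or hold_seconds <= 0:
        raise ValueError("lines and hold_seconds must be positive")
    return lines / hold_seconds


# The receptionist with a thousand lines and a three-minute hold:
receptionist_rate = min_attack_rate(1000, 180)   # roughly 5.6 calls per second

# A server tracking 128 half-open connections, assuming the same hold time:
small_table_rate = min_attack_rate(128, 180)     # under one SYN per second
```

Shortening the hold time raises the rate the attacker has to sustain, which is why the timeout shows up among the fixes that follow.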
The analogy works because it suggests ways of fixing the problem:
First, the receptionist could take a number and call the person back
before putting them on hold. In this way, the receptionist would
know the caller's number; if the caller gave a bogus number, the
receptionist would know they were fake. This is analogous to a
SYN Cookie, a technique that works well on paper but in practice
ties the receptionist up calling people back.
Second, the system could wait for an answer for a shorter amount of time
-- say, three seconds. This might cause problems if legitimate
callers took a long time to answer when the receptionist asked
for their name. To some degree, such callers would call back.
This is analogous to adjusting TCP timeout values.
Third, add more phone lines. While it may be impractical for a server
to handle many pending connections, network equipment like load-balancers
is designed for this kind of work. A basic Linux install handles
a meagre 128 embryonic TCP states, and is ripe for tuning.
Fourth, have a two-tiered receptionist model in which the first receptionist
checks to see if a caller is legitimate before sending them
on to the second one. We know this as load-balancing equipment
executing delayed binds.
Fifth, when all lines are busy, clean out the people that have been on
hold the longest. This is called connection reaping, and it's
a feature of many servers and load-balancers. It's especially
useful when the load-balancer sends a TCP reset to the server
to free up the state.
Sixth, to get the attack to stop, the company has to call the telephone
company and ask them to block certain calls. Since these calls
cannot be traced, the phone company has to work its way back through
the network, switch by switch, wire by wire, until it finds a
sender. This takes a lot of work, in some cases overwhelming the
phone company. This is analogous to asking your ISP to put its
routers into a traffic analysis mode, logging packets and identifying
the ingress port, then calling its upstream ISP.
Seventh, if you're a responsible phone company you don't let people mess
around with their caller ID. This is analogous to egress filtering
on access routers to prevent spoofed IP addresses.
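Of the fixes above, the SYN Cookie is the one that benefits most from a sketch. The idea is that the server keeps no record of a half-open connection; instead it derives the sequence number of its SYN-ACK from the connection's addresses, a time slot, and a secret, then recomputes that value when the final ACK arrives. What follows is a deliberately simplified illustration. The secret, the 64-second slot, and the function names are our assumptions, and real implementations pack more information (such as the segment size) into the cookie and use a different bit layout.

```python
import hashlib
import hmac

SECRET = b"hypothetical-server-secret"  # illustration only
SLOT_SECONDS = 64                       # assumed granularity of a cookie's lifetime


def _cookie(src_ip, src_port, dst_ip, dst_port, slot: int) -> int:
    """Derive a 32-bit value from the connection 4-tuple, time slot and secret."""
    msg = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}@{slot}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def make_cookie(src_ip, src_port, dst_ip, dst_port, now: float) -> int:
    """Sequence number to put in the SYN-ACK; nothing is stored on the server."""
    return _cookie(src_ip, src_port, dst_ip, dst_port, int(now) // SLOT_SECONDS)


def check_ack(src_ip, src_port, dst_ip, dst_port, ack_number: int, now: float) -> bool:
    """True if the ACK echoes a cookie issued in the current or previous slot."""
    echoed = (ack_number - 1) % 2**32   # a client ACKs our sequence number + 1
    slot = int(now) // SLOT_SECONDS
    return any(
        echoed == _cookie(src_ip, src_port, dst_ip, dst_port, s)
        for s in (slot, slot - 1)
    )
```

A caller who gave a bogus number never sees the SYN-ACK, so it cannot echo the cookie back, and the server has spent no memory waiting for it.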
The problem with all of these is that they don't really help legitimate
users. While they may protect the company, valid callers still have
to contend with the busy receptionists.
What'll happen next
Of course, like all analysts, we can't stop at explaining things.
Here are some predictions about how this furor will affect product
roadmaps -- and vendor marketing -- in the weeks to come.
What's perhaps more interesting is the kinds of attacks and the work that
went on beforehand. By all accounts, these attacks were targeted
at specific weaknesses that each site showed -- which meant they'd
been scanned and tested gently beforehand. Expect intrusion detection
tools to ship improved offerings that watch for these kinds of scans.
In our load-balancing tests, some vendors had excellent logging
and attack detection methods that can help identify potential miscreants.
At the same time, TFN (Tribe Flood Network) was effective because it enlisted the help of
infected servers to act as attackers. This poses all sorts of liability
issues: if company A was lax in its virus scanning efforts, and
a hacker used company A's machines to attack company B, then can
B sue A? You can bet that vendors of proxy- and mail-server scanning
applications will try and convince B to do so. Watch for lawsuits
about negligent computer practices that affect others.
Load-balancing equipment started out with a performance spin: "aggregate multiple
servers and handle more traffic." Married to this was the availability
spin: "eliminate single points of failure." These have
recently been replaced with more advanced lines: "handle distributed
content," "shape user traffic according to buyer preference,"
"optimize reverse proxy cache through content awareness,"
and "make maintenance easier without interrupting service."
Now, watch for vendors to play up their security strengths and
survivability. Certainly, having a vendor tune their stack parameters
is easier than doing it yourself. Load-balancing vendors should
partner with virus scanning vendors for scheduled upgrade programs
that download the latest DoS "signatures" to their boxes
and take remedial action.
ISPs have been pretty lazy about enforcing source addresses, partly because
putting access list rules on their routers to ensure people only
send traffic from their own machines consumes routing capacity,
and partly because some nifty load-balancing features -- such as
triangular routing paths and proximity detection -- take advantage
of source address spoofing. Watch for egress filtering features
to become more common and ISPs to get more picky about forwarding
traffic whose source address isn't on your downstream network.
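The check itself is simple, which is part of why its absence is hard to excuse. As a rough sketch in Python rather than router configuration, the rule amounts to "forward a packet only if its source address belongs to a prefix assigned to that customer." The prefixes below are documentation ranges chosen for illustration, and `make_source_filter` is a name we made up.

```python
import ipaddress


def make_source_filter(customer_prefixes):
    """Build a predicate that accepts only source addresses inside the
    prefixes assigned to the downstream customer."""
    networks = [ipaddress.ip_network(p) for p in customer_prefixes]

    def permit(src_ip: str) -> bool:
        addr = ipaddress.ip_address(src_ip)
        return any(addr in net for net in networks)

    return permit


# Hypothetical customer holding one /24 and one /25.
permit = make_source_filter(["192.0.2.0/24", "198.51.100.0/25"])
```

With this filter in place, `permit("192.0.2.55")` is true, while a packet claiming to come from `203.0.113.9` would be dropped at the customer's own access router instead of reaching the victim.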
These attack viruses had to get their marching orders from somewhere.
Consequently, watch for stateful firewalls to enhance their products
to watch for TFN and Trin00 "trigger" messages -- and
for hackers to hide these in SSL traffic, UDP fragments, and so
on. Many of the teamed systems use unsolicited ICMP responses to
give the "zombie" systems their marching orders. A really
good stateful firewall could match up outbound pings with inbound
responses to stifle these instructions, but then the hackers would
find another way of telling zombie machines to start an attack.
In other words, get ready to start subscribing to stateful firewall
DoS protection services the way you license virus scanning software.
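The ping-matching idea can be sketched in a few lines. This is our own toy model, not any firewall vendor's implementation: the `EchoTracker` class, its method names, and the five-second reply window are all assumptions. It remembers each outbound echo request and forwards an inbound echo reply only if it answers a request that is still outstanding.

```python
class EchoTracker:
    """Toy stateful filter that pairs inbound ICMP echo replies with the
    outbound echo requests that solicited them."""

    def __init__(self, window: float = 5.0):
        self.window = window   # assumed: seconds a request stays answerable
        self.pending = {}      # (inside, outside, ident, seq) -> time sent

    def outbound_request(self, src, dst, ident, seq, now: float) -> None:
        """Record an echo request leaving the protected network."""
        self.pending[(src, dst, ident, seq)] = now

    def inbound_reply(self, src, dst, ident, seq, now: float) -> bool:
        """Return True to forward the reply, False to drop it as unsolicited."""
        # The reply travels in the opposite direction, so swap src and dst.
        sent = self.pending.pop((dst, src, ident, seq), None)
        return sent is not None and now - sent <= self.window
```

A reply is accepted at most once, so replaying a legitimate response does not get a second message through. As noted above, this only closes one channel.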
In our labs
We ran some quick tests in our lab this weekend, pointing our Litmus
test suite at a couple of big machines, then nailing them with SYN
attacks. On an Ethernet segment, we killed the machine within one
second of the attack for a "naked" Linux configuration.
With proper tuning of some TCP stacks, the machines were able to sustain an
attack and degraded gradually as the number of SYNs per second increased.
We're working on more detailed information, such as the differences
between server platforms and tuning parameters and the
client latency that each tuning parameter adds. As I wing West
to the Intel Developer Conference (and it certainly seems like these
get written when I'm on cross-country flights, despite the fundamental
incompatibilities of Photoshop and a Toshiba notebook mouse...)
John's team in our Montreal lab is benchmarking and testing so we
can bring you the latest real-world knowledge on these attacks and
how to beat them.
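If you want to see where your own Linux servers stand in the meantime, a short script can read the relevant kernel settings. The three names below are sysctls that current Linux kernels expose under `/proc/sys/net/ipv4`; treating them as the ones that matter here is our assumption, since this issue does not list the parameters we tuned. The function name is ours, and a missing setting is reported as `None`.

```python
from pathlib import Path

# Kernel settings related to half-open connections (an assumed selection).
SYN_SETTINGS = ("tcp_max_syn_backlog", "tcp_synack_retries", "tcp_syncookies")


def read_syn_settings(base: str = "/proc/sys/net/ipv4") -> dict:
    """Return the current value of each setting, or None if it can't be read."""
    result = {}
    for name in SYN_SETTINGS:
        try:
            result[name] = int((Path(base) / name).read_text().split()[0])
        except (OSError, ValueError, IndexError):
            result[name] = None
    return result
```

Calling `read_syn_settings()` on a Linux host returns a dictionary of the three values; on other platforms every entry comes back as `None`.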
For now, suffice it to say that in many cases, a load-balancer will offer
a degree of protection -- and that monitoring is the only way to
detect these attacks and intervene with tuning, devices, and upstream
ISP investigation.
Next month
We had a busy Christmas, finishing off the load-balancing study
and helping a number of companies with their e-business infrastructure,
but ISPCon Canada and the upcoming Intel conference will be a source
of plenty of new topics. We're flattered that readers noticed the
absence of last month's issue, and we'll try not to do it again.
Some of the coming months' topics include:
- Wireless portability: Proximity, ergonomics, context-of-use analysis, extending
  our senses, and the effects of miniaturization on human behavior
- Are we solving the have-not problem ourselves? Moore's Law lowers
  the bar and makes things obsolete, which may level the technology
  playing field.
- Are expectations driving us? The bank machine teaches impatience.
- Mohammed and the Mountain: With processors everywhere and applications
  centralized, we carry the GUI -- so is Handspring the ideal clip-on
  UI?
OTHER NEWS
The Networkshop report on
load-balancing and high availability systems is complete.
This is the most comprehensive, hands-on look at load-balancing
systems we -- or our customers -- have ever heard of, and it covers
new entrants (such as Phobos' IN-Balance) and architectures (such
as Cisco's MNLB). Weighing in at over 250 pages, it's the result
of 6 months' hard work in our labs. The report is available for
sale on its own or as part of a consulting package that includes
onsite availability auditing and training. For more information,
contact us directly.
RECENT PRESENTATIONS ONLINE
Case studies in e-commerce, from this February's Internet
World Canada/ISPCon session.
Infrastructure from online e-business systems, from the Intel Developer's
Forum in Palm Springs.