In the early '90s, the Berkeley NOW (Network of Workstations) project, led by David Culler, posited that groups of less capable machines (running SunOS) could solve scientific and other computing problems at a fraction of the cost of larger computers. In 1994, Donald Becker and Thomas Sterling drove the costs even lower by adopting the then-fledgling Linux operating system to build Beowulf clusters at NASA's Goddard Space Flight Center. By tying desktop machines together with open source tools such as PVM (Parallel Virtual Machine), MPI (Message Passing Interface), and PBS (Portable Batch System), early clusters - often PC towers stacked on metal shelves with a nest of wires interconnecting them - fundamentally altered the balance of scientific computing. Before these first clusters appeared, distributed/parallel computing was prevalent at only a few computing centers, national laboratories, and a very few university departments. Since the introduction of clusters, distributed computing is, literally, everywhere.

There were, however, ugly realities about clusters. The lack of tools meant that getting 16 or 32 machines to work closely together was a heroic systems effort. Open source software was (and often still is) poorly documented and lacked critical functionality that more mature commercial products offered on the "big machines." It often took months, and highly trained experts, to get a cluster up and running, and even longer for applications to run reasonably well on these cheaper machines, if at all. Nonetheless, the potential of scalable, cheap computing was too great to ignore, and the community as a whole grew more sophisticated until clusters became the dominant architecture in high-performance computing. Midsize clusters now number about 100 machines, big clusters consist of 1,000 machines, and the biggest supercomputers are even larger cluster machines.
For HPC (high-performance computing), clusters have arrived. Most are either Linux-based or run a commercial Unix derivative, with the majority of the top 500 machines running a Linux derivative. A new trend toward better hardware integration in the form of blades helps eliminate significant wiring clutter. The past 12 years of clusters have honed community experience: many vendors can turn out "MPI boxes" (homogeneous hardware that enables message-passing parallel applications), and several software tools understand clusters well enough to let non-experts go from bare metal (e.g., that cluster SKU from your favorite computing hardware company) to a functioning cluster of hundreds of individual nodes (computers) in a few hours. At the National Center for Supercomputing Applications, the Tungsten2 Cluster (512 nodes) went from purchase order placement to full production in less than a month and was one of the 50 fastest supercomputers in the world in June 2005.

It might seem that the problems with clusters have been solved, but their wild success means that everyone wants to do more with them. While clusters retain their HPC roots, clusters of Web servers, tiled display walls, database servers, and file servers are becoming commonplace. Nearly every entity in the modern machine room is essentially a clustered architecture, and building a specialized MPI box (the classic Beowulf cluster) is only a small subset of what is needed to support computational researchers.

Early clusters were tractable for experts because all the hardware was identical (purchased that way to simplify things) and every node in the cluster ran only a single software stack. To build these machines, the expert carefully created a "model" node through painstaking, handcrafted system administration, took an image of this model, and cloned its bits onto all the other nodes. When changes were needed, a new model was built and the nodes were recloned.
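The clone-from-a-model workflow described above works only as long as every node really does match the model. The sketch below, in Python, shows one simple way such drift could be detected: reduce each node's software manifest to a single fingerprint and compare it against the model's. All package names, versions, and node names here are hypothetical illustrations, not any particular tool's format.

```python
import hashlib

def manifest_fingerprint(manifest):
    """Hash a node's software manifest (package name -> version) so that
    whole-node comparisons reduce to a single digest check."""
    lines = sorted(f"{pkg}={ver}" for pkg, ver in manifest.items())
    return hashlib.sha256("\n".join(lines).encode()).hexdigest()

def find_drifted_nodes(model_manifest, node_manifests):
    """Return names of nodes whose manifest differs from the model's."""
    want = manifest_fingerprint(model_manifest)
    return [name for name, m in node_manifests.items()
            if manifest_fingerprint(m) != want]

# Hypothetical model node and two compute nodes cloned from it:
model = {"kernel": "2.6.9", "mpich": "1.2.7", "pbs": "2.3.16"}
nodes = {
    "compute-0-0": dict(model),                  # faithful clone
    "compute-0-1": {**model, "mpich": "1.2.6"},  # drifted clone
}
print(find_drifted_nodes(model, nodes))  # -> ['compute-0-1']
```

Sorting the manifest before hashing makes the fingerprint independent of the order in which packages were recorded, so only a genuine content difference flags a node.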
The community learned that for clustered applications to function properly, software must be consistent across all nodes. A variety of mechanisms and underlying assumptions have been used to achieve consistency, with varying degrees of success. Consistency itself isn't intractable, but achieving it becomes significantly more important as the complexity of the infrastructure increases. The modern machine room is complex not merely because all nodes belong to some logical cluster, but because each cluster needs a different software configuration. In Rocks, software that we have developed to provision and manage clusters rapidly, we call these configurations appliances. If every appliance has its own "handcrafted" model, the cost of the cluster goes up dramatically, uniformity of security policy across clusters depends on the cluster experts' ability to apply changes uniformly to all model nodes, and each (sub)cluster must use largely identical hardware. This administrator-heavy model leads to inconsistency in the enterprise and relies too much on human "wetware," when what is needed is a programmatic approach.
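The core of the programmatic approach argued for above is that appliances share configuration rather than each carrying a handcrafted copy of it. The Python sketch below illustrates the idea in miniature - it is not how Rocks actually represents appliances, and every setting and appliance name is a hypothetical example: appliance definitions are composed from a shared base, so a single change to the base (say, a security policy) propagates to every appliance automatically.

```python
# Shared base configuration that every appliance inherits (hypothetical
# settings for illustration only).
BASE = {"sshd": "enabled", "ntp": "pool.ntp.org", "selinux": "enforcing"}

# Per-appliance additions layered on top of the base.
APPLIANCE_EXTRAS = {
    "compute":    {"mpich": "installed", "pbs-mom": "enabled"},
    "web-server": {"httpd": "enabled"},
    "nas":        {"nfsd": "enabled", "raid-monitor": "enabled"},
}

def render(appliance):
    """Merge the shared base with appliance-specific settings; the
    appliance's own keys override the base where they overlap."""
    return {**BASE, **APPLIANCE_EXTRAS[appliance]}

# A policy change made once in the base reaches every appliance, with no
# per-model handcrafting:
BASE["selinux"] = "permissive"
assert all(render(a)["selinux"] == "permissive" for a in APPLIANCE_EXTRAS)
```

Because `render` recomputes each configuration from the shared base on demand, there is no per-appliance model node to drift out of date - which is precisely what the handcrafted-model approach cannot guarantee.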
by Philip Papadopoulos, Greg Bruno, Mason Katz - University of California, San Diego
© ACM, Inc. All rights reserved. |