The capacity-computing clusters at the National Nuclear Security Administration are 10 times more cost-efficient than the supercomputers for running smaller models and simulations, says NNSA’s Dimitri Kusnezov.
Feb 26 2008

At Capacity

Clusters of servers — sometimes thousands in a grid — let agencies tap computing power that once was the exclusive domain of supercomputers.

Government scientists and researchers have long enjoyed access to some of the most powerful computing platforms in the world, but they pay a premium for the privilege that goes beyond the initial purchase price: These computational powerhouses require an extraordinary amount of ongoing management and maintenance. So it’s no surprise to see many agencies embracing capacity-computing platforms, which not only lower the total cost of ownership for high-performance computing but also let agencies reserve their most intensive number-crunching for the systems best suited to it.

At the National Nuclear Security Administration (NNSA), for example, it was common for thousands of users to run scientific calculations simultaneously on a single supercomputer. This wasn’t a problem until there was the need — but not the space — to run a single mammoth simulation that required the lion’s share of that supercomputer’s computational resources.

With the Energy Department’s recent multimillion-dollar award of a contract for at least eight scalable Linux capacity clusters for its three national defense laboratories (Los Alamos, Sandia and Livermore), the competition for shared computing resources will diminish and perhaps even disappear. Half a dozen vendors are part of the contract, among them AMD, Supermicro and Nvidia.

At many agencies, a similar clustering approach can fit the bill. Think of capacity computing as egalitarian: Computational resources are sliced and diced among large numbers of users, so many different players get access as they need it. Capacity computing typically falls under the umbrella of high-performance computing, but it also works in environments where many users need on-demand access to computing power. Capacity-computing platforms can have hundreds, but usually not more than a couple of thousand, processors.

Capacity computing is now an option for more agencies for three reasons. First, it offers a way for agencies to increase processing capability while reducing costs. Second, agencies can build these systems from off-the-shelf technology. Third, recent advances in blade servers, virtualization software and microprocessors make the transition to capacity computing easier to manage.

Server clusters, which are server components linked together by high-speed networks, are what turbocharged the capacity-computing market sometime after 2000, according to Steve Conway, research vice president of high-performance computing at IDC. “Based on commodity component technology, clusters are driving pervasiveness and growth of the market,” he says.

At NNSA, new computing clusters will provide capacity computing, running large numbers of smaller jobs simultaneously on a high-performance machine. The agency’s more powerful supercomputers will be dedicated to larger, more complex calculations critical to the nation’s Stockpile Stewardship Program, which develops predictive computer modeling tools to gauge performance of weapons in the U.S. nuclear arsenal.
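
To make the "many smaller jobs" model concrete, here is a minimal Python sketch that assigns independent jobs from many users to a fixed pool of identical nodes. It is an illustration under stated assumptions, not how NNSA’s clusters are actually scheduled: the `Job` class, the `schedule` function and the one-job-per-node-at-a-time rule are all hypothetical, and the greedy policy (each job goes to whichever node frees up first) is a textbook list-scheduling heuristic, far simpler than a production batch scheduler.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """A hypothetical single-node job submitted by one user."""
    user: str
    runtime_hours: float


def schedule(jobs, num_nodes):
    """Greedy list scheduling: each job runs on whichever node becomes free first.

    Returns (placements, makespan), where each placement is
    (user, node_id, start_hour, finish_hour) and makespan is the hour the last job ends.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    # Min-heap of (hour the node becomes free, node_id); every node starts idle.
    free_at = [(0.0, node) for node in range(num_nodes)]
    heapq.heapify(free_at)
    placements = []
    for job in jobs:
        start, node = heapq.heappop(free_at)  # earliest-available node
        finish = start + job.runtime_hours
        placements.append((job.user, node, start, finish))
        heapq.heappush(free_at, (finish, node))
    makespan = max((finish for _, _, _, finish in placements), default=0.0)
    return placements, makespan


if __name__ == "__main__":
    # Eight users each submit a one-hour job to a four-node pool.
    jobs = [Job(f"user{i}", 1.0) for i in range(8)]
    placements, makespan = schedule(jobs, num_nodes=4)
    for user, node, start, finish in placements:
        print(f"{user}: node {node}, hours {start:.0f}-{finish:.0f}")
    print(f"All jobs done after {makespan:.0f} hours")
```

With eight one-hour jobs and four nodes, the first four jobs start immediately and the rest start at hour 1, so everything finishes after 2 hours. A single calculation that needed every node at once would have to wait until the whole pool sat idle, which is why the largest calculations are better left on dedicated supercomputers.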

“Our capacity-computing initiative will make our work more cost-effective and productive,” says Dimitri Kusnezov, director of advanced simulation and computing at NNSA. Plus, standardizing across the laboratories will facilitate collaboration and reduce redundant maintenance costs, he points out.

Such a Deal

A big attraction is the low price point. About a decade ago, similar computing capability carried a starting price tag upward of $10 million. Today, prices start at $15,000.

Capacity computing in the government is one of the fastest-growing parts of the high-performance computing market. A $10 billion market in 2006, it’s expected to reach $15.5 billion by 2011.
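
Growth forecasts like this one are often easier to compare as a compound annual growth rate (CAGR), the constant yearly rate that carries the starting value to the ending value. A minimal Python sketch of that arithmetic, using only the two figures above and a hypothetical `cagr` helper:

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# $10 billion in 2006 growing to $15.5 billion in 2011 spans five years.
rate = cagr(10.0, 15.5, 2011 - 2006)
print(f"Implied annual growth: {rate:.1%}")  # about 9.2%
```

In other words, the forecast assumes growth of a little over 9 percent a year, compounded, for five straight years.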


Kusnezov reports that the new capacity clusters at NNSA are 10 times more cost-effective than the agency’s traditional supercomputers. The cost per teraflop will be less than $70.

Both the National Cancer Institute’s Advanced BioMedical Computing Center (ABCC) and the National Oceanic and Atmospheric Administration (NOAA) have capacity-computing initiatives, as well.

A resource for biomedical research, ABCC’s capacity-computing platform helps scientists find solutions to data-intensive, computational biological problems. “In our environment, we’re concerned about running many tasks as quickly as possible so our scientists can get on with their research,” says Jack Collins, manager of scientific computation and program development at the center for ABCC contractor Science Applications International Corp.

The sheer number of data points in a study that may include thousands of patients can be staggering, and those numbers are growing daily. A fully sequenced human genome, for example, produces 3 billion data points per person.

The development of high-throughput capacity clusters lets biomedical scientists at ABCC generate 5 terabytes of data in a week, compared with just a few megabytes of data over several weeks on less efficient machines. “Capacity computing allows us to generate data in hours rather than months, providing a return on value from capacity computing in orders of magnitude,” says Collins. Vendors such as Hewlett-Packard, IBM and Sun Microsystems, among others, provide server resources at ABCC.

Scientists at NOAA use the agency’s capacity-computing clusters primarily to create numerical prediction models for weather, wave and oceanographic forecasts, running many jobs simultaneously. The agency has two capacity clusters based on IBM’s Power Architecture, residing at different locations to provide fail-over if necessary.

“Capacity-computing technology has gotten to the point where we can combine off-the-shelf components for high-performance computing systems,” says David Michaud, program manager in NOAA’s Office of the CIO.

IBM’s Dave Turek, vice president of deep computing, would agree, noting that the dramatic change over the past few years toward Linux as an operating system, coupled with low-cost x86 processing nodes, has let server manufacturers develop capacity systems with greater capability at less cost.

Looking Ahead

• Capacity Computing: Using smaller and less expensive clusters of systems to run parallel problems requiring modest computational power
• Capability Computing: Using the most powerful supercomputers to solve the largest and most demanding problems with the intent to minimize time-to-solution
• Advanced Systems: Cost-effective computers designed to achieve extreme speeds in addressing specific stockpile issues

Although clusters have had the most significant impact on capacity computing — there’s been a sevenfold increase in the number of processors in a cluster, from 683 in 2004 to almost 5,000 today — other technology trends also are shaping how agencies use capacity-computing technology.

ABCC, for example, is moving toward blade clusters across a grid. From an institutional perspective, blades are cost-effective, provide excellent performance, make management easier, save space and power, require less cabling and scale easily. IBM’s BladeCenter line, for instance, includes both Power- and x86-based blade platforms.

At the same time, packing more technology into the same or a smaller footprint leaves one challenge of capacity-computing clusters unresolved: regulating temperature, since denser hardware concentrates more heat in less space.

On the plus side, a more recent technology development, server virtualization (the partitioning of one physical server into multiple, isolated logical environments), can benefit capacity computing by improving operational efficiency, increasing flexibility and lowering costs.

Microprocessor speed is perhaps the biggest technology challenge for processing-power-hungry government scientists and researchers. Whereas microprocessor speed used to double every 18 to 24 months, speed improvements in individual microprocessors have hit a speed bump over the past several years and, for all practical purposes, have flattened out.

Unable to squeeze more speed out of a single microprocessor, manufacturers cleverly introduced multicore technology, putting several slower individual processor cores on one chip to deliver greater overall performance.

At NNSA, capacity-computing clusters will include AMD’s quad-core Barcelona processors. The untold story about multicore processors, however, is how to program them to take advantage of the technology. “It’s a case of the hardware guys being ahead of where the software guys are,” says Turek.

What that means, according to IDC’s Conway, is that 80 percent of the application software used for technical computing in the fields of science, engineering and government can’t utilize more than 32 processors.
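
One standard way to see why software, not hardware, sets the ceiling is Amdahl’s law, a well-known model of parallel speedup brought in here as an illustration rather than drawn from Conway’s figures: if a fraction p of a program’s work can run in parallel, the best possible speedup on n processors is 1 / ((1 - p) + p / n). The Python sketch below evaluates that bound with a hypothetical `amdahl_speedup` helper; the parallel fractions are illustrative assumptions, not measurements of any real application.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Upper bound on speedup when only part of a program runs in parallel (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial = 1.0 - parallel_fraction  # share of the work that cannot be split up
    return 1.0 / (serial + parallel_fraction / processors)


# Illustrative parallel fractions only; not measurements of any real application.
for p in (0.90, 0.99):
    for n in (4, 32, 1024):
        print(f"p={p:.2f}, n={n:5d}: speedup <= {amdahl_speedup(p, n):6.1f}")
```

Even when 90 percent of the work parallelizes, the serial 10 percent keeps the speedup below tenfold no matter how many processors are added: The bound is about 7.8 on 32 processors and only about 9.9 on 1,024.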

Still, today’s more powerful and affordable capacity-computing clusters are enabling agencies to rethink how they buy, manage and view their computational resources.

Photo: Joshua Roberts