Feb 08 2011

Private Sky

DOE labs use a test bed to perfect cloud services for high-end research.

At Energy Department labs outside Chicago and San Francisco, researchers are sequencing genomes, analyzing microbial populations in soil samples, exploring carbon sequestration and modeling the spread of pandemics — among scores of other projects. Just a typical day at the labs, with one large exception: They’re conducting these projects using a cloud computing test bed called the Magellan Project.

Argonne National Laboratory and Lawrence Berkeley National Laboratory designed Magellan to test the feasibility of cloud computing for computational science — an experiment in experimentation, if you will, that could ultimately change how science is performed across the globe.

Funded by the American Recovery and Reinvestment Act, the two-year project is slated to run through September 2011, says Susan Coghlan, who’s heading up Magellan for Argonne in Illinois.

“We’re hoping to find out what scientific domains already work well in the cloud, which domains can work with modifications and what won’t work at all,” says Coghlan, associate division director for the Argonne Leadership Computing Facility. “We’re trying to answer the questions: Is the cloud computing paradigm a reasonable one for doing DOE high-performance computing? If not, why not? If so, what path should be taken to bring it up to the level where it needs to be?”

No Guts, No Glory

At Argonne, the Magellan team has amassed a private cloud consisting of more than 4,000 Intel Nehalem computing cores, with 40 terabytes of solid-state drive storage, 133 graphics servers and 15 dedicated memory servers, connected via a high-speed QDR InfiniBand communications link. Berkeley’s National Energy Research Scientific Computing Center (NERSC) in California has constructed a similar cloud environment.

Yet at these facilities, where high-performance computing typically involves multimillion-dollar supercomputers cranking at petaflops (quadrillions of floating point operations per second), this cluster is considered a sporty economy model, operating at only 151 teraflops (trillions of FLOPS).

And that’s a big part of its appeal. The Magellan clusters aren’t intended to compete with the supercomputers: They’re designed to find out what could happen when midrange computing is widely available to the lab-coated masses, Coghlan says.

Because cloud computing environments can be provisioned quickly and accessed from virtually anywhere, this infrastructure could expand access to raw computing power to more scientists when they need it, without massive upfront investments in expensive hardware or queuing up for limited time on the big iron, says Katherine Yelick, division director of NERSC.

“We have more people who want time on our high-performance computers than can actually get it, so for the most part those facilities are reserved for large, high-end scientific problems,” Yelick says. “A centralized resource like the cloud is best suited for researchers with very spiky workloads — say, access to 500 computer nodes for 24 hours, once a month. That can be very expensive to do on your own.”
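Yelick's example makes the economics concrete. As a rough sketch (the 30-day month and single monthly run are our assumptions, used only to illustrate the point), the utilization arithmetic looks like this:

```python
# Sketch of Yelick's "spiky workload" arithmetic: a job needing 500 nodes
# for 24 hours once a month uses only a sliver of a dedicated cluster.
nodes, hours_per_run, runs_per_month = 500, 24, 1
used = nodes * hours_per_run * runs_per_month   # node-hours actually consumed
available = nodes * 24 * 30                     # node-hours a dedicated cluster offers (30-day month assumed)
utilization = used / available
print(f"{utilization:.1%}")                     # roughly 3.3%: owned hardware would sit idle most of the time
```

On those assumptions, a cluster bought for that one monthly burst would sit idle more than 96 percent of the time, which is the case for renting the capacity instead.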

Plus, notes Coghlan, most scientists working out in the field don’t have the funds to procure large systems, “let alone the ability to pay for administration, support, power and cooling. If they can take advantage of the cloud when they need to run large problems, that opens up a world that might not be available to them if they had to plop down $2 million for a computer system.”

Be Selective

Besides the cost savings, cloud computing also provides a different kind of speed — the swiftness with which lab personnel can provision a server share for a user. For example, when the DOE’s Joint Genome Institute had a sudden need to double its computing resources last March, it turned to Magellan. Within three days, researchers had gained the services of a cluster of several hundred nodes identical to JGI’s local computing cluster.

Yet, as Magellan researchers have learned, not all scientific applications benefit from a cloud environment. Experiments that can be sliced into multiple parts and run independently, such as sequencing strands of DNA, work very well in the cloud, Coghlan says.

But computing jobs that need to run in parallel, communicate frequently or synchronize among different nodes are a poor choice, says Yelick. For one thing, the cloud nodes are likely to be virtualized, which means each physical machine may be running an unknown number of other processes and applications. Some may run more slowly than others, which means the entire test system has to wait for the slowest machine to catch up.

“Faster interconnects and synchronization between processors are what make supercomputers super,” explains Jared Wilkening, a software developer at Argonne who worked on the JGI cloud project. “Cloud computing is better for more CPU-intensive tasks.”
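The distinction can be sketched in a few lines of Python. The `gc_content` function and the fragment data below are purely illustrative stand-ins, not code from the Magellan or JGI projects; the point is that each fragment is processed with no communication between workers:

```python
# Illustrative sketch of an "embarrassingly parallel" workload, the kind
# Coghlan says works well in the cloud: each DNA fragment is analyzed
# independently, so no worker ever waits on another.
from multiprocessing import Pool

def gc_content(seq):
    """Fraction of G/C bases in one DNA fragment (illustrative work unit)."""
    return (seq.count("G") + seq.count("C")) / len(seq)

if __name__ == "__main__":
    fragments = ["ATGCGC", "TTAACG", "GGGGCC", "ATATAT"]  # stand-in data
    with Pool(4) as pool:
        # Each fragment could just as well run on a separate cloud node;
        # there is no synchronization step between them.
        results = pool.map(gc_content, fragments)
    print(results)
```

A tightly coupled simulation, by contrast, would need every worker to exchange data each timestep, so one slow virtualized node stalls them all.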

In addition, high-end scientific applications such as climate simulation require higher rates of data input and output than cloud configurations can muster.

“Data transfer and movement are key things to think about before you move to the cloud,” Wilkening says. “You don’t want to have your cloud nodes spending 90 percent of their time getting the data they need in and out. That wastes a lot of what you’re paying for.”
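A quick calculation shows why that overhead matters: at the 90 percent figure Wilkening cites, each unit of real computation effectively costs ten times the list price.

```python
# Sketch of Wilkening's point: if cloud nodes spend 90 percent of their
# time moving data in and out, only 10 percent of the node-hours you pay
# for go toward actual computation.
io_fraction = 0.9
useful_fraction = 0.1                    # 1 - io_fraction
cost_multiplier = 1 / useful_fraction    # effective price per useful CPU-hour
print(cost_multiplier)                   # each unit of real work costs 10x the list rate
```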

Making the Right Paradigm Shift

The cost benefits of cloud computing can be compelling, especially as commercial providers begin to roll out cloud offerings more suitable for the scientific community, says Ron Ritchey, a principal with strategy and technology consulting firm Booz Allen Hamilton. The lessons learned from Magellan could eventually lead to the cloud becoming as common as test tubes and telescopes in a scientist’s arsenal.

“There are lots of opportunities in academia for a cloud system to be applied that not only reduces cost but makes the setup of the experiment much more efficient and attractive,” Ritchey says. “If you have an experiment that would benefit from having a few hundred CPUs chew on the same data for an hour, and you put in a purchase order for them to perform that experiment, you’d probably get turned down. But doing the same thing in Amazon’s EC2 environment would cost maybe $25.”
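Ritchey's $25 figure checks out as a back-of-the-envelope estimate, assuming (our assumption, not stated in the article) 2011-era EC2 on-demand pricing of roughly $0.085 per small-instance-hour:

```python
# Rough check of Ritchey's estimate. The per-hour rate below is an assumed
# 2011-era EC2 on-demand small-instance price, not a figure from the article.
nodes = 300     # "a few hundred CPUs"
hours = 1       # "chew on the same data for an hour"
rate = 0.085    # assumed $/instance-hour
cost = nodes * hours * rate
print(f"${cost:.2f}")   # about $25, consistent with the quoted estimate
```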

Just as inexpensive Internet access has sped the flow of information, cloud computing could spread the reach of science.

“To me, what’s important about the cloud is that it lets scientists have access to computing resources when they really need them, instead of parceling them out by a certain number of hours of use per year,” Yelick says. “Hopefully the expertise we gain from Magellan will influence what commercial providers, NERSC and others offer in terms of cloud services for their users.”




Photo: Bob Stefko