Why Exascale Computing Requires a Collaborative Partnership Between Industry and Government

Five challenges facing the government’s supercomputer strategy.

Supercomputing is at the bleeding edge of scientific frontiers in a wide range of fields — from mapping the human genome to predicting the impact of global warming to designing the next generation of green planes and cars. Innovation like this generates an enormous amount of data, spurring the rapid growth of data centers and driving an increasing demand for analysis tools. Successful supercomputing is no longer just about performance; rather, it’s a combination of performance, efficiency, power and storage.

Today, supercomputer performance is measured in petaflops, with a single petaflop equal to 1 quadrillion floating-point calculations per second. The world’s fastest supercomputer, the Titan supercomputer at Oak Ridge National Laboratory in Tennessee, relies on more than 18,000 AMD Opteron processors and almost 300,000 individual processing cores to deliver nearly 20 petaflops of compute performance. It’s no small feat to create an environment capable of accommodating such power. Both government and industry are working hard to overcome the scientific and technological challenges standing in the way of this new and powerful computer architecture.

Last year, AMD was awarded a total of $12.6 million for two research projects under the Energy Department’s Extreme-Scale Computing Research and Development Program, known as “FastForward,” a two-year effort that is part of the department’s push to build the next generation of “exascale” supercomputers. The department awarded $9.6 million to AMD for processor-related research and $3 million for memory-related research.

The Energy Department’s efforts are already producing impressive results, but many technological barriers must be overcome before the government can field exascale platforms, machines capable of a quintillion calculations per second, roughly 50 times the throughput of Titan. Here are a few of the key challenges:

  • Computer architecture: Developing new systems with hundreds of millions of cores that allow for massively parallel computing and memory access requires new computer architecture and design. Beyond traditional computing, there are explosive compute-acceleration opportunities in spreading work across both traditional central processing units (CPUs) and massively parallel graphics processing units (GPUs). AMD combines the two in an integrated accelerated processing unit (APU) that brings tremendous efficiency and acceleration to computation, particularly for emerging workloads.
  • Power consumption: The Energy Department has set a maximum power limit of 20 megawatts for an exascale system, which demands dramatic gains in power efficiency. Energy-hungry components such as microprocessors, interconnects, and memory must be designed to radically reduce power consumption.
  • Memory: Exascale computing imposes massive memory-capacity and bandwidth requirements so that data can move quickly to and from compute nodes built around multicore processors.
  • Reliability: A massive-scale supercomputer will require highly reliable components, along with mechanisms that minimize the impact when individual components inevitably fail.
  • Software: Simply put, the software must be able to take advantage of the massively parallel, multinode, multicore processing capabilities of an exascale-class supercomputer; a minimal sketch of that programming style follows this list.
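
To make the software challenge concrete, the sketch below shows the hybrid style most large-scale scientific codes rely on today: MPI distributes work across nodes while OpenMP spreads each node’s share across its cores. It is only an illustration under stated assumptions; the toy problem (estimating pi by numerical integration), the file name and the launch parameters are invented for the example, not drawn from the FastForward program or any AMD codebase.

```c
/*
 * Hybrid MPI + OpenMP sketch: MPI ranks divide the work across nodes,
 * and OpenMP threads divide each rank's slice across that node's cores.
 * The workload is a toy: estimating pi by integrating 4/(1+x^2) over
 * [0, 1] with a midpoint rule.
 *
 * Build (assumes an MPI toolchain): mpicc -fopenmp pi_hybrid.c -o pi_hybrid
 * Run:                              mpirun -np 4 ./pi_hybrid
 */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const long long n = 100000000LL;   /* total integration steps */
    const double h = 1.0 / (double)n;  /* width of each step      */

    /* Each rank takes a contiguous slice of the steps. */
    long long begin = rank * (n / nranks);
    long long end   = (rank == nranks - 1) ? n : begin + (n / nranks);

    /* Within the rank, OpenMP threads split the slice across cores. */
    double local_sum = 0.0;
    #pragma omp parallel for reduction(+ : local_sum)
    for (long long i = begin; i < end; ++i) {
        double x = (i + 0.5) * h;
        local_sum += 4.0 / (1.0 + x * x);
    }

    /* Combine every node's partial sum into one result on rank 0. */
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0,
               MPI_COMM_WORLD);

    if (rank == 0)
        printf("pi ~= %.12f  (ranks: %d, threads per rank: %d)\n",
               global_sum * h, nranks, omp_get_max_threads());

    MPI_Finalize();
    return 0;
}
```

A real exascale application would add GPU or APU offload, fault tolerance and far more careful communication on top of this pattern, but the basic division of labor, across nodes first and across cores second, stays the same.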

According to the 2011 report “The Future of Computing Performance: Game Over or Next Level?” by the National Academy of Sciences, “Virtually every sector of society — manufacturing, financial services, education, government, the military, entertainment, and so on — has become dependent on continued growth in computing performance to drive industrial productivity, increase efficiency, and enable innovation.”

Exascale computing is primed to meet the challenges of each of these sectors head-on, but success will only be achieved through continued, collaborative partnerships between industry and government.