Jun 04 2021

High-Performance Computing Clusters and Applications in Government

What are the basic elements of an HPC architecture for federal agencies?

The falling cost of basic computing components and advances in secure cloud computing are making high-performance computing more accessible and useful for federal agencies.

Over the past year, HPC solutions have helped researchers accelerate basic science, therapeutics development and patient care for COVID-19, according to Jim Brase, Lawrence Livermore National Laboratory’s deputy director for computing.

In March 2020, the COVID-19 HPC Consortium was stood up as a public-private partnership to give COVID-19 researchers free access to Energy Department and industry supercomputers.

According to an LLNL blog post, 100 projects so far have been approved by the consortium, “including work by academia and national laboratories to simulate the SARS-CoV-2 protein structure, model the virus’ mutations and virtually screen molecules for COVID-19 drug design.”

Other research accelerated by high-performance computing has helped identify promising drug compounds. “This is evidence that this kind of approach to responding to an emergency situation like this can have real impact, and it’s on multiple timescales,” Brase says in the post.

What are the basics of high-performance computing, and what do agencies that want to use HPC resources need to know? Here’s a breakdown of HPC clusters, applications and storage.

What Is an HPC Cluster?

According to a post from NetApp, an HPC cluster “consists of hundreds or thousands of compute servers that are networked together.”

Each server in an HPC cluster is called a node. “The nodes in each cluster work in parallel with each other, boosting processing speed to deliver high-performance computing,” the post notes.
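The idea of nodes working in parallel can be illustrated with a minimal Python sketch. It is not how any particular cluster scheduler works: the function names (split_work, node_task, run_on_cluster) are hypothetical, the "nodes" are simply worker threads on one machine, and the per-node job (summing squares) is an arbitrary stand-in. The sketch shows only the pattern of dividing work into chunks, handing each chunk to a worker and combining the partial results, not the speedup a real cluster would deliver.

```python
from concurrent.futures import ThreadPoolExecutor


def split_work(items, node_count):
    """Divide items into node_count nearly equal contiguous chunks."""
    base, extra = divmod(len(items), node_count)
    chunks, start = [], 0
    for i in range(node_count):
        # The first `extra` chunks each take one additional item.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def node_task(chunk):
    """Hypothetical per-node job: sum the squares of one chunk."""
    return sum(x * x for x in chunk)


def run_on_cluster(items, node_count):
    """Scatter chunks to workers, then combine the partial results."""
    chunks = split_work(items, node_count)
    # Worker threads stand in for cluster nodes (an assumption for illustration).
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(node_task, chunks))
    return sum(partials)


if __name__ == "__main__":
    print(run_on_cluster(list(range(10)), node_count=4))
```

On a real cluster, the scatter and combine steps would typically be handled by a job scheduler or a message-passing library rather than a thread pool.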

Cameron Chehreh, CTO and vice president of pre-sales engineering at Dell EMC Federal, tells FedTech these nodes may include processing power through CPUs and GPUs on servers; tools such as NVIDIA and Intel software development kits; frameworks including TensorFlow, MXNet and Caffe; and essential platforms with Kubernetes and Pivotal Cloud Foundry.

As an Iowa State University guide notes, there may be different types of nodes for different types of tasks. These can include a headnode or login node, where users log in to HPC systems; specialized data transfer nodes; regular compute nodes; so-called “fat” nodes that have at least a terabyte of memory; graphics processing unit nodes; and more.

“All cluster nodes have the same components as a laptop or desktop: CPU cores, memory and disk space,” the guide states. “The difference between personal computer and a cluster node is in quantity, quality and power of the components.”

HPC Applications in Government

In addition to enabling critical research such as COVID-19 treatments, HPCs in government support a wide range of cutting-edge research that could not be accomplished with regular computing power.

The Energy Department’s National Renewable Energy Laboratory, for example, runs its High Performance Computing User Facility for scientists and engineers “working on solving complex computational and data analysis problems related to energy efficiency and renewable energy technologies,” the NREL says.

“The work performed on NREL’s HPC systems leads to increased efficiency and reduced costs for these technologies, including wind and solar energy, energy storage, and the large-scale integration of renewables into the electric grid,” the lab notes.

HPCs also enable research partnerships between the government and private sector into other kinds of energy innovation and advanced manufacturing techniques.

In November, the LLNL announced a partnership with Oak Ridge National Laboratory and Rolls-Royce to use HPC to “study a key modeling component in the quench heat-treatment processes for gas turbine parts.” LLNL also announced a partnership with Toyota Motor Engineering & Manufacturing North America to “improve understanding of the relationship between properties in specific solid electrolytes for lithium-ion batteries.”

The LLNL also announced in November the rollout of a new HPC cluster, dubbed Ruby, which is built on Intel Xeon Platinum processors. Ruby is being used for unclassified programmatic work in support of the National Nuclear Security Administration’s mission of maintaining the country’s nuclear weapons stockpile. Ruby is also being used, according to the LLNL, for research into “asteroid detection, moon formation, high-fidelity fission and other basic science discovery.”

HPC Storage for Government Agencies

As NetApp notes on its site, HPC clusters are networked to the HPC system’s data storage to capture the output. Storage is a critical element of an HPC architecture.

“To operate at maximum performance, each component must keep pace with the others,” NetApp notes. “For example, the storage component must be able to feed and ingest data to and from the compute servers as quickly as it is processed.”

Similarly, HPC networking components “must be able to support the high-speed transportation of data between compute servers and the data storage.”

If one component, including storage, cannot keep up with the rest, “the performance of the entire HPC infrastructure suffers,” NetApp notes.
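The point that the slowest component sets the pace can be shown with a short Python sketch. The function names and the throughput figures below are hypothetical and do not come from NetApp or any real system; the sketch assumes a simple model in which compute, network and storage form a chain and data can only flow as fast as the slowest link.

```python
def pipeline_throughput(rates_gb_per_s):
    """Return (bottleneck name, sustained rate) for a chain of components."""
    # The component with the lowest rate limits the whole chain.
    name = min(rates_gb_per_s, key=rates_gb_per_s.get)
    return name, rates_gb_per_s[name]


def utilization(rates_gb_per_s):
    """Fraction of each component's capacity actually used at the sustained rate."""
    _, limit = pipeline_throughput(rates_gb_per_s)
    return {name: limit / rate for name, rate in rates_gb_per_s.items()}


if __name__ == "__main__":
    # Hypothetical figures in GB/s, chosen only for illustration.
    rates = {"compute": 40.0, "network": 25.0, "storage": 10.0}
    print(pipeline_throughput(rates))
    print(utilization(rates))
```

With these made-up numbers, storage is the bottleneck, so the compute servers would sit idle roughly three-quarters of the time while waiting on data.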
