What Is an HPC Cluster?
Each server in an HPC cluster is called a node. “The nodes in each cluster work in parallel with each other, boosting processing speed to deliver high-performance computing,” the post notes.
Cameron Chehreh, CTO and vice president of pre-sales engineering at Dell EMC Federal, tells FedTech these nodes may include processing power through CPUs and GPUs on servers; tools such as NVIDIA and Intel software development kits; frameworks including TensorFlow, MXNet and Caffe; and essential platforms with Kubernetes and Pivotal Cloud Foundry.
As an Iowa State University guide notes, there may be different types of nodes for different types of tasks. These can include a headnode or login node, where users log in to HPC systems; specialized data transfer nodes; regular compute nodes; so-called “fat” nodes that have at least a terabyte of memory; graphics processing unit nodes; and more.
“All cluster nodes have the same components as a laptop or desktop: CPU cores, memory and disk space,” the guide states. “The difference between a personal computer and a cluster node is in quantity, quality and power of the components.”
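To make the “work in parallel” idea concrete, here is a minimal, hypothetical sketch (not drawn from the sources quoted above) of how a job spreads work across many processes running on a cluster’s compute nodes. It assumes the mpi4py library is installed and that the script is launched through an MPI launcher or scheduler, for example with mpirun -n 64 python sum_example.py.

```python
# Minimal mpi4py sketch: each process (typically one or more per compute node)
# works on its own slice of the problem, illustrating parallel execution
# across cluster nodes. All numbers here are illustrative only.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this process's ID within the job
size = comm.Get_size()   # total number of processes across all nodes

# Split a large problem (here, just summing integers) across the processes.
N = 100_000_000
chunk = range(rank * N // size, (rank + 1) * N // size)
local_sum = sum(chunk)

# Combine the partial results on rank 0, which prints the answer.
total = comm.reduce(local_sum, op=MPI.SUM, root=0)
if rank == 0:
    print(f"Total computed across {size} processes: {total}")
```

In practice a user would log in to the head node, submit this script to a scheduler, and the scheduler would place the processes on compute or GPU nodes; the laptop-versus-cluster difference the guide describes shows up in how many of these processes can run at once.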
HPC Applications in Government
In addition to enabling critical research such as COVID-19 treatments, HPC systems in government support a wide range of cutting-edge research that could not be accomplished with regular computing power.
The Energy Department’s National Renewable Energy Laboratory, for example, runs its High Performance Computing User Facility for scientists and engineers “working on solving complex computational and data analysis problems related to energy efficiency and renewable energy technologies,” the NREL says.
“The work performed on NREL’s HPC systems leads to increased efficiency and reduced costs for these technologies, including wind and solar energy, energy storage, and the large-scale integration of renewables into the electric grid,” the lab notes.
HPCs also enable research partnerships between the government and private sector into other kinds of energy innovation and advanced manufacturing techniques.
In November, the LLNL announced a partnership with Oak Ridge National Laboratory and Rolls-Royce to use HPC to “study a key modeling component in the quench heat-treatment processes for gas turbine parts.” LLNL also announced a partnership with Toyota Motor Engineering & Manufacturing North America to “improve understanding of the relationship between properties in specific solid electrolytes for lithium-ion batteries.”
The LLNL also announced in November the rollout of a new HPC cluster, dubbed Ruby, powered by Intel Xeon Platinum processors. Ruby is being used for unclassified programmatic work in support of the National Nuclear Security Administration’s mission of maintaining the country’s nuclear weapons stockpile. Ruby is also being used, according to the LLNL, for research into “asteroid detection, moon formation, high-fidelity fission and other basic science discovery.”
HPC Storage for Government Agencies
As NetApp notes on its site, HPC clusters are networked to the HPC system’s data storage to capture the output. Storage is a critical element of an HPC architecture.
“To operate at maximum performance, each component must keep pace with the others,” NetApp notes. “For example, the storage component must be able to feed and ingest data to and from the compute servers as quickly as it is processed.”
Similarly, HPC networking components “must be able to support the high-speed transportation of data between compute servers and the data storage.”
If one component, including storage, cannot keep up with the rest, “the performance of the entire HPC infrastructure suffers,” NetApp notes.
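A rough, hypothetical back-of-envelope check (the figures below are invented for illustration, not taken from NetApp) shows why a lagging storage tier drags down the whole system: if the compute nodes collectively need more bandwidth than the storage can deliver, they sit idle waiting on I/O.

```python
# Hypothetical sizing check: can the storage tier feed the compute nodes
# as fast as they consume data? All figures are made-up example values.
nodes = 100                      # compute nodes in the cluster (assumed)
ingest_per_node_gbs = 2.0        # GB/s each node reads during a job (assumed)
storage_bandwidth_gbs = 150.0    # aggregate GB/s the storage delivers (assumed)

required = nodes * ingest_per_node_gbs
print(f"Required: {required:.0f} GB/s, available: {storage_bandwidth_gbs:.0f} GB/s")

if storage_bandwidth_gbs < required:
    # Storage becomes the bottleneck; compute nodes stall waiting on data.
    print("Storage cannot keep pace; the whole cluster's performance suffers.")
else:
    print("Storage bandwidth is sufficient for this workload.")
```

The same logic applies to the network fabric: whichever component delivers the least throughput sets the ceiling for the entire HPC infrastructure.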