Jul 05 2024

What Is Cluster Computing, and How Can It Help Federal IT?

Cluster computing technology for data storage can help federal agencies gain scalability, availability, efficiencies and security.

As data volumes continue to increase, federal IT leaders are considering cluster computing as a way to curb what they spend maintaining storage infrastructure or paying for public cloud.

Proponents of the emerging alternative argue the technology improves scalability and availability for agencies while reducing environmental and monetary waste from unused storage space.

Agencies have used storage area networks for years to consolidate storage into a single fabric, but cluster computing enhances data security by spreading more redundant copies of data across multiple locations. The technology tightly couples different servers, or nodes, into clusters that work in parallel and share resources to deliver higher performance and availability.

“And then there is clustered storage, which leverages regular clustering approaches for availability, resiliency, performance, any combination of those,” says Greg Schulz, founder and senior analyst at IT consulting firm StorageIO.

A Look at Cluster Computing Architectures

A general cluster computing architecture couples and groups different nodes to share resources for performance and availability when running applications, and the nodes may have their own internal dedicated storage. The nodes can see and access each other’s storage as well as their own, Schulz says. 

“So now, all of a sudden, they can start to share storage; they can directly access the storage depending on the software,” he says. “They might even be able to share at the file level, not just shared storage but file sharing between the nodes.”

A key enabler of this kind of storage sharing is Ceph, an open-source, software-defined storage solution for object, block and file storage, says Neha Ojha, senior development manager for the Ceph team at IBM and a member of the Ceph Executive Council.

Ceph allows organizations to use commodity hardware for particular use cases, and the software-defined nature of the solution allows for massive scale, always-on availability, simplified management and enhanced security.

How Does a Cluster Computing Architecture Work?

One way to think about cluster computing is as a village in which resources are shared.

“The village’s strength is the sum of all of the parts of the village,” Schulz says. “The village exists to benefit the entirety. If something falls down, something breaks, somebody gets sick, others step in to help out.” 

If an application needs more computing power or storage, the cluster architecture draws those resources from other nodes. Instead of dedicating storage to each application, where much of it may sit unused, a cluster architecture lets agencies buy additional storage as needed and distribute data across the cluster's nodes.
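
To make the pooling idea concrete, here is a minimal Python sketch comparing dedicated per-application volumes with a shared cluster pool. The application names and terabyte figures are hypothetical, and the pooled check treats raw capacity as usable, ignoring the overhead of keeping redundant copies.

```python
# Hypothetical figures: three applications, each with its own dedicated 10 TB volume.
DEDICATED_TB = {"app_a": 10, "app_b": 10, "app_c": 10}
USED_TB = {"app_a": 2, "app_b": 9, "app_c": 4}


def fits_dedicated(app: str, extra_tb: float) -> bool:
    """With dedicated storage, an app can only grow into its own volume's free space."""
    return USED_TB[app] + extra_tb <= DEDICATED_TB[app]


def fits_pooled(extra_tb: float) -> bool:
    """In a cluster, free space on every node counts toward one shared pool.

    Simplification: raw capacity is treated as usable (no replication overhead).
    """
    total = sum(DEDICATED_TB.values())
    used = sum(USED_TB.values())
    return used + extra_tb <= total


idle_tb = sum(DEDICATED_TB[a] - USED_TB[a] for a in DEDICATED_TB)
print(f"Idle dedicated capacity: {idle_tb} TB")        # 15 TB sitting unused
print("app_b +5 TB, dedicated:", fits_dedicated("app_b", 5))  # False
print("app_b +5 TB, pooled:", fits_pooled(5))                 # True
```

In the dedicated model, app_b runs out of room even though 15 TB sits idle on the other volumes; in the pooled model, that idle space absorbs the growth.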

Using the village analogy, the issue is determining “who coordinates the village, who keeps everything running and intact so that nobody’s overusing or nobody’s abusing, that everybody is contributing, everybody is benefiting,” Schulz says.

That’s where software such as Ceph can make a significant difference by serving that coordinating function. Its underlying storage layer provides data redundancy and security regardless of the use case.

“Any user can just come up with a use case and any kind of application that they want to store,” Ojha says. “As long as they have access to the clustered hardware that we talked about, they should be able to seamlessly use all of the capabilities that RADOS provides.”

RADOS, or Reliable Autonomic Distributed Object Store, is the foundational layer of Ceph.

“By manipulating all storage as objects within RADOS, Ceph is able to easily distribute data throughout a cluster, even for block and file storage types,” notes the Ceph Foundation.
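
The idea of handling block storage as objects can be sketched in a few lines of Python. This is a simplified model, not Ceph's code: it assumes a virtual disk is cut into fixed 4 MiB objects (the default object size for Ceph's block device, RBD) and uses a hypothetical naming scheme, then maps any byte offset on the disk to the object that holds it.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # 4 MiB per object (assumed; RBD's default)
MiB = 1024 * 1024


def object_name(image_id: str, index: int) -> str:
    """Hypothetical naming scheme: image ID plus a zero-padded hex object number."""
    return f"{image_id}.{index:016x}"


def locate(image_id: str, offset: int) -> tuple[str, int]:
    """Map a byte offset on a virtual disk to (object name, offset inside that object)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    index, within = divmod(offset, OBJECT_SIZE)
    return object_name(image_id, index), within


def objects_for_range(image_id: str, offset: int, length: int) -> list[str]:
    """List every object touched by an I/O of `length` bytes starting at `offset`."""
    if length <= 0:
        return []
    first = offset // OBJECT_SIZE
    last = (offset + length - 1) // OBJECT_SIZE
    return [object_name(image_id, i) for i in range(first, last + 1)]


print(locate("vol1", 10 * MiB))  # ('vol1.0000000000000002', 2097152)
# A 2 MiB write starting at 3 MiB crosses an object boundary, touching objects 0 and 1.
print(objects_for_range("vol1", 3 * MiB, 2 * MiB))
```

Because each piece is an ordinary object, the cluster can place and replicate it like any other object, which is how a single virtual disk ends up spread across many nodes.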

Typically, if an agency runs out of compute performance or storage capacity, “you need to buy a new thing that’s bigger and faster and then copy everything there,” says Dan van der Ster, Ceph Executive Council member and CTO at CLYSO. That approach amounts to scaling vertically: metaphorically, building a taller building.

With a cluster approach, the scaling of storage capacity happens horizontally.

“We just build more houses; we scale that direction,” van der Ster says. “We buy more servers, we add them, things rebalance, and you can take advantage of that. You can grow incrementally.”
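
Why adding servers triggers only a partial rebalance can be illustrated with rendezvous (highest-random-weight) hashing. This is a simplified stand-in chosen for illustration, not Ceph's actual CRUSH placement algorithm, and the node and object names are hypothetical. Each object goes to the node with the highest hash score for that object, so when a new node joins, only the objects the new node now wins have to move.

```python
import hashlib


def score(node: str, obj: str) -> int:
    """Deterministic pseudo-random weight for a (node, object) pair."""
    digest = hashlib.sha256(f"{node}:{obj}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(obj: str, nodes: list[str]) -> str:
    """Rendezvous hashing: the node with the top score for this object stores it."""
    return max(nodes, key=lambda n: score(n, obj))


objects = [f"obj-{i}" for i in range(10_000)]
before_nodes = ["node-1", "node-2", "node-3", "node-4"]
after_nodes = before_nodes + ["node-5"]  # scale out by one server

before = {o: place(o, before_nodes) for o in objects}
after = {o: place(o, after_nodes) for o in objects}
moved = [o for o in objects if before[o] != after[o]]

print(f"moved {len(moved)} of {len(objects)} objects")
print("every moved object went to node-5:", all(after[o] == "node-5" for o in moved))
```

Growing from four to five nodes moves roughly one-fifth of the objects, all onto the new node, instead of copying everything to a bigger machine as in the vertical approach.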

How Does Cluster Computing Ensure Data Security?

Cluster computing supports agencies’ data security through data replication, in which multiple copies of the data are written and distributed across various nodes. If one or more copies are lost or erased in a cyberattack, the agency can still access and read the data from the surviving copies.
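
Here is a minimal Python sketch of that replication idea, assuming three copies per object (a common setting for replicated pools) and hypothetical node names. Each object is written to three distinct nodes chosen by hashing, and a read succeeds as long as any one copy survives.

```python
import hashlib

REPLICAS = 3  # assumed number of copies per object


def replica_nodes(obj: str, nodes: list[str], count: int = REPLICAS) -> list[str]:
    """Choose `count` distinct nodes for an object by ranking nodes on a hash score."""
    ranked = sorted(nodes, key=lambda n: hashlib.sha256(f"{n}:{obj}".encode()).digest(), reverse=True)
    return ranked[:count]


class ReplicatedStore:
    """Toy cluster: each node is a dict of object name -> data."""

    def __init__(self, nodes: list[str]):
        self.disks: dict[str, dict[str, bytes]] = {n: {} for n in nodes}
        self.down: set[str] = set()

    def write(self, obj: str, data: bytes) -> None:
        for node in replica_nodes(obj, list(self.disks)):
            self.disks[node][obj] = data

    def fail(self, node: str) -> None:
        self.disks[node].clear()  # simulate a node wiped by an attack or hardware failure
        self.down.add(node)

    def read(self, obj: str) -> bytes:
        for node, disk in self.disks.items():
            if node not in self.down and obj in disk:
                return disk[obj]
        raise KeyError(f"{obj}: no surviving replica")


store = ReplicatedStore([f"node-{i}" for i in range(1, 7)])
store.write("report.csv", b"quarterly data")

first, second, third = replica_nodes("report.csv", list(store.disks))
store.fail(first)
store.fail(second)
print(store.read("report.csv"))  # still readable from the third copy: b'quarterly data'
```

Losing the third node as well would make the read fail, which is why the copy count is chosen to match how many simultaneous failures an agency wants to tolerate.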

Once an agency stores data using Ceph, it is guaranteed there will be enough copies to recover from any kind of failure, Ojha says.

Ceph supports both synchronous and asynchronous replication between sites, such as data centers in separate availability zones. Availability zones are “isolated or separated data centers located within specific regions in which public cloud services originate and operate,” according to TechTarget.
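
The difference between the two modes comes down to when a write is acknowledged. The toy Python model below, with hypothetical site names, is not Ceph's implementation: it acknowledges a synchronous write only after both sites hold it, while an asynchronous write is acknowledged at once and shipped to the second site later.

```python
from collections import deque


class TwoSiteReplicator:
    """Toy model of synchronous vs. asynchronous replication between two sites."""

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.site_a: dict[str, str] = {}  # primary site (hypothetical)
        self.site_b: dict[str, str] = {}  # secondary site (hypothetical)
        self.pending = deque()  # (key, value) writes not yet sent to site_b

    def write(self, key: str, value: str) -> str:
        self.site_a[key] = value
        if self.synchronous:
            self.site_b[key] = value  # acknowledge only after the remote copy exists
        else:
            self.pending.append((key, value))  # acknowledge now, replicate later
        return "ack"

    def ship_pending(self) -> None:
        """Background step that brings site_b up to date in asynchronous mode."""
        while self.pending:
            key, value = self.pending.popleft()
            self.site_b[key] = value


sync_repl = TwoSiteReplicator(synchronous=True)
async_repl = TwoSiteReplicator(synchronous=False)
for repl in (sync_repl, async_repl):
    repl.write("record-1", "v1")

print("sync, site_b has record-1:", "record-1" in sync_repl.site_b)
print("async, before shipping:", "record-1" in async_repl.site_b)
async_repl.ship_pending()
print("async, after shipping:", "record-1" in async_repl.site_b)
```

The trade-off: synchronous mode adds cross-site latency to every write, while asynchronous mode can lose writes still waiting in the queue if the primary site fails first.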

Ceph is known for its “self-healing” capabilities, which allow it to “quickly react to hardware, power or connectivity failures. Ceph will actively redistribute data around your cluster as soon as an issue arises, protecting against data loss before you even notice there's a problem,” according to the Ceph Foundation.
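
Self-healing can be illustrated with a toy recovery loop. This Python sketch is not Ceph's recovery code; it assumes a target of three copies, hypothetical node names and hash-ranked placement. When a node fails, the loop drops the copies that lived there and re-creates them on other live nodes from the surviving replicas.

```python
import hashlib

REPLICAS = 3  # assumed target number of copies per object


def rank(obj: str, nodes: list[str]) -> list[str]:
    """Order nodes by a per-object hash score (a simple placement stand-in)."""
    return sorted(nodes, key=lambda n: hashlib.sha256(f"{n}:{obj}".encode()).digest(), reverse=True)


def heal(placement: dict[str, set[str]], live_nodes: list[str]) -> int:
    """Bring every object back to REPLICAS copies on live nodes; return copies made."""
    created = 0
    for obj, holders in placement.items():
        holders &= set(live_nodes)  # discard copies that lived on failed nodes (in place)
        if not holders:
            continue  # no surviving copy to recover from
        for candidate in rank(obj, live_nodes):
            if len(holders) >= REPLICAS:
                break
            if candidate not in holders:
                holders.add(candidate)  # stands in for copying data from a survivor
                created += 1
    return created


nodes = [f"node-{i}" for i in range(1, 6)]
placement = {f"obj-{i}": set(rank(f"obj-{i}", nodes)[:REPLICAS]) for i in range(100)}

live = [n for n in nodes if n != "node-2"]  # node-2 fails
copies = heal(placement, live)
print(f"re-created {copies} copies")
print("all objects back to 3 live copies:",
      all(len(h) == REPLICAS and h <= set(live) for h in placement.values()))
```

Only objects that had a copy on the failed node need work, so recovery traffic is proportional to what was lost rather than to the size of the whole cluster.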

How Can Agencies Use Cluster Computing for Storage?

Many agencies already use clustered storage for greater performance, scalability and availability, Schulz says. It also lets agencies avoid lock-in to a vendor’s proprietary hardware.

Ceph lets agencies manage storage on commodity hardware and gain flexibility, but it may also require dedicated staff to run it. For agencies with missions to fulfill, that could mean “you may have found yourself in the storage provider business without really intending to get there,” Schulz says.

A benefit of Ceph is that IT leaders at agencies and other organizations can look at the source code and decide if it is the right solution for them before committing to buying petabytes of storage.

Ceph’s open-source nature gives agencies the ability to innovate and develop new use cases supported by the cluster architecture, Ojha says. That also makes it easy for organizations to build on the community that supports Ceph and get started quickly, without having to commit to anything upfront, van der Ster says.

The solution continues to evolve. When Ceph was first developed 15 years ago, it became the de facto storage back end for many private clouds, van der Ster says.

“Now Ceph is very relevant in today’s storage use cases,” he says. “It came from a high-performance computing environment, actually, but an HPC is very similar to the storage demands in AI now. Because of its flexible nature, it adapts to use cases over time.”
