- 29 Jul 2021
Cloud Object Storage Data Resiliency
- Updated on 29 Jul 2021
As a scale-out platform, Cloud Object Storage cannot use the familiar RAID system to provide high availability and fault tolerance across multiple hard disks. A RAID configuration cannot span multiple nodes, and the Cloud Object Storage platform is made up of many nodes. The problem is compounded because data must also be resilient across multiple data centers to ensure uptime if a single site goes offline. RAID systems also have other disadvantages, such as slow rebuild times, the need for hot spares, and the inability to validate the data. All of these factors make RAID less attractive for ever-growing storage arrays.
Erasure Coding employs the same general principles as RAID systems. However, the management of these actions is moved into software, which allows the functionality to scale across multiple nodes in a single cluster. Most importantly, a system can grow simply by adding nodes to the cluster configuration, so there is effectively no limit on the size of a storage system. This also brings higher resiliency, because entire nodes can go offline while the cluster continues to serve data.
On a technical level, Erasure Coding works by splitting each object into Data (D) stripes and Code (C) stripes and placing them across various nodes in the cluster. Expedient's Cloud Object Storage utilizes EC7:5, meaning that a single object is broken down into 7 individual Data stripes and 5 Code stripes. As long as any 7 of the 12 total stripes remain available for the system to read, the object can still be read in full. In a scenario of 12 total nodes with one stripe on each node, only 7 nodes need to remain online to access the data.
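The EC7:5 arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the counting rule (any 7 of 12 stripes are enough), not the platform's implementation; the function names are hypothetical.

```python
# Illustrative sketch of the EC7:5 counting rule; names are hypothetical.
DATA_STRIPES = 7   # D stripes per object
CODE_STRIPES = 5   # C stripes per object
TOTAL_STRIPES = DATA_STRIPES + CODE_STRIPES  # 12


def object_is_readable(stripes_online: int) -> bool:
    """An object can be read while at least 7 of its 12 stripes are available."""
    return stripes_online >= DATA_STRIPES


def max_tolerated_losses() -> int:
    """Number of stripes that can be lost before the object becomes unreadable."""
    return TOTAL_STRIPES - DATA_STRIPES


def raw_to_usable_ratio() -> float:
    """Raw capacity consumed per unit of stored data (12 stripes for 7 of data)."""
    return TOTAL_STRIPES / DATA_STRIPES


if __name__ == "__main__":
    print("tolerated losses:", max_tolerated_losses())
    print("readable with 7 online:", object_is_readable(7))
    print("readable with 6 online:", object_is_readable(6))
    print("raw/usable ratio:", round(raw_to_usable_ratio(), 2))
```

With one stripe per node, the same check applies to nodes: up to 5 of the 12 can be offline at once.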
In addition to the above, data rebuilds can begin as soon as a drive is marked offline. No hot spare is required as in traditional RAID systems; instead, free space on all of the other drives in the system is used. Rebuild times are also considerably reduced, because the rebuild work is spread across the entire cluster and the full contents of one drive do not need to be rebuilt onto a single replacement drive. Both of these improvements mean that your data is more available and spends less time in a degraded state when a failure does occur.
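To see why spreading a rebuild across many drives shortens it, consider a deliberately simplified model in which rebuild time is limited only by how fast the target drives can be written. The drive size, write rate, and drive count below are illustrative assumptions, not figures for Cloud Object Storage, and the model ignores read traffic, network limits, and throttling.

```python
# Simplified, idealized rebuild-time model. All numbers are illustrative
# assumptions and do not describe the actual platform.

def rebuild_hours(data_tb: float, write_mb_per_s: float, target_drives: int) -> float:
    """Hours to rewrite `data_tb` terabytes when the writes are shared
    evenly across `target_drives` drives, each writing at `write_mb_per_s`."""
    if target_drives < 1:
        raise ValueError("at least one target drive is required")
    data_mb = data_tb * 1_000_000          # 1 TB = 1,000,000 MB (decimal units)
    seconds = data_mb / (write_mb_per_s * target_drives)
    return seconds / 3600


if __name__ == "__main__":
    # RAID-style rebuild: everything is written to one replacement drive.
    single = rebuild_hours(data_tb=10, write_mb_per_s=150, target_drives=1)
    # Distributed rebuild: the same data lands in free space on many drives.
    spread = rebuild_hours(data_tb=10, write_mb_per_s=150, target_drives=100)
    print(f"single target drive: {single:.1f} h")
    print(f"100 target drives:   {spread:.2f} h")
```

In this model the rebuild time falls in proportion to the number of drives sharing the work; real-world gains depend on factors the model leaves out.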
Distributed Erasure Coding
Erasure Coding is a great improvement over RAID systems. However, in the example above all 12 nodes are still in the same physical location. If, in addition to individual disks and individual servers, we treat an entire geographic location as a failure domain, we can further increase the resiliency of the system. Cloud Object Storage employs this functionality to ensure the highest durability for your data.
Cloud Object Storage is deployed across 3 geodiverse locations, so in an EC7:5 configuration each site holds 4 of the 12 total stripes. In the same 12-node example, there are now 4 nodes at each of the 3 locations. Disks, nodes, or even an entire site can go offline and data is still accessible, as long as 7 of the 12 total fragments remain online. This makes for a much more resilient system and helps ensure your data is always available.
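The failure scenarios for the 3-site layout can be checked with a short sketch. The site names and helper functions are hypothetical and exist only for illustration; the rule being tested is the one stated above, that 7 of the 12 fragments must remain online.

```python
# Sketch of failure scenarios for 12 fragments spread over 3 sites (4 each).
# Site names are hypothetical placeholders.
FRAGMENTS_PER_SITE = {"site_a": 4, "site_b": 4, "site_c": 4}
REQUIRED_FRAGMENTS = 7


def fragments_online(failed_sites=(), failed_nodes=None) -> int:
    """Count fragments still reachable.

    failed_sites: names of sites that are entirely offline.
    failed_nodes: mapping of site name -> number of individual nodes
                  (one fragment each) offline at that site.
    """
    failed_nodes = failed_nodes or {}
    online = 0
    for site, count in FRAGMENTS_PER_SITE.items():
        if site in failed_sites:
            continue  # the whole site is unreachable
        online += max(count - failed_nodes.get(site, 0), 0)
    return online


def data_accessible(failed_sites=(), failed_nodes=None) -> bool:
    return fragments_online(failed_sites, failed_nodes) >= REQUIRED_FRAGMENTS


if __name__ == "__main__":
    print(data_accessible(failed_sites=["site_a"]))                  # 8 online
    print(data_accessible(["site_a"], {"site_b": 1}))                # 7 online
    print(data_accessible(["site_a"], {"site_b": 1, "site_c": 1}))   # 6 online
```

Losing one full site leaves 8 fragments, so the layout tolerates a site outage plus one further node failure; a second additional failure drops it below the 7 fragments required.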
While there are great benefits from employing Distributed Erasure Coding, there is one downside. Stretching the nodes across multiple sites introduces some latency into the process of writing and reading data. However, Expedient has taken great care to ensure the impact on the Cloud Object Storage platform is minimal. The slight performance decrease is also significantly outweighed by the benefit of higher availability, resiliency, and durability.
First and foremost, all 3 geodiverse locations have redundant 100Gbps fiber links interconnecting them. This ensures that latency is low between the sites and there is ample bandwidth for all IO operations.
Expedient also leverages platform functionality that allows the data consistency level to be tuned. The configuration implemented on Cloud Object Storage is known as Quorum Consistency. This means that an IO operation succeeds as soon as any 7 of the 12 fragments have completed it. This significantly increases speed, because IO operations complete once a majority of nodes have participated rather than waiting on all of them. Any remaining IO operations occur in the background to ensure full durability. This speeds up access to your data while also providing higher availability across multiple geodiverse locations.
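The latency benefit of Quorum Consistency can be illustrated with a small sketch: if an operation completes once 7 of 12 fragments have acknowledged it, its latency is set by the 7th-fastest fragment rather than the slowest. The latency values below are made-up numbers for a local site and two remote sites, not measurements of the platform, and the function names are hypothetical.

```python
# Sketch of quorum completion: the operation finishes when the 7th
# acknowledgement arrives. Latencies are invented for illustration.
QUORUM = 7


def quorum_latency_ms(ack_latencies_ms, quorum=QUORUM):
    """Latency at which the quorum-th acknowledgement arrives."""
    if len(ack_latencies_ms) < quorum:
        raise ValueError("not enough fragments to reach quorum")
    return sorted(ack_latencies_ms)[quorum - 1]


def wait_for_all_latency_ms(ack_latencies_ms):
    """Latency if the operation had to wait for every fragment."""
    return max(ack_latencies_ms)


if __name__ == "__main__":
    acks = [
        1.0, 1.2, 0.9, 1.1,   # hypothetical local-site fragments
        5.0, 5.3, 4.8, 5.1,   # hypothetical nearer remote site
        9.0, 9.4, 8.7, 9.2,   # hypothetical farther remote site
    ]
    print("quorum (7 of 12):", quorum_latency_ms(acks), "ms")
    print("wait for all 12: ", wait_for_all_latency_ms(acks), "ms")
```

In this example the quorum is reached using the local site plus part of one remote site, so the slowest site never sits on the critical path; its fragments finish in the background.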