- 16 Feb 2021
Host-to-Host Replication Failover Operations
The Move Operation
The Move Operation is used to migrate protected virtual machines from the source (protected site) to the target (recovery site) as a planned migration. A planned migration assumes that both the source and target sites are healthy and that you plan to relocate the virtual machines in an orderly fashion.
The Move Operation has the following steps:
1. Gracefully shut down the protected virtual machines. This ensures data integrity.
- VMware Tools should be installed so that our software can perform the graceful shutdown. If VMware Tools is unavailable, shut down the virtual machines manually before performing the Move Operation.
2. Create a final clean checkpoint. Because the virtual machines are stopped and no new I/O is being written, this ensures no data loss.
3. Replicate any pending data and the new clean checkpoint to the recovery site.
4. Create virtual machines at the recovery site and attach each virtual machine to its relevant virtual disks, based on the checkpoint created in step 2.
5. Configure VMware HA to prevent automated DRS operations during the Move Operation.
6. Power on the virtual machines at the recovery site, if applicable, using the boot order defined in the virtual protection group.
7. Commit the Move Operation.
- By default, the Move Operation is committed automatically, without additional testing. You can change this by setting the commit policy when you execute the Move Operation.
8. Remove the protected site virtual machines from the inventory.
9. If reverse replication is specified in the virtual protection group settings, the VMDK files at the protected site are used for reverse replication and the delta sync process starts. The delta sync process ensures data integrity across the disks at both sites.
- If reverse replication is not specified, the virtual protection group settings are saved, but the protected site disks are deleted. Setting up reverse replication in the future would then require a full replication seeding process.
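To make the ordering of the Move Operation steps concrete, here is a minimal sketch that simply records each step as a string. It is not our software's API: the `VPG` class, its fields, and the `move` function are hypothetical names for illustration, and the virtual machines are assumed to be listed already in boot order.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VPG:
    """Hypothetical stand-in for a virtual protection group; fields are illustrative."""
    vms: List[str]                    # assumed to be listed in boot order
    reverse_replication: bool = True
    auto_commit: bool = True          # default commit policy: commit without extra testing


def move(vpg: VPG) -> List[str]:
    """Return the Move Operation steps, in order, as human-readable strings."""
    log: List[str] = []
    for vm in vpg.vms:
        log.append(f"shut down {vm} gracefully")
    # The VMs are stopped, so this checkpoint captures all written data.
    log.append("create final clean checkpoint")
    log.append("replicate pending data and final checkpoint to recovery site")
    for vm in vpg.vms:
        log.append(f"create {vm} at recovery site from final checkpoint")
    log.append("configure VMware HA to prevent automated DRS operations")
    for vm in vpg.vms:
        log.append(f"power on {vm} at recovery site")
    log.append("commit move" if vpg.auto_commit else "commit move after manual testing")
    for vm in vpg.vms:
        log.append(f"remove {vm} from protected site inventory")
    if vpg.reverse_replication:
        log.append("start delta sync using protected site VMDKs")
    else:
        # Without reverse replication, a later setup needs full seeding.
        log.append("save VPG settings; delete protected site disks")
    return log
```

The key design point the sketch preserves is that the final checkpoint is taken only after every protected virtual machine is shut down, and the protected site copies are removed only after the commit step.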
The Failover Operation
The Failover Operation is used after a disaster to recover protected virtual machines to the recovery site. A Failover assumes that connectivity between the protected and recovery sites may be down, so the protected virtual machines and disks are not removed. When performing a Failover Operation, you always specify the checkpoint to which the virtual machines are recovered: the latest checkpoint, an earlier checkpoint, or a user-defined checkpoint. Our software ensures the virtual machines are recovered to the specified point in time. You can run multiple consecutive Failover Test Operations to determine the desired checkpoint.
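The three ways of choosing a recovery checkpoint can be illustrated with a small selection function. This is a hypothetical sketch, not our software's interface: the `Checkpoint` record, the `select_checkpoint` name, and the mode strings are assumptions, and a user-defined checkpoint is modeled simply as one carrying a tag.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical checkpoint record: a timestamp and an optional user tag."""
    timestamp: datetime
    tag: Optional[str] = None  # set for user-defined checkpoints (assumption)


def select_checkpoint(checkpoints: List[Checkpoint], mode: str = "latest",
                      before: Optional[datetime] = None,
                      tag: Optional[str] = None) -> Checkpoint:
    """Pick a recovery checkpoint: 'latest', 'earlier' (at or before a time) or 'tagged'."""
    ordered = sorted(checkpoints, key=lambda c: c.timestamp)
    if not ordered:
        raise ValueError("no checkpoints available")
    if mode == "latest":
        return ordered[-1]
    if mode == "earlier":
        if before is None:
            raise ValueError("'earlier' mode needs a 'before' timestamp")
        candidates = [c for c in ordered if c.timestamp <= before]
        if not candidates:
            raise LookupError("no checkpoint at or before the requested time")
        return candidates[-1]  # closest point in time not after 'before'
    if mode == "tagged":
        for c in reversed(ordered):  # newest matching user-defined checkpoint
            if c.tag == tag:
                return c
        raise LookupError(f"no checkpoint tagged {tag!r}")
    raise ValueError(f"unknown mode: {mode}")
```

For example, if a corruption event is known to have started at a given time, the `earlier` mode returns the newest checkpoint that precedes it; a Failover Test against that checkpoint can confirm the choice before the real Failover.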
The Failover Operation has the following steps:
1. Create virtual machines at the recovery site and attach each virtual machine to its relevant virtual disks, based on the checkpoint selected as the recovery point in time.
- Note: The original (protected site) virtual machines are not touched, because the protected site is assumed to be down.
2. Configure VMware HA to prevent automated DRS operations during the Failover Operation.
3. Power on the virtual machines at the recovery site, if applicable, using the boot order defined in the virtual protection group.
- By default, the Failover Operation is committed automatically, without additional testing. You can change this by setting the commit policy when you execute the Failover Operation.
4. In the event of a partial Failover, if reverse replication is specified in the virtual protection group settings, the VMDK files at the protected site are used for reverse replication and the delta sync process starts. The delta sync process ensures data integrity across the disks at both sites.
- If reverse replication is not specified, the virtual protection group settings are saved, but the protected site disks are deleted. Setting up reverse replication in the future would then require a full replication seeding process.
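The Failover steps, including power-on by boot order, can be sketched the same way. Everything here is hypothetical: the `failover` function, the representation of the boot order as a dictionary of numbered groups, and the assumption that lower group numbers power on first. Note that nothing in the sketch acts on the protected site virtual machines.

```python
from typing import Dict, List


def failover(boot_order: Dict[int, List[str]], checkpoint_id: str,
             partial: bool = False, reverse_replication: bool = True) -> List[str]:
    """Return the Failover Operation steps in order; protected-site VMs are never touched."""
    # Lower boot-order group number powers on first (assumption about the ordering).
    ordered_vms = [vm for pos in sorted(boot_order) for vm in boot_order[pos]]
    log: List[str] = []
    for vm in ordered_vms:
        log.append(f"create {vm} at recovery site from checkpoint {checkpoint_id}")
    log.append("configure VMware HA to prevent automated DRS operations")
    for vm in ordered_vms:
        log.append(f"power on {vm}")
    log.append("commit failover (per commit policy)")
    # The reverse-replication branch runs only in a partial Failover.
    if partial:
        if reverse_replication:
            log.append("start delta sync using protected site VMDKs")
        else:
            log.append("save VPG settings; delete protected site disks")
    return log
```

Grouping virtual machines by boot-order position lets dependencies such as a database start before the application servers that use it.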
The Failover Test Operation
The Failover Test Operation is used to test and validate the recovery of virtual machines at the recovery site. It creates test virtual machines in a sandbox environment, using the test bubble network defined in the virtual protection group settings. All writes made during testing go to a scratch volume, which can be configured based on testing duration requirements. These scratch volumes are managed by the virtual replication adapters (VRAs).
Note: Replication of the protected virtual machines continues throughout the test, so any changes to them are still sent to the recovery site and new checkpoints continue to be generated.
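Because the scratch volume setting depends on how long a test runs, a back-of-the-envelope estimate can help when planning it. The sketch below assumes, as a rule of thumb rather than a documented formula, that the setting acts as a capacity limit and that required capacity is roughly the test virtual machines' sustained write rate multiplied by the test duration, plus some headroom. The function name and the 20% default headroom are hypothetical.

```python
def estimate_scratch_gb(write_rate_mb_s: float, test_hours: float,
                        headroom: float = 1.2) -> float:
    """Rough scratch capacity (GB) = write rate (MB/s) x duration (s) x headroom / 1024."""
    if write_rate_mb_s < 0 or test_hours < 0:
        raise ValueError("rate and duration must be non-negative")
    if headroom < 1:
        raise ValueError("headroom should be at least 1.0")
    seconds = test_hours * 3600
    return write_rate_mb_s * seconds * headroom / 1024
```

For example, test virtual machines writing about 2 MB/s during a 4-hour test would need roughly 34 GB with 20% headroom under these assumptions.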
The Failover Test Operation has the following steps:
1. Start the test.
- Choose a checkpoint to use for the test. This can be an existing checkpoint or a new one you create for this specific test.
- Create the test virtual machines at the recovery site, using the bubble network specified in the virtual protection group settings.
- Power on the test virtual machines at the recovery site, if applicable, using the boot order defined in the virtual protection group.
2. Stop the test.
- Power off the test virtual machines and remove them from the inventory.
- Add a tag to the checkpoint used for the test, noting “Tested at startDateAndTimeOfTest”.
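The start and stop phases of a Failover Test can be modeled as a small stateful object. This is a hypothetical sketch: the `FailoverTest` and `Checkpoint` classes, the `-test` suffix for test virtual machine names, and the timestamp format inside the tag are all illustrative assumptions. It shows that stopping a test clears the test virtual machines and tags the checkpoint with the time the test started.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Checkpoint:
    """Hypothetical checkpoint with a list of free-text tags."""
    cp_id: str
    tags: List[str] = field(default_factory=list)


@dataclass
class FailoverTest:
    checkpoint: Checkpoint
    vms: List[str]
    started_at: Optional[datetime] = None
    running: List[str] = field(default_factory=list)

    def start(self, now: datetime) -> None:
        """Create and power on test VMs (on the bubble network) from the chosen checkpoint."""
        self.started_at = now
        self.running = [f"{vm}-test" for vm in self.vms]  # naming is illustrative

    def stop(self) -> str:
        """Power off and remove the test VMs, then tag the checkpoint with the start time."""
        if self.started_at is None:
            raise RuntimeError("test was not started")
        self.running.clear()
        tag = f"Tested at {self.started_at:%Y-%m-%d %H:%M:%S}"
        self.checkpoint.tags.append(tag)
        return tag
```

Because replication continues during the test, the tagged checkpoint is simply one point among those still being generated; the tag records that it has been validated.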