Disaster Recovery tests
  • 26 Aug 2022

Article Summary

An essential part of any Disaster Recovery strategy is performing controlled tests to ensure VMs and services fail over as expected. Expedient is happy to assist with these tests for clients who subscribe to one of Expedient's Disaster Recovery as a Service (DRaaS) offerings.

Cost

Clients who subscribe to Push Button DR on Demand will be billed for RAM, Compute, and VMware licensing based on utilization for each day on which any VMs are failed over for testing. Expedient does not charge for labor to conduct the test.

Clients who subscribe to Expedient's traditional Push Button DR offerings will not incur additional expenses for properly scheduled DR tests.

How to request a DR test

Once the DRaaS services are fully implemented, Expedient's Service Delivery team, and specifically your Project Manager, will perform your first DR test. Service Delivery engineers will request a test plan from you to ensure we agree on what 'success' looks like. The plan will cover infrastructure, application, and end-user acceptance criteria. You may choose to postpone your first DR test, but please be aware that Expedient reserves the right to decline any credit requests arising from a disaster or a later test until the first DR test is completed.

After your initial implementation, you may request to schedule an additional DR test with Expedient's Operations Support Center (OSC). Please note that these must be scheduled several weeks in advance. OSC staff will attempt to accommodate your requested date(s), but please be aware that the team can only handle a finite number of tests at any given time, and requests from other clients might already fill time slots.

Types of DR tests

The impact on your environment depends on which test is performed. There are generally three types of tests:
 
* Bubble Test: During a Bubble test, replicated copies of your servers are powered on at the recovery site in an isolated network Bubble. Your production environment remains up during this testing while you and your team validate the replicated servers in the Bubble. This test allows verification of the server OS and data integrity, but complete application testing may not be possible due to the isolated nature of the Bubble. Servers in the Bubble have neither inter-VLAN connectivity nor internet connectivity; they can only communicate with other servers in the same VLAN. You will have to access the servers via the remote console in vSphere rather than RDP or other means.
 
* Live Failover: During a Live Failover test, your production servers are powered off while replicated copies are built and powered on at the recovery site. Your environment is effectively down for the time it takes to fail over your network and servers. We usually allocate about an hour for failover, but it can take less or more time depending on the size of the servers in the replication groups and how quickly the virtual firewalls boot and establish their BGP peering. Once the failover is completed and the servers running at the recovery site are validated, we commit the failover, which deletes the servers at the production site and begins replicating the servers at the recovery site back to the production site. At this point, your production environment should be up and running at the recovery site. Once reverse replication is completed (up to 48 hours later), we will perform the entire process with you again to fail back to the production site.
 
* Live Failover (No Commit): A Live Failover test with no commit is very similar to a Live Failover, except that after the network and server groups are failed over and validated, we cancel the failover instead of committing it. This powers off and deletes the servers at the recovery site and powers on the servers at the production site in the same state they were in when the test began. With this type of test, any changes made to the servers at the recovery site are lost when the test is completed.
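
When choosing between the three test types, it can help to see their effects side by side. The following is a minimal Python sketch that encodes only what this article states. All names in it (DRTestType, TEST_TYPES, bubble_can_communicate, and so on) are hypothetical and are not part of any Expedient or VMware tooling. A value of None marks a property the article does not address.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DRTestType:
    """Illustrative summary of one DR test type (hypothetical model)."""
    name: str
    production_stays_up: bool              # does production keep running during the test?
    recovery_changes_kept: Optional[bool]  # None = not stated in this article
    failback_required: bool                # must we fail back to production afterwards?


# Values reflect a reading of the descriptions above, nothing more.
TEST_TYPES = {
    "bubble": DRTestType("Bubble Test", True, None, False),
    "live": DRTestType("Live Failover", False, True, True),
    "live_no_commit": DRTestType("Live Failover (No Commit)", False, False, False),
}


def tests_without_production_outage() -> list:
    """Names of test types that leave the production environment running."""
    return [t.name for t in TEST_TYPES.values() if t.production_stays_up]


def bubble_can_communicate(vlan_a: int, vlan_b: int) -> bool:
    """Inside the Bubble, a server can only reach servers in its own VLAN."""
    return vlan_a == vlan_b


if __name__ == "__main__":
    print(tests_without_production_outage())
    print(bubble_can_communicate(10, 20))
```

For example, a team that cannot tolerate any production downtime would be limited to the types returned by tests_without_production_outage(), and should plan its Bubble validations around the same-VLAN restriction.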

