How We Solved our Disaster Recovery Problem Using Standard Tools and a Bit of Creativity

By: Scott Fritsch, Director of Engineering (March 2023)

At Zerowait, we have modified our Disaster Recovery (DR) implementation over the years. There are a lot of options out there, and it took us a while to design a solution with the right balance of security, efficiency, ease of deployment and ease of administration. Of critical importance, we needed a DR solution that resulted in little or no data loss between an event and the resumption of services. Finally, we needed to establish a measurement of what would constitute a successful implementation.

One item that had to be resolved at the outset was terminology. Disaster Recovery applies when our primary facility is offline for 2 days or more, or when a physical disaster renders the primary facility inoperable. Business Continuance, on the other hand, refers to recovery from a shorter interruption in service that falls below our disaster recovery threshold and does not involve failing operations over to our secondary facility.
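To make the distinction concrete, the two definitions above can be expressed as a small decision rule. This is only a sketch: the 2-day threshold comes from the definition above, while the function name, parameter names, and return strings are hypothetical and chosen for illustration.

```python
from datetime import timedelta

# Threshold taken from the definition above: offline for 2 days or more.
DR_THRESHOLD = timedelta(days=2)


def classify_outage(expected_downtime: timedelta,
                    facility_inoperable: bool = False) -> str:
    """Classify an outage as Disaster Recovery or Business Continuance.

    expected_downtime and facility_inoperable are hypothetical inputs
    that a person assessing the outage would supply.
    """
    if facility_inoperable or expected_downtime >= DR_THRESHOLD:
        # Operations fail over to the secondary (DR) facility.
        return "disaster recovery"
    # Shorter interruption: recover in place at the primary facility.
    return "business continuance"
```

For example, a six-hour outage with the facility intact would be classified as business continuance, while any outage that leaves the facility inoperable would be treated as a disaster regardless of its expected length.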

Our DR facility is about 1,300 miles away (as the crow flies) from our Production facility, and we use site-to-site VPNs between our routers to keep our data safe. Our application servers are a combination of physical machines and VMware virtual machines. For storage hardware, we use a mix of Zerowait SimplStor and NetApp filers. NetApp SnapMirror is used to replicate our virtual systems as well as our data.

A big game changer was our transition away from hosting every aspect of our infrastructure in our primary facility. Hosting everything gave us the ability to strictly control and protect our data, but we were replicating a ton of data to our DR facility, which didn’t have the same bandwidth available as our primary facility. This caused challenges, both in allowing time for nightly mirrors to complete and in replicating the data back to our primary facility after an outage. We implemented VMware’s SRM at one point in hopes of solving this, but it proved costly and overly complex, and ultimately did not function as desired. In the end, we evaluated our mission-critical applications and were able to outsource some of them: web hosting, email, and phones. Moving those applications made a huge impact. We were careful to fully vet the security and stability of our chosen vendors, and we now have resilient, cost-effective services for our distributed workforce.
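The bandwidth problem described above comes down to simple arithmetic: the amount of changed data divided by the usable link rate has to fit inside the nightly window. The sketch below shows that calculation. The function name, the `efficiency` factor, and every number used with it are assumptions for illustration, not measurements from our environment.

```python
def mirror_hours(changed_gb: float, link_mbps: float,
                 efficiency: float = 0.8) -> float:
    """Estimate the hours needed to replicate changed_gb over a WAN link.

    changed_gb  -- data to transfer, in decimal gigabytes (assumed input)
    link_mbps   -- nominal link speed in megabits per second
    efficiency  -- assumed fraction of the nominal rate actually achieved
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be > 0 and efficiency in (0, 1]")
    bits_to_send = changed_gb * 8 * 1000**3
    usable_bits_per_second = link_mbps * 1000**2 * efficiency
    return bits_to_send / usable_bits_per_second / 3600


def fits_window(changed_gb: float, link_mbps: float,
                window_hours: float, efficiency: float = 0.8) -> bool:
    """Return True if the estimated transfer finishes within the window."""
    return mirror_hours(changed_gb, link_mbps, efficiency) <= window_hours
```

Because the estimate scales linearly with the amount of data, halving the nightly change set halves the transfer time, which is why moving whole applications off the replicated infrastructure had such a large effect.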

Offloading these services from our internal infrastructure gave us the opportunity to re-evaluate our DR strategy and ultimately redesign our implementation. We focused on two internal systems that were deemed mission critical for day-to-day operations in the event of a disaster. Daily backups are made of our databases and numerous document files, which are stored on a NetApp Filer. Those files are SnapMirrored to our DR facility. By deploying deduplication and compression and by altering our snapshot schedule, we also reduced the mirror footprint substantially.

Microsoft RemoteApp and Desktop Connections (RADC) is part of our standard infrastructure and provides our remote users with access to the critical internal databases, as well as other applications. Zerowait Engineering discovered that, instead of using expensive tools and a complex failover process, we could simply load the replicated database files from our main site onto a standalone database server at our DR site. Access to the DR site is handled by a simple yet effective method: users connect to an alternate RADC URL that points to the RADC server at the DR facility. Once the connection is made to that URL, the webpage has links to all available applications at the DR facility.
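The alternate-URL approach can be illustrated with a small helper that picks which RADC page to send a user to. This is a sketch only, not a description of our actual process, in which switching to the DR URL follows a deliberate decision to declare a disaster. Both URLs and all function names are hypothetical placeholders.

```python
import urllib.error
import urllib.request

# Hypothetical URLs; the real addresses are not part of this article.
PRIMARY_RADC_URL = "https://radc.example.com/RDWeb"
DR_RADC_URL = "https://radc-dr.example.com/RDWeb"


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if a web server answers at url within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, even if with an error status such as
        # 401 or 403, so the site itself is up.
        return True
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or malformed URL.
        return False


def choose_radc_url(probe=is_reachable) -> str:
    """Prefer the primary RADC page; fall back to the DR page."""
    if probe(PRIMARY_RADC_URL):
        return PRIMARY_RADC_URL
    return DR_RADC_URL
```

The `probe` parameter lets the selection logic be exercised without touching the network, for example `choose_radc_url(probe=lambda url: False)` to simulate the primary site being down.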

The result is a Disaster Recovery solution and plan that is effective, cost-efficient, and simple to maintain. Our DR solution is tested yearly, and we recently completed another successful DR failover exercise.

Reliable solutions, paired with Zerowait’s outstanding service, are why our clients come back to us year after year. Would you like to know more about how we can help you analyze and improve your Disaster Recovery plan? If so, please click the link below to let us know.
