“Luck is what happens when preparation meets opportunity.” – Seneca
As I covered in another blog post, the first step to any effective business continuity and disaster recovery program is crafting a thoughtful, achievable plan.
But having a great business continuity and disaster recovery plan on paper doesn’t mean that the work is done. After all, how do you evaluate the efficacy of your plan or make adjustments before you actually need it? The answer: by putting it to the test.
I am fond of saying that managed services are a three-legged stool made up of technology, people and processes. If you lose any one leg, the stool falls over. And since an IT department is essentially offering managed services to the wider organization, IT management should think in terms of the same triad.
Let’s break it down:
For a disaster recovery scenario, you need to test the stool to make sure that each leg is ready and that the people know what to do when the time comes. One useful tool for this is a tabletop exercise (TTX). The purpose of the TTX is simply to get people thinking about which technologies they touch and which processes are already in place to support their tasks.
Let’s walk through the stages of a typical TTX.
Write a quick narrative for the disaster. Start off assuming all your staff are available, and then work through threats that you may have already identified. Some examples:
Now, some questions and prompts for your staff:
Going through the exercise, you’ll likely find that certain recovery processes are not properly documented or even completely missing. For example, your network administrator might not have a written recovery process. Have them and any other relevant staff produce and formalize the process, ready to be shared at the next TTX.
Continue this way for all the role-players until your team can successfully work through the scenario. You will want to thoroughly test people’s roles, whether in networking, operating systems, applications, end user access or any other area.
Unfortunately, we have all seen emergencies, such as the 9/11 terrorist attacks, in which key personnel are missing, incapacitated or even deceased. In less dire scenarios, some staff might be unable to attend to work because their home or family was affected by the disaster. For the purposes of a TTX, you can simply designate someone as being on vacation and unreachable, then have them sit out.
Just as a submarine commander might call a crash dive drill at the most inopportune time, call a TTX drill on your own team to test the plan. When you do, someone might actually be on vacation. Use that to your advantage to make sure that the whole team knows how to step in and how to communicate throughout the drill. You might even plan the drill to coincide with a key player’s vacation for added realism.
Once you’ve executed your tabletop exercise, it’s time to do a real test! Have your team actually work through every step of the process to fail over to the recovery site.
Again, you will want to test that the servers and applications can all be brought up in the recovery environment. To prevent data islands, make certain that users can successfully access your applications at the recovery site from wherever they would operate during a disaster. Here are some questions for user access testing:
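Whether users can reach each recovery endpoint at all is one question that lends itself to a scripted check. The Python sketch below is a minimal illustration, not a complete access test: it only confirms that a TCP connection can be opened, and it says nothing about logins or application behavior. The hostnames and ports in `RECOVERY_ENDPOINTS` are hypothetical placeholders, and the function names are made up for this example.

```python
import socket

# Hypothetical endpoints: replace these placeholders with the hosts and
# ports your users would actually connect to at the recovery site.
RECOVERY_ENDPOINTS = [
    ("app.dr.example.com", 443),
    ("vpn.dr.example.com", 443),
]


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False


def check_endpoints(endpoints, timeout=3.0):
    """Map each (host, port) pair to whether it accepted a TCP connection."""
    return {
        (host, port): is_reachable(host, port, timeout)
        for host, port in endpoints
    }
```

Calling `check_endpoints(RECOVERY_ENDPOINTS)` returns a dictionary keyed by endpoint. For the result to mean anything, run it from the locations your users would actually work from during a disaster, such as home networks or an alternate office, rather than from inside the recovery site itself.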
The help an IT service provider offers doesn’t have to stop at managing your Disaster Recovery as a Service (DRaaS) infrastructure or environment. With every INAP DRaaS solution, you get white glove onboarding and periodic testing to make sure that your plans are as robust as you need them to be. Between scheduled tests, you can also test your failover at will, taking your staff beyond tabletop exercises to evaluate their ability to recover the environment on their own. Staying prepared to handle disaster is a continuous process, and we can be there every step of the way to guide you through it.