Weathering the storm: an interesting play on words for those of us who have survived a major storm or natural disaster. That could mean anything from a major flood (think Texas after Hurricane Harvey) to a tornado, a tsunami, or an earthquake. Add current threats such as wildfires and sea-level rise amplified by king tides, and you have a lot to watch out for when it comes to keeping your business IT up and running. And let us not forget a host of smaller disasters, such as water main breaks and power outages, that don’t fall under “natural disasters” but are equally capable of disrupting operations.
So, what should the manager of an already stretched-thin IT department do? The answer is simple.
“When we fail to plan, we plan to fail.” It is an old saying I hear often from my colleagues here at TRUE, and in my opinion, truer words were never spoken (pardon the pun). The key to weathering any unforeseen event is having a plan in place and knowing what to do when reaction time is tight. A plan lets you stay light on your feet instead of working out, under pressure, basics that could have been settled ahead of time, like what you need to do to stay up and running. In the hours or days it takes to figure that out and act, you will lose business opportunities, slow operations, and possibly make poor decisions or overlook a step because you were under the gun.
Have you ever seen a police officer or firefighter stay completely calm in a horrific situation? Maybe on the news, in a scene that would terrify and rattle most of us, such as a fire with screaming victims or a person choking? I have seen it in person. The calm in a responder’s demeanor comes from knowing what to do. Experience also makes a significant difference: because they deal with these situations day in and day out, they have faced the challenge before, responded strategically, and resolved it. The same is true of a trauma surgeon or emergency room doctor. Most of us don’t have those reference points for how things should unfold, nor the specialized knowledge that only comes from facing such situations regularly. So the best thing we can do in advance is accept that disasters will happen, have emergency plans ready, and, when disaster strikes, try to stay calm and let the plan you established guide you.
Basics and Requirements
The plan should cover everything from the basics all the way up to highly critical items, and it should be specific to your situation and needs. Communication protocols, the location of key documents, and instructions for logging in remotely are all examples. Sometimes knowing where to look for information is the most important factor in an organized response. And when you think through the scenarios you may face, you will realize that an up-to-date list of phone numbers, accessible off-site so you can text your teams if normal communication channels are down, can be a lifesaver. In short, the plan should cover anything the business or IT will need during a response, including outside disaster recovery support to fill gaps wherever needed.
Your plan also needs requirements, so that you and your support team know when it is complete. These usually come from the business side of the house, and whoever helps you build and review the plan should be able to guide you on which requirements apply to your organization. Once the plan meets all of the business’s requirements, your support team will usually consider it complete. Essentially, these requirements are thresholds: tolerance levels for how long your organization can remain financially viable without access to certain essential resources. Two common examples are how long it may take to get back online after an outage (often called the recovery time objective, or RTO) and how much data you can afford to lose, measured as the window of time before the outage (the recovery point objective, or RPO). After analyzing your revenue-generating and operational assets, and the systems that maintain them, you will be able to put a specific number on each. For example: “If we recover to where we were 15 minutes before the outage, is that adequate?” Sometimes it is not acceptable to lose even one transaction that may only be partially completed in the system, which is why it’s important to work with someone who can look at the big picture and help you think through every angle of downtime and data loss.
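Once the business has set those thresholds, checking a system against them is simple arithmetic. Below is a minimal sketch in Python, not a TRUE tool or any product’s method: the class and function names are hypothetical, the two thresholds are the common recovery point objective (RPO, maximum acceptable data loss) and recovery time objective (RTO, maximum acceptable time to restore), and it assumes the worst-case data loss equals one full interval between backups or replication points. The figures in the example are made up for illustration, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class RecoveryRequirement:
    """Business thresholds for one system (hypothetical structure)."""
    system: str
    rpo_minutes: float  # max acceptable data loss, in minutes of work before the outage
    rto_minutes: float  # max acceptable time until the system is back online


@dataclass
class RecoveryCapability:
    """What the current setup can deliver (assumed to be measured in a DR test)."""
    backup_interval_minutes: float  # time between backups or replication points
    restore_minutes: float          # measured time to restore and bring the system online


def meets_requirement(req: RecoveryRequirement, cap: RecoveryCapability) -> dict:
    # Assumption: the outage strikes just before the next backup, so everything
    # written since the last backup (one full interval) is lost.
    worst_case_loss = cap.backup_interval_minutes
    return {
        "system": req.system,
        "rpo_met": worst_case_loss <= req.rpo_minutes,
        "rto_met": cap.restore_minutes <= req.rto_minutes,
    }


if __name__ == "__main__":
    # Illustrative: a 15-minute RPO and 4-hour RTO against nightly backups.
    orders = RecoveryRequirement("order-entry", rpo_minutes=15, rto_minutes=240)
    nightly = RecoveryCapability(backup_interval_minutes=24 * 60, restore_minutes=180)
    print(meets_requirement(orders, nightly))
```

Here the nightly backup restores fast enough to meet the RTO but can lose up to a day of orders, so it fails the 15-minute RPO. The “not even one transaction” case corresponds to an RPO of zero, which in this simple model only a zero backup interval (continuous, synchronous replication) can satisfy.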
Disaster Recovery Plan
All of the above factors, and more, need to be considered to create an effective Disaster Recovery Plan that will actually carry you through when the time comes. Next, consider how and when a disaster is declared and the plan takes effect. Who has the authority to declare a disaster and begin enacting the plan? These questions are more operational in nature, but they are still vital for any plan meant to ensure you survive a significant disruption with your digital assets intact.
Once we have a mature plan, we should be good to go, right? That depends on your business model and current IT environment. Sometimes a plan cannot be completed without changes to your network and how you operate. Delivering highly available, quickly recoverable applications and services to your network users can be tricky, and you may need hardware upgrades or better storage to meet your DR plan goals. What then? Could you migrate to the cloud now, or speed up an existing migration, to be better prepared for the next hurricane? Sure, but there is usually a tradeoff to keep in mind, such as monthly recurring costs versus a one-time capital expense. That is when you will want to weigh what you save by not having to replace or maintain hardware against an ongoing monthly cost. Those numbers will look different for every organization, and you will want to examine them in light of your long-term business goals as well.
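The capital-versus-recurring tradeoff comes down to a break-even calculation. The Python sketch below is a deliberately simplified illustration with hypothetical function names and made-up figures: it assumes one upfront hardware purchase with a flat monthly maintenance cost, compared against a flat monthly cloud bill, and it ignores hardware refresh cycles, migration effort, staffing, and discounting.

```python
from typing import Optional


def total_cost(months: int, upfront: float, monthly: float) -> float:
    """Cumulative spend after `months`: a one-time cost plus a recurring cost."""
    return upfront + monthly * months


def breakeven_months(capex: float, onprem_monthly: float,
                     cloud_monthly: float) -> Optional[float]:
    """Months until cumulative cloud spend catches up with buying hardware.

    Solves cloud_monthly * n == capex + onprem_monthly * n for n.
    Returns None when the cloud's monthly cost is not higher than on-prem
    upkeep, because in that case cloud spend never catches up.
    """
    extra_per_month = cloud_monthly - onprem_monthly
    if extra_per_month <= 0:
        return None
    return capex / extra_per_month


if __name__ == "__main__":
    # Illustrative figures only: $60,000 of hardware with $500/month upkeep,
    # versus a $2,500/month cloud bill, over a 60-month hardware lifetime.
    print(breakeven_months(60_000, 500, 2_500))
    print(total_cost(60, 60_000, 500), total_cost(60, 0, 2_500))
```

With these invented numbers, the cloud is cheaper only for the first 30 months; over a five-year hardware life, owning costs $90,000 against $150,000 for the cloud. That gap is what you would then weigh against the faster recovery, reduced hardware risk, and other DR benefits the cloud option might bring.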
Any changes to the network infrastructure or design need to happen in concert with the plan. The two go hand in hand, because the network is usually just the way the applications and data the business needs are delivered. So when you adjust your network, for example by adding new equipment, the plan may need to be updated to reflect those changes, and possibly tested again, to ensure it still delivers the result you are after.
The plan should not be treated as something you do once and then consider complete.
A DR plan is a living, breathing thing. It not only gets tested from time to time (with game-day simulations) but may also need to evolve to take advantage of new networking features or to meet changing business needs. When you upgrade or change any part of your environment, from hardware, to applications, to the people interacting with them, your plan should change to match. Like a fire extinguisher, the plan needs ongoing maintenance. The last thing you want during a fire is an expired extinguisher that gave you false hope all the years it hung on the wall and now won’t help when you actually need it. Update and maintain the plan regularly, and test it at least annually. That way it will be reliable when needed, and you can stay calm, confident that it is up to date and has been tested. You will need that confidence when you reach for the plan, so you can be an effective crisis manager while your organization is depending on you.
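The maintenance rules above (test at least annually, and revisit the plan whenever the environment changes) are easy to turn into a routine check. The Python sketch below is only an illustration: the function name and the three tracked dates are hypothetical, and it assumes you record when the plan was last tested, last updated, and when the environment last changed.

```python
from datetime import date, timedelta
from typing import List

# "Tested at least annually", interpreted here as within the last 365 days.
MAX_TEST_AGE = timedelta(days=365)


def plan_review_flags(today: date, last_test: date, last_plan_update: date,
                      last_env_change: date) -> List[str]:
    """Return reasons the DR plan needs attention (an empty list means none)."""
    flags = []
    if today - last_test > MAX_TEST_AGE:
        flags.append("plan not tested in the last 12 months")
    if last_env_change > last_plan_update:
        # Hardware, applications, or staff changed after the plan was written.
        flags.append("environment changed since the plan was last updated")
    if last_plan_update > last_test:
        # The current version of the plan has never been exercised.
        flags.append("plan updated since it was last tested")
    return flags


if __name__ == "__main__":
    # Illustrative dates: a test and update in March 2023, a storage upgrade in January 2024.
    for flag in plan_review_flags(today=date(2024, 6, 1), last_test=date(2023, 3, 1),
                                  last_plan_update=date(2023, 3, 1),
                                  last_env_change=date(2024, 1, 15)):
        print(flag)
```

Treating an updated-but-untested plan as a warning mirrors the extinguisher analogy: a plan that has changed since its last test has not yet proven it will work.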
In closing, we all need to consider the “what ifs” and plan for them.
Even if your plan is loosely documented and immature at this point, it is better than nothing. Having a plan can be the most significant factor in a successful outcome in any unforeseen situation. Hopefully, this blog has illustrated that point well enough that you are ready to examine your own disaster recovery plan. I have seen it play out both ways many times in my career, and trust me, you want to be the one who is ready when the time comes. Having a plan and maintaining it is key. Keeping your network and plan in sync requires discipline, but it gives you the best chance of a good outcome in any unforeseen situation. Your recovery depends on the plan.
TRUE can also help, drawing on our industry-leading experience. Our Engineering and Risk Advisory teams have the training and experience to help with any aspect of DR planning and execution. We can assist with all the projects involved in building a resilient network, from migrations and infrastructure upgrades through business process and risk assessments. TRUE is your comprehensive partner for DR planning and for assistance with any event, natural or otherwise.