Quick and Dirty Disaster Recovery Guide - Part I

Nov 29, 2018 | Lindsey Watts

Last year, the city of Atlanta experienced a SamSam ransomware attack that began as a demand for $51K in bitcoin, quickly rose to $2.7 million in projected breach and recovery costs, and ended with a staggering price tag of $17 million to restore and rebuild. During the ordeal, numerous city departments temporarily lost access to vital departmental programs and processes, including their revenue-generating online bill-pay platform.

So what does this have to do with your business if you aren’t running a major metropolitan city? Everything. Those hit hardest by malware and ransomware tend to be either slow-moving bureaucracies or organizations that don’t see themselves as targets. Attackers actively seek out those who are relatively unconcerned, banking on the fact that the less they see themselves as targets, the less prepared they will be for a disaster. So while bigger companies make attractive targets, mid-level and smaller ones tend to be affected more often and are less able to recover.

Attacks aside, there are also more common–and seemingly innocuous–scenarios facing businesses daily that can cost just as much as an attack, because they result in operational downtime. These include VIP laptop drive failure; RAID card failure (a hardware device or software program used to manage hard disk drives, or HDDs, and solid-state drives, or SSDs); storms or natural disasters common to certain regions (fires, tornados, ice storms, tropical storms, etc.); software errors with a mail server; or user error that results in deleted data or critical emails. So perhaps it is worth a few minutes’ time to think through what goes into a proper Disaster Recovery (DR) plan, then compare that with what may be currently in place at your own organization. Have you and your team really taken all steps necessary to keep your business functioning amid unforeseen circumstances?
TRUE’s certified engineer and seasoned DR expert, Tim Meuter, Director of the TRUE Network Operations Center, will walk us through a few initial steps and the questions you should be asking: What are the actual costs of operational downtime? What defines a solid Disaster Recovery Plan? What are your most important assets and processes, without which the business could not function? How do you determine acceptable loss for your individual organization? Where and how often should you be backing up valuable data? What types of backup solutions will meet your unique business requirements?

The Cost of Operational Downtime

80% of victims hit with ransomware experience a full 3 days of downtime before they can be up and running again. Of those companies that go down for 3 days, another 80% go out of business within 12 months. Out of business. Why? It’s the ripple effect of lost work, inability to respond to pressing tasks, unfilled orders, tickets left standing, lost opportunity, and the ongoing problem of trying to reconstruct what was done just before access to the environment was denied or systems failed. Your IT Director knows this, which is why she or he sees downtime as the enemy and likely harps on it ad nauseam. (He or she is right, by the way.) To help illuminate downtime from their–and a Board of Directors’–perspective, here is a simple way to calculate what it will cost your business per hour/day for systems to go down:

[(annual revenue) + (*annual operational costs)] x (hours of system failure) / **2080

*You’ll continue to pay these while down.

**Average operational hours per year

For example, if you have annual revenue of $10M and annual operating costs of $1M, and you experience system failure (operational downtime) for 2 hours, that is:

($10M + $1M) x 2 = $22M; $22M/2080 (annual work hours) = $10,576.92

–for just 2 hours of downtime–
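
For readers who would rather plug in their own numbers, here is a minimal Python sketch of the formula above. The function and parameter names are illustrative assumptions, not part of any TRUE tooling, and 2080 is simply the average operational hours per year used in this article.

```python
HOURS_PER_YEAR = 2080  # average operational hours per year, as used above


def downtime_cost(annual_revenue, annual_operating_costs, hours_down,
                  hours_per_year=HOURS_PER_YEAR):
    """Estimate the cost of an outage lasting hours_down operational hours."""
    # Revenue you are not earning plus costs you keep paying, per hour.
    hourly_rate = (annual_revenue + annual_operating_costs) / hours_per_year
    return hourly_rate * hours_down


# The example from the text: $10M revenue, $1M operating costs, 2 hours down.
cost = downtime_cost(10_000_000, 1_000_000, 2)
print(f"${cost:,.2f}")  # $10,576.92
```

Passing a different hours_per_year lets you adjust for a business that operates more or fewer hours than the 2080 assumed here.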

Suffice it to say, it’s expensive to close your doors for a few days. So before you begin, running this calculation will be a good exercise for perspective.

What Goes Into a Disaster Recovery Plan

Bottom line, a plan will include everything it takes to get back up and running, and how long that will take–a measurement called Recovery Time Objective (RTO). If your plan doesn’t answer this question with a specific RTO, you don’t have a plan. Cyber insurance, for example, is certainly part of a plan, but it’s no roadmap and won’t give you RTO. Some call their DR plan a “Business Continuity Plan”, which is exactly what it sounds like–an inclusive, chronological, measurable, and written course of action that keeps all your vital processes up and running if Plan A fails. Ultimately, this DR/Business Continuity Plan will include everything from what key assets need protecting, to who is responsible for undertaking certain predetermined activities, to how you will restore lost data, and even your attorney’s role in decision-making or media response following disaster or a breach.

Proper Asset Cataloguing

The first thing your board will want to know in the event of an emergency is how you have ensured protection of valuable data and how soon you can get servers back online to ensure the continuity of your business. Like any good plan, notes Tim Meuter, this starts with a thorough accounting of existing assets. You don’t know what you want to back up until you really document all of the places your data lives, who interacts with it, and how that data is being used. So as a ground-zero starting place, your team will want to list–and categorize the sensitivity or relative importance of–all:

Buildings and Property
Equipment
IT equipment
Supplies and materials
Records (video footage, financial records, visitor logs, paperwork, etc.)
Intellectual property
Personnel
Communications (internet access, APIs, phone lines, mobile networks, etc.)

This seems simple, but it is vital and needs to be documented, organized, and updated regularly with new equipment, locations, technologies, processes, etc. as your business grows.
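
If a spreadsheet feels too loose, the catalogue can also be kept as structured records. The Python sketch below is one hypothetical way to do it: the field names, the three-level sensitivity scale, and the sample entries are all assumptions made for illustration, so adapt them to your own environment.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical three-level scale for relative importance."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Asset:
    name: str
    category: str       # one of the categories listed above
    location: str       # where the asset or its data lives
    owner: str          # who is responsible for it
    sensitivity: Sensitivity


def by_priority(assets):
    """Return assets ordered from most to least sensitive."""
    return sorted(assets, key=lambda a: a.sensitivity, reverse=True)


# Made-up sample entries, purely for illustration.
catalogue = [
    Asset("Visitor logs", "Records", "Front desk PC", "Office manager", Sensitivity.LOW),
    Asset("Mail server", "IT equipment", "Server room", "IT Director", Sensitivity.HIGH),
    Asset("Phone lines", "Communications", "Main office", "Operations", Sensitivity.MEDIUM),
]

for asset in by_priority(catalogue):
    print(f"{asset.sensitivity.name:<6} {asset.category:<15} {asset.name}")
```

Sorting by sensitivity gives a first draft of what to protect and restore first, which feeds directly into the next step.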

Determining Acceptable Loss

As a second step, Tim works to help customers balance their operational needs with the complexity of their environments, as well as their budgets. This is separate from, and in addition to, your cost of downtime. First, he notes, look at your industry and what your people are doing: what is accomplished in a single day, organization-wide, and how much data can you afford to lose? For example, if you lost an entire day’s work, could all of your people conceivably reconstruct everything they did in the last 24 hours? Meuter offers, “I know I couldn’t remember every single thing I did in a day, but maybe some people can.” In organizations dealing with vital operations, however, we find that most people can reconstruct about 2 hours’ worth of work, max. What this will cost you comes down to weighing what you cannot afford to lose, and how much data you store, against the prospect of doing business without it. He goes on to explain, “For an IT company, you may be talking about 20-30 tickets. For an accounting firm, it’s updating client files. When you get into a whole day, who can remember in detail everything they did yesterday? That’s a lot to expect from all employees, and most people don’t want to be in the position of dropping that many balls at once.” So knowing your acceptable loss tells you how often you need to run backups (yes, even in the cloud). Most people select an interval between 15 minutes and 2 hours, corresponding to their budgets.
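
To see how acceptable loss translates into a backup schedule, here is a small Python sketch. It assumes fixed-interval backups around the clock and that, in the worst case, everything since the last backup is lost; the function names are illustrative only.

```python
import math

MINUTES_PER_DAY = 24 * 60


def backups_per_day(interval_minutes):
    """Number of backups taken in 24 hours at a fixed interval."""
    return math.ceil(MINUTES_PER_DAY / interval_minutes)


def meets_acceptable_loss(interval_minutes, acceptable_loss_minutes):
    """Worst case, all work since the last backup is lost, so the
    backup interval must not exceed the acceptable loss."""
    return interval_minutes <= acceptable_loss_minutes


# Compare a few intervals against a 2-hour (120-minute) acceptable loss.
for interval in (15, 60, 120, 240):
    print(f"every {interval} min: {backups_per_day(interval)} backups/day, "
          f"acceptable={meets_acceptable_loss(interval, 120)}")
```

A 15-minute interval means 96 backups a day and a 2-hour interval means 12, which is where the budget trade-off mentioned above comes in.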

Types of Backup Technologies Available

Once you have established what is most vital to your business and your company’s acceptable loss, you’ll want to walk through protecting your servers with backups that meet those requirements. To begin, it’s a good idea to evaluate technologies and methodologies. The basic concept of saving data elsewhere is what most people consider a “backup”. Just having copies (“clones”) of data is not bad, but your team will still have to reinstall operating systems, reload programs, and reset all security preferences. So if you are relying solely on saving work in a second location, you will experience more downtime while your IT team is putting operations back together. That is time you will not be serving customers and generating revenue. For smaller shops that have higher thresholds and don’t mind the disruption, that may suit their budgets. For most, however, a mirror image backup of the entire system is preferred. Mirror image technology will take a complete snapshot of your environment at incremental points in time (the time increments we just determined above). In these cases (most of our TRUE managed IT customers), an organization’s entire server is completely mirror imaged at each point in time, typically every 15 minutes, giving them the ability to restore the entire environment right away, minimizing downtime and potential loss.
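
As a rough illustration of what point-in-time snapshots buy you, the Python sketch below picks the most recent snapshot taken before an incident and reports how much work falls in the gap. The dates, the 15-minute schedule, and the function name are assumptions for the example; real imaging products expose their restore points through their own interfaces.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def latest_restore_point(snapshots, incident_time):
    """Return the most recent snapshot taken at or before incident_time.

    snapshots must be sorted in ascending order. Returns None when no
    snapshot predates the incident.
    """
    index = bisect_right(snapshots, incident_time)
    return snapshots[index - 1] if index else None


# Hypothetical schedule: a snapshot every 15 minutes from 09:00 to 10:45.
start = datetime(2018, 11, 29, 9, 0)
snapshots = [start + timedelta(minutes=15 * i) for i in range(8)]

incident = datetime(2018, 11, 29, 10, 22)
restore_point = latest_restore_point(snapshots, incident)
print(restore_point.time(), incident - restore_point)  # 10:15:00 0:07:00
```

With snapshots every 15 minutes, the gap between the restore point and the incident can never exceed 15 minutes of work.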

Where to Store Instances of Your Data

With the right frequency, technology, and methodology in place to meet your requirements, you’ll then choose a location for second instances of your environment, and the best way to get up and running in the time period that best meets your acceptable loss threshold. If you have an on-premises network, you might consider putting a secure backup in the cloud. If your systems are already virtualized and accessed in the cloud, you might consider spinning up another cloud instance–a complete backup with another provider, in the event that your main provider goes down or someone breaks into your account and deletes all your data. As Tim notes, “Yes, this can and does happen, so plan for it by putting a year’s worth of imaged backups in a separate cloud instance. Then, you can store everything you’ve done just in the last 30 days on-premise.” You may be asking yourself: in the age of cloud everything (services, security, platforms, software), why would I still want an appliance in my physical location? In short, it just takes too long to restore immediate operations from the cloud, so a local appliance gives you at least the last month’s worth of work, allowing your people to attend to what is urgent while you restore everything else. Having an appliance on hand with everything you need to get up and running in a hurry is important, as it will be much faster to spin up in an emergency that damages your systems but not your physical location, like malware or ransomware. If you choose to do it the other way around, with 1 year’s worth of data kept locally and 30 days’ worth in the cloud, that’s okay, just less efficient for some businesses.
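
The split Tim describes, a year of images in a separate cloud instance and the last 30 days on a local appliance, can be expressed as a simple retention rule. This Python sketch assumes recent backups are kept in both places and uses made-up location labels; it is an illustration of the policy, not a description of any particular backup product.

```python
from datetime import datetime, timedelta

LOCAL_RETENTION = timedelta(days=30)    # last 30 days on the local appliance
CLOUD_RETENTION = timedelta(days=365)   # a year's worth in a separate cloud instance


def retention_locations(backup_time, now):
    """Return where a backup taken at backup_time should still be kept."""
    age = now - backup_time
    locations = []
    if age <= LOCAL_RETENTION:
        locations.append("on-premises appliance")
    if age <= CLOUD_RETENTION:
        locations.append("separate cloud instance")
    return locations


now = datetime(2018, 11, 29)
for days_old in (10, 90, 400):
    backup_time = now - timedelta(days=days_old)
    print(days_old, retention_locations(backup_time, now))
```

Swapping the two constants models the reverse arrangement mentioned above, with a year kept locally and 30 days in the cloud.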

Building your Disaster Recovery Plan is simply an exercise in business pragmatism, beginning with the initial steps of calculating the cost of downtime, cataloguing assets, determining acceptable loss, and putting reliable backups in place to meet your needs. By no means will that be the end of the road for business continuity, but it’s a starting place. Ultimately, unforeseen circumstances can present your company with an opportunity to prove to existing and future customers your dependability as a reliable organization they can count on, no matter what.

In the next installment, we’ll take a deeper dive into further considerations in building a Disaster Recovery Plan, as expert Gary Noto walks us through vital considerations and avoiding common pitfalls.

If you would like to discuss your Disaster Recovery Plan and the unique needs of your organization, please reach out to us today.