5 Disaster recovery tips
The key to Disaster Recovery success is a realistic, well-understood set of objectives based on business needs. Reaching that point takes planning and preparation: carrying out the business impact analysis, understanding and quantifying risks, and classifying and prioritising applications and data for recoverability.
Systems must also be prepared so that they can actually be recovered, and everything must be documented, especially the Disaster Recovery plan. Another factor for success is to make Disaster Recovery less of an exception by integrating Disaster Recovery hardware components into production. Because IT changes constantly, the process and the plan require continuous review and updates. Disaster Recovery must be part of day-to-day operations.
Finally, investing in a solid technology base is critical. Where possible, an organisation should adopt newer technologies that provide higher performance at lower cost, and at a minimum it must ensure that backups are functioning well.
1. Business and IT need to be linked
Creating a Disaster Recovery plan is a compromise: people are aware of best practices, but they face cost constraints. When best practices are pitted against cost, cost needs to be the second priority, not the first. Even more important, capabilities need to be matched to expectations. Responding to a disaster is an exception, but preparing for one should not be a burden; it should be integrated with day-to-day priorities.
2. There needs to be a Disaster Recovery plan
The Disaster Recovery plan needs to represent all functional areas within IT before, during, and after a disaster. It needs to cover applications, networks, servers, and storage.
Contingencies, such as “what-if” scenarios, should be considered as part of the planning process. Decisions need to be made about what level of disruption constitutes a disaster and about how much downtime and data loss can be tolerated.
3. Keep the Disaster Recovery Plan current
Disaster Recovery planning needs to be part of the day-to-day operations of the IT environment. Even though a disaster is an exception, planning for one should always be at the forefront of people’s minds. Once the Disaster Recovery plan is created, it needs to be maintained and updated every time an element within the IT environment changes. Because the IT environment changes constantly, the plan will fail unless maintaining it is part of change management.
4. Test the Disaster Recovery Plan
The Disaster Recovery plan needs to be tested regularly to ensure the business can recover the operation successfully and in a timely fashion. Disaster Recovery testing is a major challenge for most IT departments, but if recovery has not been tested all the way to the application level, it is very likely that problems will occur.
Even though a Disaster Recovery test is a major operational disruption, it shouldn’t be treated as a pro forma exercise. It needs to include true end-to-end testing all the way to production. The focus needs to be on recovering applications rather than servers: the components of today’s complex client-server and web-based multi-tier applications reside on multiple servers, so there are interdependencies between them.
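One way to make those interdependencies explicit before a test is to write them down and derive a recovery order from them. The following is only a minimal sketch in Python: the component names and the `recovery_order` helper are hypothetical, not taken from any particular tool, and it assumes each component lists the components it cannot start without.

```python
from graphlib import TopologicalSorter

# Hypothetical multi-tier application: each component maps to the
# components that must already be recovered before it can be started.
DEPENDENCIES = {
    "web front end": {"application server"},
    "application server": {"database", "authentication"},
    "authentication": {"database"},
    "database": {"storage"},
}


def recovery_order(dependencies):
    """Return the components so that each one comes after everything it depends on."""
    # static_order() yields prerequisites first and raises graphlib.CycleError
    # if the dependencies are circular.
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, component in enumerate(recovery_order(DEPENDENCIES), start=1):
        print(step, component)
```

For this example the order is storage, database, authentication, application server, web front end. A circular dependency raises an error instead of producing an order, which is itself a useful finding before a real disaster.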
The philosophy of Disaster Recovery testing needs to change. The approach used for software quality testing, where finding bugs is a positive thing, should be adopted. Finding problems in a Disaster Recovery test is equally positive, as long as they are resolved so that they do not recur during a real disaster.
5. Set realistic recovery objectives
Frequently, organisations have established objectives and prioritised servers and applications in accordance with Disaster Recovery policies. However, upon an objective examination of Disaster Recovery capabilities and resources, it turns out that these goals are not attainable. Thus it is important to set realistic Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO).
For the RTO, the questions are when the clock starts ticking and how long an outage can be tolerated. For the RPO, the question is how current the recovered data must be, that is, how much of the data written just before the disaster can be lost. These are the key metrics that need to be determined and supported. It is important to examine whether the infrastructure can support the goals.
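Whether the infrastructure can support the goals can be checked with simple arithmetic once the numbers are known. The sketch below is illustrative only: the `RecoveryObjective` record and the `unmet_objectives` function are hypothetical names. It assumes that the worst-case data loss equals the interval between successful backups and that the restore time comes from the most recent recovery test.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryObjective:
    """Hypothetical record of one application's targets and measured capability."""
    application: str
    rpo: timedelta                  # maximum tolerable data loss, expressed as time
    rto: timedelta                  # maximum tolerable downtime
    backup_interval: timedelta      # time between successful backups
    tested_restore_time: timedelta  # duration measured in the last recovery test


def unmet_objectives(obj: RecoveryObjective) -> list[str]:
    """Describe each objective that the current capability cannot meet."""
    problems = []
    # Assumption: a disaster just before the next backup loses a full interval of data.
    if obj.backup_interval > obj.rpo:
        problems.append(
            f"{obj.application}: backups every {obj.backup_interval} cannot meet an RPO of {obj.rpo}"
        )
    if obj.tested_restore_time > obj.rto:
        problems.append(
            f"{obj.application}: a tested restore of {obj.tested_restore_time} cannot meet an RTO of {obj.rto}"
        )
    return problems


if __name__ == "__main__":
    billing = RecoveryObjective(
        application="billing",
        rpo=timedelta(hours=1),
        rto=timedelta(hours=4),
        backup_interval=timedelta(hours=24),
        tested_restore_time=timedelta(hours=9),
    )
    for problem in unmet_objectives(billing):
        print(problem)
```

In this made-up case both objectives are unattainable: nightly backups cannot support a one-hour RPO, and a nine-hour tested restore cannot support a four-hour RTO. Either the objectives or the infrastructure has to change.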
Via ZD Net tip from Ritchie at disaster recovery















