Creating a Disaster Recovery Plan for Your WordPress Site

Do you have a disaster recovery plan for your WordPress site?

If you answered no, then you’re probably in good company, because not many sites do. Yet how well you’ve planned for disaster will determine how well and how quickly you recover from it.

Putting together a Disaster Recovery Plan is quick and relatively easy. And if disaster strikes it’ll save you so much time and angst that you’ll wonder why you ever thought you could live without one.

What is a Disaster Recovery Plan?

A disaster recovery plan describes not only how to recover from a range of disasters but also steps to mitigate the risk of disaster.

The best Plans balance detail with simplicity. After all, they are going to be used in fairly stressful situations, so they need to be clear and to the point.

A basic Plan will include:

  • an investigation checklist (do we even have a disaster?)
  • a list of disaster scenarios
  • a list of recovery actions for each scenario, along with who is responsible for carrying out the action
  • actions and steps to take to mitigate the risk of disaster
  • actions and steps to take to minimize recovery time
  • a list of contact details for all those responsible for actions in a disaster recovery

Why Have A Disaster Recovery Plan?

If disaster does strike, whether your site gets hacked, there’s a hardware failure or, even worse, your hosting provider goes offline, then your stress levels are going to rise rapidly. If this is a client site, then you have the added joy of managing their stress levels too.

Having a Disaster Recovery Plan means that all the decisions have already been made in a calm non-disaster environment and so now, when disaster has struck, you can just follow your plan. No panic, no what-ifs, just step-by-cool-calm-and-collected-step.

What Goes In A Disaster Recovery Plan?

Investigation Checklist

The most important step of a Disaster Recovery Plan is to determine if you are actually experiencing a disaster.

Having an Investigation Checklist allows you to quickly and easily determine whether you actually have a disaster and, if so, what type of disaster it is.

Generally, you’ll be reaching for the Checklist when either you, your client or your uptime tracking service notices that the site is not responding, so that you can confirm that there is a genuine problem and whether you can fix it.

Two things to remember:

  1. The Checklist is just to determine whether you have a persistent issue that requires mitigation
  2. There are a great many components involved in making a request for a page and that page being delivered to a browser, many of which you have no control over

Once you have completed your checklist, you’ll know what the scenario is and what course of action to take.
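The “is it persistent?” part of the checklist can be sketched in a few lines of Python. This is only an illustration: the function names, the three-attempt default and the outcome labels are assumptions, and `probe` stands in for whatever check you actually use (loading the home page, pinging the server and so on).

```python
from typing import Callable, List


def run_checks(probe: Callable[[], bool], attempts: int = 3) -> List[bool]:
    """Run the same probe several times and record each result."""
    return [probe() for _ in range(attempts)]


def classify(results: List[bool]) -> str:
    """Map a list of probe results to a (hypothetical) checklist outcome."""
    if all(results):
        return "no issue"
    if any(results):
        return "intermittent: keep watching"
    return "persistent: go to Scenarios"


# A probe that always fails stands in for a site that is down.
outcome = classify(run_checks(lambda: False))
print(outcome)
```

In practice you would probably space the attempts a minute or so apart, so that a brief blip isn’t mistaken for a disaster.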

Scenarios

The Scenarios are the various types of “disaster” that may occur. Once you’ve determined the scenario you can then work out the severity, the actions required to “fix” the situation and who is responsible for those actions.

Common Scenarios include:

  • Hosting provider goes down – this could be for a variety of reasons, but if downtime for your site is likely to be days rather than hours then you might need to get your site up and running at a new location
  • Your site is hacked – one of the more common disasters and one that often requires a full restore to rectify
  • An update to core / plugins / theme “breaks” the site – happens all too often, especially when an update is hastily installed; this may require the update to be “backed out” by restoring files to their pre-update state.
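If you like to keep things machine-readable, the scenarios above could be recorded as a simple structure like the one below. Treat it as a sketch: the `Scenario` class, the `PLAN` list, the `find_scenario` helper, and the example actions and roles are all placeholders invented for illustration, not a recommended set of responses.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Scenario:
    name: str
    actions: List[str]
    responsible: str  # should match an entry in the Contact List


# Example entries only; the actions and roles are placeholders.
PLAN = [
    Scenario("Hosting provider goes down",
             ["Confirm expected downtime with host",
              "Restore site at a new location",
              "Update DNS"],
             "Site administrator"),
    Scenario("Site is hacked",
             ["Full restore from the last clean checkpoint"],
             "Site administrator"),
    Scenario("Update breaks the site",
             ["Restore files to pre-update status"],
             "Developer"),
]


def find_scenario(plan: List[Scenario], name: str) -> Optional[Scenario]:
    """Look up a scenario by name, ignoring case; None if it is not in the plan."""
    for scenario in plan:
        if scenario.name.lower() == name.lower():
            return scenario
    return None
```

Looking up `find_scenario(PLAN, "site is hacked")` returns the matching entry, so the person on call can see the actions and who owns them without reading the whole plan.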

Contact List

It’s amazing how difficult it can be to put your hands on the appropriate contact details when you want them.

The Contact List is the go-to place for the name, email address and mobile number for anyone who is required to take action, or who needs to be informed.

Is this what your current plan looks like?

What To Consider When Creating Your Plan

Your response to each scenario will depend on a number of factors that will be different for each site even within a portfolio.

Acceptable Downtime Limits

This is the most important factor to determine. Most site owners will probably answer that there isn’t any “acceptable” downtime, so you’ll need to be brutally realistic. What you are trying to find out is how long a site can be down before you need to look at alternative hosting arrangements, even if only temporary.

Clearly, an ecommerce site will have a different time limit to an informational site, so you need to determine how critical the site is to the organisation.

One thing to bear in mind when you are working out the acceptable limits is that moving a site to an alternative provider requires a DNS update and these can take up to 24 hours to propagate. So, even if you can rebuild a site in a new environment in an hour, it doesn’t necessarily mean it’s going to be immediately available.
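To make that arithmetic concrete, here is a minimal sketch that adds the rebuild time to the DNS propagation time and compares the total against an acceptable limit. The function names are made up for illustration, and the 24-hour default simply reuses the upper figure quoted above; your own DNS settings may behave differently.

```python
from datetime import timedelta


def worst_case_downtime(rebuild: timedelta,
                        dns_propagation: timedelta = timedelta(hours=24)) -> timedelta:
    """Time until the moved site is reachable for everyone, in the worst case."""
    return rebuild + dns_propagation


def fits_limit(rebuild: timedelta, acceptable: timedelta) -> bool:
    """True if a move to a new host fits inside the acceptable downtime limit."""
    return worst_case_downtime(rebuild) <= acceptable


# A one-hour rebuild plus up to 24 hours of DNS propagation.
hours = worst_case_downtime(timedelta(hours=1)).total_seconds() / 3600
print(hours)  # 25.0
```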

A crucial part of a Disaster Recovery Plan is managing expectation, so make sure any expectations are realistic.

Checkpoint Frequency

If your response to any scenarios requires a restore, then you need to think about how you set up your backup regime and, in particular, the frequency of your checkpoints.

A checkpoint is the moment that a snapshot of your site is taken. Restoring to a checkpoint means recreating the site as it was at the moment the snapshot was taken and losing any data that was created between the checkpoint and the moment of failure.

If, for example, your backup regime is daily at 8:00am and you are restoring a site at 5:00pm then your checkpoint is 9 hours old. Is that going to be a problem? What is the impact of losing a day’s worth of data?

For some sites, the impact might be minimal. For commercial sites, the impact might be huge and therefore checkpoints need to be taken far more frequently or even constantly. Hosting company WPEngine offers Restore Checkpoints whilst Automattic’s service, VaultPress, provides real-time backup.
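The checkpoint-age calculation from the 8:00am / 5:00pm example can be written out as a tiny Python sketch. The function name and the date used are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta


def checkpoint_age(last_checkpoint: datetime, failure: datetime) -> timedelta:
    """Data created in this window is lost when restoring to the checkpoint."""
    if failure < last_checkpoint:
        raise ValueError("failure cannot be earlier than the checkpoint")
    return failure - last_checkpoint


# Daily backup at 8:00am, restore needed at 5:00pm the same day.
age = checkpoint_age(datetime(2024, 1, 1, 8, 0), datetime(2024, 1, 1, 17, 0))
print(age)  # 9:00:00
```

With a daily regime the worst case is a failure just before the next backup runs, which means losing almost 24 hours of data.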

Current Host Backup Regime

Virtually all hosting providers have some sort of backup regime but I’d bet that most WordPress site owners wouldn’t be across the details. We simply assume that the backups are taken and that we’ll cross the restore bridge when we come to it.

But if you don’t know how it works, then how do you know that in the case of a disaster that it’s going to fix your problem?

Make sure you are fully across your host’s backup regime and decide what, if any, part it can play in your disaster recovery. Specifically, look at:

  • How often does your host back up your files and database?
  • Where are the backups stored? Offsite is safest but obviously increases restore time
  • What is the process to initiate a restore?
  • How long does a typical restore take?
  • How much does a restore cost?

DIY Backups

Being able to control the backup regime and restore without having to rely on your host may be your preference and there are plenty of solutions available for WordPress site owners.

When you are selecting a product or a service, you want to use the same criteria as when you are assessing your host’s backup regime:

  • Can it handle your required checkpoint frequency?
  • Where are the backup files being stored? (You really want offsite – including cloud, no point storing on the same webserver that is hosting your site)
  • What’s the process to complete a restore?
  • How long will a restore take?

VaultPress, the Ultimate Backup Solution

Of all the backup solutions, Automattic’s VaultPress service seems to be the most impressive, particularly the Basic and Premium plans where real-time backup comes into play.

For these plans, VaultPress promises that it will “backup every post, comment, media file, revision and dashboard setting as they happen” which sounds like a checkpoint frequency dream.

The plans cost $15/mth (Basic) and $40/mth (Premium) which, at face value, seems to be peace of mind at an exceptional price.

Keep It Short and Simple

If the creation of a plan seems too daunting then it won’t get done. You aren’t looking for a 200 page document that covers every minute detail; you want something that is short and to the point.

Be creative and use flowcharts for the Investigation Checklist; put the scenarios, actors and actions in a table. The point is that this information is possibly going to be accessed at a stressful time, so it needs to be easy to understand and simple to follow.

Less is most definitely more.

Once the problem is reported, the only thing that matters is how long it takes to fix

Minimizing Recovery Time

If you follow your plan and determine that a recovery is required (either the site is restored in its current location or recreated in a new location) then you want to make sure that the time taken for the restore is minimized.

Take A Backup Before Every Major Update

This is just a prudent step, but if you are updating a plugin or theme, make sure that you back up the current files and the database so that if anything goes wrong you can quickly “back out” the changes by restoring the original files.
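For the files half of that backup, a minimal Python sketch might look like the following. It only archives a directory; the database needs its own export (for example with mysqldump or WP-CLI), which isn’t shown here. The function name and the archive naming scheme are assumptions for illustration.

```python
import tarfile
from datetime import datetime
from pathlib import Path


def backup_directory(source: Path, backup_dir: Path) -> Path:
    """Write a timestamped .tar.gz of `source` into `backup_dir` and return its path.

    `backup_dir` should sit outside `source`, otherwise the archive would
    try to include itself.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    archive = backup_dir / f"{source.name}-{stamp}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        # arcname keeps paths inside the archive relative (e.g. wp-content/...)
        tar.add(str(source), arcname=source.name)
    return archive
```

You would call it with something like `backup_directory(Path("wp-content"), Path("/some/offsite/mount"))` just before running the update; both paths here are placeholders.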

Track Your Uptime

Obviously, the earlier you know you’ve got a problem then the sooner you can start fixing it.

There are a myriad of uptime monitoring tools but a good place to start would be with the Monitor service that comes with Jetpack. This checks your site every 5 minutes and if the site takes longer than 10 seconds to respond then Monitor will send an email.
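If you are curious what a check like that involves, here is a rough Python sketch of the idea: time a request and flag it if it fails or takes longer than 10 seconds. This is not how Jetpack’s Monitor is implemented; the function names are invented, and the 10-second limit is simply borrowed from the description above.

```python
import http.client
import time
import urllib.request
from typing import Optional


def response_time(url: str, timeout: float = 10.0) -> Optional[float]:
    """Seconds taken to fetch `url`, or None if the request failed."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
    except (OSError, http.client.HTTPException):
        # OSError covers URLError, HTTPError and socket timeouts.
        return None
    return time.monotonic() - start


def needs_alert(elapsed: Optional[float], limit: float = 10.0) -> bool:
    """Alert when the site failed to respond or took longer than `limit` seconds."""
    return elapsed is None or elapsed > limit
```

Run from a scheduler every few minutes, `needs_alert(response_time(url))` gives you a yes/no answer that can trigger an email or a text message.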

Using A CDN

If you have to completely restore a site, then potentially your uploads directory will be the biggest component of the restore, especially if you have a large number of image, video or PDF files on your site.

To remove this requirement completely, you can consider moving these files to a content distribution network (CDN) such as Amazon S3 or MaxCDN. Whilst there will be a cost involved, your recovery time could be considerably reduced. And, as a bonus, your visitors may notice a speed boost when downloading and you’ll be taking load off your webserver.

Read more about integrating a CDN with WordPress.

Have Your Restore Process Defined

This might seem obvious but you need to know, in advance, exactly what the steps are for your restore process. Are you going to rebuild the site in situ or are you going to build it locally and then move to the production host? What format are your database backups in? What’s the order of restoring the various components? How are you going to test that the restore has been successful?

Write down your restore process, step-by-step and then test it thoroughly.
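One way to answer the “has the restore been successful?” question for files is to compare checksums taken before the disaster with checksums taken after the restore. The sketch below assumes you are happy to use SHA-256 and to keep a manifest alongside your backups; the function names are hypothetical, and it says nothing about whether the database restored correctly.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file under `root` (by relative path) to its SHA-256 digest."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            manifest[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def diff_manifests(expected: Dict[str, str], actual: Dict[str, str]) -> List[str]:
    """List files that are missing or changed in the restored copy."""
    problems = []
    for name, digest in expected.items():
        if name not in actual:
            problems.append(f"missing: {name}")
        elif actual[name] != digest:
            problems.append(f"changed: {name}")
    return problems
```

An empty list from `diff_manifests` means every file in the original manifest came back unchanged.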

Do A Dry Run

Run through your restore process, either locally or on a test site. Find the gaps and potential issues now and not the first time you need to do it for real.

Once the process is finished, review the steps, address any issues and consider the time taken and tweak the process as necessary.

A major update to WordPress is always a good opportunity to test the process.

Do A Dry Run

Test the recovery process again. On a regular basis. The more you test it, the quicker you’ll be in a real-life disaster scenario.

Do A Dry Run

Okay, you get the message.

Perhaps you can’t immunize your site but you can reduce the impact of a disaster

Prevention Better Than Cure

An important aspect of a Disaster Recovery Plan is to try and reduce the risk of a disaster as far as you can.

Choosing The Right Host

Your host should be appropriate for your acceptable downtime limits.

If you (or your client) want minimal downtime for an ecommerce site, then hosting on a bargain-basement shared-hosting platform is not going to provide the required uptime, and probably not the response speed you want if a disaster occurs.

Generally, the more reliability and service you want or need then the more the solution is going to cost.

Keeping Your Site Secure

If your site being hacked is the most likely scenario requiring a full site recovery, then it makes sense to do all you can to prevent that hack happening in the first place.

Put aside time, or get professional help, to ensure that all the appropriate security steps have been taken for your site.

You can find out more about WordPress security with our recent series:

Switch Off Auto-Updating

Ever since WordPress shipped minor release auto-updating with version 3.7, there’s been an ongoing debate about whether this is a good feature or not.

Whilst it’s a relatively small risk that a minor update to WordPress will break a plugin or theme, it does exist and you might decide that you want to remove this risk altogether and manually update after testing for any impact.

This will also extend to any plugins that provide the option for updating automatically.

Where To Store Your Disaster Recovery Plan

There is no point in having a Disaster Recovery Plan if you cannot access it when disaster strikes. The Plan has to be stored somewhere that is accessible by everyone who needs it and, for that reason, is probably best located on a third-party service such as Dropbox or Google Drive.

Personally, I like the idea of using Google Docs on Drive. Not only is it a straight-forward way of making the information available but it can also be editable and therefore become a “living document”.

But It Might Never Happen

In fact, not only might a disaster never happen, we actually don’t want it to happen. The irony of a Disaster Recovery Plan is that we don’t ever want to execute it.

The fact is, though, that there are too many components that we don’t have control over for the owner of any substantial or revenue-generating site to have a Disaster Recovery Plan that consists simply of keeping their fingers crossed that it never happens.

Being prepared won’t stop a disaster from happening but it will mean that your site will be back up and running as quickly as possible.

Photo Credits: burning building, fingers crossed, vaccination

Have you had to recover your WordPress site from a disaster? Do you have a Disaster Recovery Plan?