Instant recovery: the quick way to restore lost files and test back-up systems
Directly accessible back-ups that allow lost files to be restored quickly and also make it possible to check whether the back-up system itself is actually working
13 August 2019
The concept of instant recovery is relatively simple – the ability to run a virtual machine directly from a back-up of that VM – but the possibilities offered by such a simple concept are far-reaching, which explains why it is considered one of the most important advances in back-up and recovery in many years.
Before the advent of instant recovery, all restores were basically the same, starting with how back-ups were stored: in some type of container or image. Prior to commercial back-up and recovery software, back-ups were stored in formats such as tar, cpio or dump.
Most commercial back-up products chose other formats, typically proprietary ones, for storing back-ups, but the result was always the same: back-ups had to be restored before they were of any use. A restore was the reverse of a back-up; it opened the back-up container, extracted the requested files and copied them to their destination.
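To make the container-based restore concrete, here is a minimal Python sketch that uses the standard library's tarfile module as a stand-in for any back-up container. The helper names make_backup and restore_file are hypothetical, chosen for illustration only. The point is that the file cannot be used until it has been extracted from the container and copied out to a destination.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def make_backup(files: dict[str, bytes]) -> bytes:
    """Pack files into an in-memory tar archive that stands in for a back-up container."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def restore_file(archive: bytes, member: str, dest_dir: Path) -> Path:
    """Classic restore: open the container, extract one file and copy it to dest_dir."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        src = tar.extractfile(member)  # raises KeyError if the file is not in the archive
        if src is None:
            raise ValueError(f"{member} is not a regular file")
        target = dest_dir / Path(member).name
        target.write_bytes(src.read())
    return target


if __name__ == "__main__":
    archive = make_backup({"etc/app.conf": b"port=8080\n", "var/data.db": bytes(1024)})
    with tempfile.TemporaryDirectory() as d:
        restored = restore_file(archive, "etc/app.conf", Path(d))
        print(restored.read_bytes())  # b'port=8080\n'
```

Instant recovery removes exactly this extract-and-copy step by making the back-up readable in place.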
The road to instant recovery started when some back-up companies chose to store their back-ups in a way that made them directly accessible; they were no longer trapped inside a container, proprietary or not. This made it possible to mount the back-up of a file system directly instead of having to restore it first. For example, some back-up systems made it possible to access a backed-up VMDK directly as a VMDK, which meant that you could boot the VM from it using VMware.
What started as a way to make the recovery of individual files faster quickly turned into something much more. For the first time, customers could easily see whether the back-up of their VM was any good simply by having the back-up system mount the back-up and booting it as an actual system. It broke the fundamental axiom that you never knew whether your back-up was good until you restored it. This was definitely a game changer.
It is important to understand the performance characteristics of a typical instant-recovery setup, because such setups are rarely designed to perform as well as a production system. There are three main reasons for this.
The first challenge is that the hypervisor is not really reading a VMDK image; it is reading a virtual image presented to it by the back-up product. Depending on which product you're using and which version of the back-up you choose, the back-up system may have to do quite a bit of work to present this virtual image. This is why most back-up vendors recommend limiting the number of VMs booted instantly from back-up at any one time if performance is important.
The second reason instant recovery is not typically high-performance is that the VMDK sits on secondary storage. Although many primary systems have moved to all-flash arrays, today's back-up systems still use SATA disks, which are much slower.
The final enemy of high performance in an instant-recovery system is that many back-ups are stored in a deduplicated format. Reassembling deduplicated data into a full image takes quite a bit of processing power, which again reduces the performance of the system. Some deduplication systems can store the most recent copy in un-deduplicated form, which makes them much faster for instant recovery.
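The extra work involved in presenting deduplicated data can be illustrated with a toy content-addressed chunk store. This Python sketch rests on simplifying assumptions (fixed 4 KiB chunks, SHA-256 fingerprints, everything held in memory; DedupStore and its methods are hypothetical names) and does not describe any particular product. It shows that every read of the presented image has to be translated into chunk lookups and reassembled, whereas an un-deduplicated copy could simply be read in place.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; real products often use variable-size chunks


class DedupStore:
    """Toy content-addressed store: each unique chunk is kept once, keyed by its SHA-256."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}

    def ingest(self, image: bytes) -> list[str]:
        """Split an image into chunks, store unseen ones and return the image's recipe."""
        recipe = []
        for off in range(0, len(image), CHUNK_SIZE):
            chunk = image[off:off + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # duplicate chunks are stored only once
            recipe.append(digest)
        return recipe

    def read(self, recipe: list[str], offset: int, length: int) -> bytes:
        """Rehydrate a byte range of the image by looking up every chunk it spans."""
        first = offset // CHUNK_SIZE
        last = (offset + length - 1) // CHUNK_SIZE
        out = bytearray()
        for idx in range(first, min(last + 1, len(recipe))):
            out += self.chunks[recipe[idx]]  # one lookup per chunk touched by the read
        start = offset - first * CHUNK_SIZE
        return bytes(out[start:start + length])


if __name__ == "__main__":
    store = DedupStore()
    image = b"A" * 8192 + b"B" * 4096  # two identical "A" chunks and one "B" chunk
    recipe = store.ingest(image)
    print(len(recipe), len(store.chunks))  # 3 2
    print(store.read(recipe, 8190, 4))     # b'AABB'
```

The read in the usage example crosses a chunk boundary, so two chunks must be fetched and stitched together to return four bytes; at the scale of a running VM, that per-read overhead is where the processing power goes.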
How does instant recovery work?
It was no small feat to get to a point where customers could directly mount their back-ups into production or test. The first big change is that back-ups had to be stored in a way that allowed them to be accessed directly; they could not be stored inside a container such as tar or a vendor's proprietary image. A driver of some kind also has to sit on top of the data and provide multiple views of it, so that the back-up of a VM can be accessed as it was at different points in time. Most importantly, this driver must provide read-write access for a VM to actually run, which means it has to present a virtual view of the back-up rather than the back-up itself. Otherwise, running a VM from its back-up would overwrite that back-up.
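One common way to meet the read-write requirement is a copy-on-write style overlay. The Python sketch below is a simplified model under that assumption, not any vendor's driver: BackupCatalog and VirtualDisk are hypothetical names, a virtual disk is modelled as a map of fixed-size 512-byte blocks, and each stored back-up version is frozen so it cannot be modified. Each view reads unchanged blocks from the chosen point-in-time back-up and keeps the VM's writes in a private overlay, so running the VM never touches the back-up.

```python
from collections.abc import Mapping
from types import MappingProxyType

BLOCK_SIZE = 512  # assumed block size for this sketch


class VirtualDisk:
    """Writable view: reads fall through to the back-up, writes land in a private overlay."""

    def __init__(self, base: Mapping[int, bytes]) -> None:
        self._base = base                     # read-only blocks of one back-up version
        self._overlay: dict[int, bytes] = {}  # blocks written while the VM runs

    def read_block(self, n: int) -> bytes:
        if n in self._overlay:
            return self._overlay[n]
        return self._base.get(n, bytes(BLOCK_SIZE))  # blocks never written read as zeros

    def write_block(self, n: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("writes must cover exactly one block")
        self._overlay[n] = data  # the underlying back-up is never modified


class BackupCatalog:
    """Point-in-time back-ups of one virtual disk, each stored as a frozen block map."""

    def __init__(self) -> None:
        self._versions: dict[str, Mapping[int, bytes]] = {}

    def add_version(self, label: str, blocks: dict[int, bytes]) -> None:
        # Copy and wrap the block map in a read-only proxy so the stored back-up stays intact.
        self._versions[label] = MappingProxyType(dict(blocks))

    def view(self, label: str) -> VirtualDisk:
        """Return a fresh writable view of the back-up taken at the given point in time."""
        return VirtualDisk(self._versions[label])


if __name__ == "__main__":
    catalog = BackupCatalog()
    catalog.add_version("mon", {0: b"M" * BLOCK_SIZE})
    catalog.add_version("tue", {0: b"T" * BLOCK_SIZE})

    disk = catalog.view("tue")
    disk.write_block(0, b"X" * BLOCK_SIZE)        # the running VM modifies block 0
    print(disk.read_block(0)[:1])                 # b'X'
    print(catalog.view("tue").read_block(0)[:1])  # b'T', the back-up itself is unchanged
```

Discarding the overlay when the instant-recovery session ends leaves every point-in-time back-up exactly as it was.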
Once all of the above has been accomplished, the back-up system needs to present the virtual view of the appropriate VMDK to the hypervisor. This is typically done by exporting it over NFS; the hypervisor mounts the export as a data store, which allows it to import and run the VMs.
Because of the performance characteristics mentioned above, the running VM is only temporary. If the VM is needed long-term, it needs to be restored to a normal production data store. This can also be done with a tool such as VMware Storage vMotion, which can move the VM's disks to production storage while the VM keeps running.
What can you do with it?
Many see back-up testing as the best possible use of the instant recovery feature, and it goes well beyond simply mounting a particular VM. Some back-up products can create recovery groups with the appropriate boot order and boot several VMs together in order to test the recovery of all of them. Imagine the level of comfort such testing would give a typical back-up administrator.
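A recovery group's boot order can be expressed as a dependency graph and resolved with a topological sort. The Python sketch below assumes that representation; the VM names and the boot_waves helper are hypothetical, and real products may model boot order differently, for example as manually defined priority groups. It uses the standard library's graphlib module (Python 3.9+) to group VMs into waves, where every VM in a wave can boot in parallel once all of its dependencies are up.

```python
from graphlib import TopologicalSorter


def boot_waves(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group VMs into waves: a VM boots only after every VM it depends on has booted."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a loop
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # VMs whose dependencies have all booted
        waves.append(ready)
        # In a real test, mark a VM done only after it passes a health check.
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    group = {
        "dc01": set(),              # directory server has no dependencies
        "db01": {"dc01"},
        "app01": {"db01", "dc01"},
        "web01": {"app01"},
        "web02": {"app01"},
    }
    print(boot_waves(group))
    # [['dc01'], ['db01'], ['app01'], ['web01', 'web02']]
```

Modelling dependencies rather than a fixed list lets independent VMs, such as the two web servers here, boot at the same time, which shortens a full recovery test.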
The most common use of instant recovery is the one it was originally designed for: file-level recovery from an otherwise opaque image of a VM. Even when a back-up product can perform file-level recovery directly from a VM back-up, some customers prefer this method instead.
Instant recovery of a VM can also be used to copy a production VM to another location for testing or other purposes. Again, while most back-up products can restore a back-up of a VM to a different data store or hypervisor, some customers prefer to use other tools for that task. Being able to access the VMDKs of a given VM directly gives these customers the functionality they are looking for.
Instant recovery can also be used, in a limited way, to recover an entire VM that has been damaged. For example, if someone accidentally deleted or corrupted the VMDKs of a given VM, being able to run that VM directly from a back-up would let them restore service relatively quickly while they fix the underlying problem. However, because of its performance characteristics, instant recovery is not meant to replace a full disaster recovery (DR) system.
Instant recovery has become so popular that many customers have put it on their “must have” checklists when sending out RFPs. Using it to test your entire back-up automatically every night could greatly increase your confidence in your back-up system. And imagine how good you would look if you could immediately boot up a VM that someone had accidentally deleted. Instant recovery truly changes how a back-up system is perceived.
IDG News Service