The correct levels of back-up save time, bandwidth and space
5 September 2018
One of the most basic things to understand in back-up and recovery is the concept of back-up levels and what they mean.
Without a proper understanding of what they are and how they work, companies can adopt bad practices that range from wasted bandwidth and storage to actually missing important data on their back-ups. Understanding these concepts is also crucial when selecting new data-protection products or services.
A full back-up contains all data in the entire system. A full back-up of the C:\ drive in Windows contains every file on the C: drive. A full back-up of a Windows system should contain a copy of every file on every drive on the machine or VM (e.g. C:\, D:\, F:\, etc.). The same goes for a full back-up of a UNIX or Linux machine: it contains every file on every file system on the machine (e.g. /, /home, /opt, etc.).
The only files that should be missing from a full back-up are those specifically excluded by the configuration. For example, many system administrators exclude directories that will have no value during a restore (e.g. /boot or /dev) or that contain transient files (e.g. C:\Windows\TEMP in Windows, or /tmp in Linux).
There are two philosophies about which files to include in or exclude from a back-up: back up everything and exclude what you know you do not need, or select only what you want to back up. The former is the safer option; the latter will save some space on your back-up system. Some people see it as a waste to back up application files, such as the directory into which you have installed Oracle or SQL Server, believing they would simply reload the application during a restore. The risk of this approach is that someone will place valuable data in a directory that is not selected for back-up. For example, if you select only /home1 or D:\Data to be backed up, how will the back-up system know if someone adds /home2 or E:\Data? This is why it is much safer to back up everything and exclude only the files that you know you don't need, even if it takes up some additional space. An exception might be a strongly controlled environment where all data is always loaded in the same place and you have a well-orchestrated process for reinstalling the operating system and applications during a restore.
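To make the difference between the two philosophies concrete, here is a minimal Python sketch that applies each policy to a hypothetical list of file paths. The file list and the function names (`under`, `exclude_list_policy`, `include_list_policy`) are illustrative assumptions; a real back-up product would discover files by walking every mounted file system (for example with `os.walk`) rather than taking a fixed list.

```python
from pathlib import PurePosixPath


def under(path: str, dirs: list[str]) -> bool:
    """True if `path` is one of `dirs` or lies somewhere beneath one of them."""
    p = PurePosixPath(path)
    return any(p == PurePosixPath(d) or PurePosixPath(d) in p.parents for d in dirs)


def exclude_list_policy(files: list[str], excludes: list[str]) -> list[str]:
    """Back up everything except paths known to be worthless during a restore."""
    return [f for f in files if not under(f, excludes)]


def include_list_policy(files: list[str], includes: list[str]) -> list[str]:
    """Back up only the paths someone remembered to select."""
    return [f for f in files if under(f, includes)]


# Hypothetical file listing for one Linux machine; /home2 was added
# after the include list was written.
files = [
    "/etc/hosts",
    "/home1/alice/report.doc",
    "/home2/bob/budget.xls",
    "/tmp/scratch.dat",
    "/boot/vmlinuz",
]
excluded = ["/boot", "/dev", "/tmp"]
included = ["/home1"]

print(exclude_list_policy(files, excluded))
# → ['/etc/hosts', '/home1/alice/report.doc', '/home2/bob/budget.xls']
print(include_list_policy(files, included))
# → ['/home1/alice/report.doc']
```

The exclude-list policy picks up /home2 automatically, while the include-list policy silently misses it, which is exactly the risk described above.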
An incremental back-up typically backs up all data that has changed since the last back-up of any kind. Historically, such back-ups were file-based back-ups, meaning that they backed up all files that had changed since the last back-up. The challenge with this from a modern data protection standpoint is that we are attempting in every way to minimise the I/O impact of back-ups on the server (especially when backing up VMs), and backing up a 10 GB file because 1 MB has changed is not very efficient.
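A file-based incremental can be sketched as a simple modification-time comparison. In this minimal Python example the catalogue of paths, timestamps and sizes is hypothetical (a real tool would read them with `os.stat`), and it shows why a 10 GB file in which only 1 MB changed is still copied in full.

```python
GB = 1024 ** 3

# Hypothetical catalogue: path -> (last-modified time, size in bytes).
files = {
    "/data/vm-disk.img": (1_700_000_500.0, 10 * GB),  # only 1 MB inside it changed
    "/data/notes.txt": (1_699_000_000.0, 4 * 1024),
}
last_backup = 1_700_000_000.0  # time of the last back-up of any kind


def file_incremental(catalogue, since):
    """Return every file modified after `since`; each one is copied whole."""
    return [path for path, (mtime, _size) in catalogue.items() if mtime > since]


changed = file_incremental(files, last_backup)
bytes_copied = sum(files[p][1] for p in changed)
print(changed, bytes_copied // GB, "GB copied")
# → ['/data/vm-disk.img'] 10 GB copied
```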
This is why many vendors have switched to block-based incremental back-ups, which back up only the blocks that have changed. This approach is most common when back-up software backs up VMware or Hyper-V virtual machines through the hypervisor's APIs: the back-up application tells the API that it is performing a block-based incremental back-up, and the API returns a list of the blocks that need to be backed up.
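Hypervisor APIs track changed blocks themselves, so the back-up application does not have to scan the disk. To illustrate the idea without a hypervisor, the minimal Python sketch below swaps in a different technique: it hashes fixed-size blocks with `hashlib.sha256` and compares them with the hashes recorded at the previous back-up. The 4 KiB block size and the function names are assumptions for illustration only.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size; real products choose their own


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Hash each fixed-size block of `data`."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old_hashes: list[str], data: bytes,
                   block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Return {block_index: block_bytes} for blocks that differ from the last back-up."""
    changes = {}
    for i, h in enumerate(block_hashes(data, block_size)):
        if i >= len(old_hashes) or old_hashes[i] != h:
            changes[i] = data[i * block_size:(i + 1) * block_size]
    return changes


# A 1 MiB "virtual disk" in which a single 4 KiB region is overwritten.
disk = bytearray(1024 * 1024)
baseline = block_hashes(bytes(disk))
disk[8192:8192 + 4096] = b"\xff" * 4096

delta = changed_blocks(baseline, bytes(disk))
print(sorted(delta), sum(len(b) for b in delta.values()))
# → [2] 4096
```

Only the one changed block is sent, instead of the whole 1 MiB image that a file-based incremental would copy.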
Although the term has meant a few different things over the years, it is now widely accepted that a differential back-up backs up all data that has changed since the last full back-up. This type of back-up was much more in vogue in the days of tape, as it minimised the number of tapes required for a restore. A restore needed the latest full, followed by the latest differential, followed by every incremental taken since that differential.
If you are still doing tape-based back-ups, consider moving from weekly full back-ups to a monthly full, weekly differentials and daily incrementals. A restore will need to load one more back-up (the latest differential) than it would under a weekly full back-up schedule, but the new scheme saves a tremendous amount of tape and network bandwidth. It has long been popular with those still using tape.
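The restore-chain rule for full, differential and incremental back-ups can be expressed in a few lines of Python. This is a sketch under simplifying assumptions: each back-up is a hypothetical `Backup(day, level)` record, one back-up runs per day, and the restore target is the newest state. It compares a weekly-full schedule with the monthly full, weekly differential and daily incremental schedule suggested above.

```python
from typing import NamedTuple


class Backup(NamedTuple):
    day: int
    level: str  # "full", "differential" or "incremental"


def restore_chain(history: list[Backup]) -> list[Backup]:
    """Back-ups needed to restore the newest state, in the order they are applied."""
    history = sorted(history)
    fulls = [b for b in history if b.level == "full"]
    if not fulls:
        raise ValueError("no full back-up to restore from")
    chain = [fulls[-1]]
    # The latest differential after that full supersedes earlier incrementals.
    diffs = [b for b in history if b.level == "differential" and b.day > chain[0].day]
    if diffs:
        chain.append(diffs[-1])
    # Every incremental since the last full or differential is still required.
    base_day = chain[-1].day
    chain += [b for b in history if b.level == "incremental" and b.day > base_day]
    return chain


def weekly_full(day: int) -> str:
    return "full" if day % 7 == 0 else "incremental"


def monthly_full(day: int) -> str:
    if day == 0:
        return "full"
    return "differential" if day % 7 == 0 else "incremental"


days = range(25)  # days 0..24 of a hypothetical month
weekly = [Backup(d, weekly_full(d)) for d in days]
monthly = [Backup(d, monthly_full(d)) for d in days]

print([b.day for b in restore_chain(weekly)])   # → [21, 22, 23, 24]
print([b.day for b in restore_chain(monthly)])  # → [0, 21, 22, 23, 24]
```

The monthly scheme writes one full back-up instead of four in this period, at the cost of one extra back-up (the differential) in the restore chain.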
The advent of disk and deduplication has made full and differential back-ups passé. As mentioned previously, the reason we did the occasional full and differential back-ups was to minimise the number of tapes necessary to perform a restore. This no longer applies in the world of disk back-ups. As long as a product has been architected to fully utilise disk, restoring data from thousands of incrementals should take no more time than restoring it from a single full. This is because the back-up system is simply keeping a record of where all of the files/blocks are in its storage and transferring all of those files/blocks from its storage back to the client during a restore. How those files/blocks got there is irrelevant in a modern back-up world. Forever-incremental, especially if it is implemented using a block-based approach, is the most efficient way to update your back-up repository with the latest information from each back-up client.
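A toy model shows why the number of incrementals stops mattering once a repository keeps a catalogue of where each block lives. In this hypothetical Python `BlockRepository`, each back-up appends its changed blocks to storage and updates the catalogue; a restore reads the newest copy of each block exactly once. Real products also keep older block versions for point-in-time restores and deduplicate what they store, both of which this sketch omits.

```python
import random


class BlockRepository:
    """Toy forever-incremental repository: an append-only block store plus a
    catalogue recording where the newest copy of each block lives."""

    def __init__(self):
        self.store: list[bytes] = []         # append-only block storage
        self.catalogue: dict[int, int] = {}  # block index -> position in self.store

    def ingest(self, changed: dict[int, bytes]) -> None:
        """Apply one back-up; the first one is simply 'every block changed'."""
        for index, data in changed.items():
            self.store.append(data)
            self.catalogue[index] = len(self.store) - 1

    def restore(self) -> bytes:
        """Rebuild the latest image by reading each block's newest copy once."""
        return b"".join(self.store[self.catalogue[i]] for i in sorted(self.catalogue))


BLOCK = 4
disk = [bytes([i]) * BLOCK for i in range(8)]  # 8 blocks of 4 bytes
repo = BlockRepository()
repo.ingest(dict(enumerate(disk)))             # first back-up: every block

rng = random.Random(0)
for n in range(1000):                          # 1,000 forever-incrementals
    i = rng.randrange(len(disk))
    disk[i] = n.to_bytes(BLOCK, "big")
    repo.ingest({i: disk[i]})

assert repo.restore() == b"".join(disk)
print(len(repo.store), "blocks stored;", len(repo.catalogue), "blocks read to restore")
# → 1008 blocks stored; 8 blocks read to restore
```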
Windows systems use a file attribute called the archive bit to determine whether a file has changed since the last back-up. Any modification to a file sets its archive bit, after which any back-up of any level would back it up. When a full or incremental back-up copies the file, the back-up application clears the bit, so later incrementals skip the file until it is modified again. A differential back-up copies the same files but leaves the bit set, which is why each differential captures everything changed since the last full back-up.
Many back-up purists dislike the archive bit, if for no other reason than that it should be called the back-up bit, since back-ups are not archives. Another problem is that two back-up applications relying on the archive bit on the same system will step on each other: each clears the bit, so the other misses files that have changed.
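The archive-bit behaviour, and the conflict between two products, can be simulated in Python. This is a toy model, not the Windows API: the `Volume` class and the back-up functions are hypothetical, and the bit is held in a dictionary rather than read from the file system. It follows the usual convention in which full and incremental back-ups clear the bit and differential back-ups leave it set.

```python
class Volume:
    """Toy model of files carrying a Windows-style archive attribute."""

    def __init__(self, names):
        # New files start with the archive bit set.
        self.archive = {name: True for name in names}

    def modify(self, name):
        self.archive[name] = True  # any write sets the bit


def full_backup(vol):
    copied = sorted(vol.archive)  # copies everything regardless of the bit
    for name in copied:
        vol.archive[name] = False
    return copied


def incremental_backup(vol):
    copied = sorted(n for n, bit in vol.archive.items() if bit)
    for name in copied:
        vol.archive[name] = False  # clearing the bit marks the file as backed up
    return copied


def differential_backup(vol):
    # Copies the same files but leaves the bit set, so each differential
    # keeps capturing everything changed since the last full back-up.
    return sorted(n for n, bit in vol.archive.items() if bit)


vol = Volume(["a.doc", "b.xls"])
full_backup(vol)
vol.modify("a.doc")
print(differential_backup(vol))  # → ['a.doc']
print(differential_backup(vol))  # → ['a.doc'] again: the bit was left set

# Two independent products sharing one bit: product X's incremental
# clears it, so product Y's next incremental misses both changes.
vol.modify("b.xls")
product_x = incremental_backup(vol)
product_y = incremental_backup(vol)
print(product_x, product_y)  # → ['a.doc', 'b.xls'] []
```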
The move of most companies to virtualisation, the use of back-up APIs that interface at the virtualisation level and the adoption of block-based incremental back-ups have made the archive bit much less important than it used to be. It really only applies to host-based back-ups, which are becoming rarer every day.
IDG News Service