7

I know about full backups, incremental backups and decremental backups. However, I wondered why nobody (Windows Backup, TrueImage, Paragon) seems to have implemented the following backup algorithm.

It needs a backup medium whose file system supports hard links, e.g. NTFS. Ideally the backup medium uses the same file system as the source in order to preserve all features, such as alternate data streams (ADS).

  1. First backup is a full backup. This will copy all files onto the backup medium into a subfolder of \. Let's call this folder L (for "last"). There's no special file format, just copy the files.
  2. With the next backup, a new subfolder of \ will be created, let's call it C (for "current"). Files which have changed from the full backup will be copied again from the source disk. Files which have not changed are moved from L to C and a hardlink is created to point from L to C.
  3. On repeated backups, the same procedure will be applied with C and another new folder.
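For illustration, here is a minimal sketch of the idea in Python. The folder layout, the timestamped folder names, and the size-plus-mtime change check are my own placeholder choices, not requirements of the scheme:

    import os
    import shutil
    import time

    def changed(src, old):
        """Cheap heuristic: treat a file as changed if size or mtime differ."""
        s, o = os.stat(src), os.stat(old)
        return s.st_size != o.st_size or s.st_mtime != o.st_mtime

    def snapshot(source, backup_root, last=None):
        """Create one backup folder; 'last' is the previous snapshot, if any."""
        current = os.path.join(backup_root, time.strftime("%Y-%m-%d_%H%M%S"))
        for dirpath, _, filenames in os.walk(source):
            rel = os.path.relpath(dirpath, source)
            os.makedirs(os.path.join(current, rel), exist_ok=True)
            for name in filenames:
                src = os.path.join(dirpath, name)
                dst = os.path.join(current, rel, name)
                old = os.path.join(last, rel, name) if last else None
                if old and os.path.exists(old) and not changed(src, old):
                    # Unchanged: link instead of copying. This is equivalent
                    # to "move from L to C, then link back", since both names
                    # end up referring to the same underlying file.
                    os.link(old, dst)
                else:
                    shutil.copy2(src, dst)
        return current

The first run (with last=None) is the full backup; each later run passes the folder returned by the previous run.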

Is there anything I have missed in this algorithm which would make it not work?

While I have not noticed any problems yet, I can see the following advantages:

  • The last backup (C) is always a full backup. For restoring, you only need this one backup, and the user can delete any old backup without losing the ability to recover (which is not the case with full, incremental and decremental backups).
  • Old backups will act like full backups thanks to the links, but take much less space on disk.
  • There's a full history of changes for every file the user has not deleted. But unlike SVN, it is possible to delete old revisions.
  • Moving files and creating links are very fast operations, so creating the backup should be correspondingly fast.
  • It is possible to selectively delete changed files in old backups (e.g. only the big ones) without deleting a complete backup.

5 Answers

5

HardLinkShellExtension with its "Delorean-Copy" (see the other answer) isn't the only ready-to-use solution. There are alternatives:

  1. The console tool ln.exe from the same programmer, with the same functionality. The author also offers a pre-written batch file for timestamped DeLorean copies.
  2. The GUI backup solution HardLinkBackup, which does pretty much exactly what you want.
  3. Use ln.exe from 1. to make a hardlink copy of the old backup into the new backup folder, then use xcopy or robocopy to copy only new files and remove old ones (for robocopy, the switch is /MIR). Test it to make sure changed files are deleted and then copied, not modified in place (the latter would change the file in older backups too, because of the hardlinks); see the sketch after this list.
  4. Use xcopy or robocopy to make normal backups, then run dfhl.exe /l /r /w /s /h "X:\Backups-parent-folder\." to hardlink all identical files.
  5. Same as 4., but with finddupe -hardlink X:\Backups-parent-folder\** instead of dfhl.
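To see why the caveat in 3. matters, here is a small self-contained Python demonstration of the pitfall (file names are illustrative; the same applies to any tool that opens an existing, hardlinked destination file for writing):

    import os

    with open("old_backup.txt", "w") as f:
        f.write("version 1")
    os.link("old_backup.txt", "new_backup.txt")  # two names, one file

    # WRONG: writing through one name rewrites the shared data, silently
    # changing the "old" backup as well.
    with open("new_backup.txt", "w") as f:
        f.write("version 2")
    print(open("old_backup.txt").read())  # prints "version 2"

    # RIGHT: remove the link first, then write a fresh file; the old name
    # keeps whatever data it had at that point.
    os.remove("new_backup.txt")
    with open("new_backup.txt", "w") as f:
        f.write("version 3")
    print(open("old_backup.txt").read())  # still "version 2"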


Disclaimer: I've used all of the above-mentioned programs except for finddupe, though not necessarily in the ways described. And I have no monetary or other connection to any of these programs.

Limer
  • 471
3

I think that's basically what is called a Delorean-copy. There is, for example, the Link Shell Extension for Windows, which implements this behaviour. Its documentation has quite a good explanation:

http://schinagl.priv.at/nt/hardlinkshellext/linkshellextension.html#deloreancopy

bweber
  • 131
3

Seems like a viable plan. It would decrease the amount of time taken to view and use the backups. If the backups are used frequently and one needs to see complete snapshots, this would be very handy.

I would change the wording "moved from L to C" to simply say "hard linked from L to C".

One consideration: deleting a file with a large number of links (in reference to your last bullet point) means locating all of those links and removing them. So recovering space selectively in that manner is more challenging, but easy enough to do with the find command.
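For illustration, a Python sketch of that lookup, grouping names by device and inode (the paths are assumptions, and this relies on os.stat() reporting real file IDs, which Python does on NTFS as well; on Unix, GNU find's -samefile test does the same job):

    import os

    def links_to(target, backup_root):
        """Return every path under backup_root that names the same file."""
        st = os.stat(target)
        key = (st.st_dev, st.st_ino)
        hits = []
        for dirpath, _, filenames in os.walk(backup_root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                s = os.stat(path)
                if (s.st_dev, s.st_ino) == key:
                    hits.append(path)
        return hits

    # Disk space is only released once every returned path has been removed;
    # deleting just some of them merely drops those directory entries.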

ash
  • 166
1

What you describe is already in use with rsync and its --link-dest= option, via dozens of wrapper programs, such as dirvish.

Dan D.
  • 6,342
1

What you are describing is essentially an incremental backup scheme.

As Dan D. points out, it's actually used by various tools, particularly on Unix-like platforms where hardlinks are handled natively by a lot of the programs that care.

However, lots of Windows programs don't deal very well with hardlinks. Back in the days of FAT, hardlinks would actually have been considered an error as no two names in the file system were allowed to point to the same data blocks.

What you describe is an incremental backup scheme because any one backup builds upon all the previous backups. The only real differences are how those previous backups are referenced, and the fact that it's easier to delete a previous backup: the data is only actually deleted once the reference count for the file in question reaches zero, which happens when it is no longer referenced from any backup. The downside is that it's harder to predict exactly how much space will be freed by deleting a given previous backup; in the extreme case, when nothing changed between that backup and an adjacent one, it could actually be zero apart from reclaimed file system metadata.
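To make that concrete, here is a rough Python sketch that estimates the space freed by deleting one backup folder, under the assumption that links never point outside the backup tree:

    import os

    def freed_bytes(backup_dir):
        """Bytes released if backup_dir is deleted (rough lower bound)."""
        total = 0
        for dirpath, _, filenames in os.walk(backup_dir):
            for name in filenames:
                st = os.stat(os.path.join(dirpath, name))
                # st_nlink == 1 means no other backup holds a link to this
                # file, so deleting this last name releases its blocks.
                if st.st_nlink == 1:
                    total += st.st_size
        return total

It is only a lower bound: a file linked several times within the same backup folder would also be freed, but is not counted here.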

In the case of "normal" incremental backups, you have to assemble a restore manually from the full backup plus each increment; in what you are describing, that assembly is implicit in the links. However, if you were to delete everything that wasn't actually copied during the most recent backup (that is, everything with a reference count greater than one, merely linked in from earlier backups), the most recent backup would be just as incomplete as if you had made multiple incremental backups and then tried to restore only the most recent one.

user
  • 30,336