How can I extract all files from a Time Machine sparsebundle archive without duplicates?

Question

I have a sparsebundle image created with Time Machine on a Mac running OS X Snow Leopard, containing backups of that same Mac across many snapshots. I would like to extract all files from the archive without copying the same file from several snapshots, so that I don't end up with a hundred times the amount of data. Here is what I've tried so far.

I found out that the sparsebundle could only be mounted on a Mac running Snow Leopard, which I fortunately have. I then tried to extract the files to another disk with the following command in Terminal:

ditto "/Volumes/Time Machine Backups/Backups.backupdb/" "/Volumes/newdisk/"

It just made duplicates until the new disk was full. The original size of the sparsebundle is about 425 GB, and I tried copying to a 2 TB SSD. After some reading, I found out that Time Machine uses hard links to files and folders, so that snapshots can reference the same data without increasing the size of the sparsebundle. I also found out that I could probably use rsync, which I had to build from source on Snow Leopard to get version 3.2.3, and which should be able to preserve hard links to both files and folders. I then tried the following command:

rsync -aHAXE --numeric-ids "/Volumes/Time Machine Backups/Backups.backupdb/" "/Volumes/newdisk/"

It did the same thing, filling up the 2 TB SSD. The output looked a little different, and at least some of the hard links seem to have been preserved: there were a lot of files with a size of 0 B when I checked the disk on my Windows computer, where I use the Paragon HFS+ driver to read the file system. But it apparently still isn't enough.

I am trying to copy all the snapshots, but still only one copy of each file. The sparsebundle contains backups from 2012 to 2014, and there are definitely some files in the earlier snapshots that were removed later, in 2013 or 2014. It's telling that the sparsebundle is over 400 GB even though the Mac the backups come from had a 250 GB hard drive.

What can I do to get all files out of the archive, without lots of duplicates?

Useless · Accepted Answer · 2025-04-16T16:36:14.230

So, you want only the latest copy of every file that ever existed (in the backup). The result won't necessarily make perfect sense (e.g. if files were moved or directories renamed), but we can avoid duplicate copies of exactly the same file.

You need to:

iterate over the mounted sparsebundle's timestamped snapshot directories in reverse chronological order, starting with the most recent
rsync everything in there to your target directory
make sure that files with the same path from older snapshots don't overwrite the ones from later snapshots: use --ignore-existing so that later runs of rsync never overwrite files copied by earlier runs

I don't have a Snow Leopard machine to hand so I can't test this, but something like

cd "/Volumes/Time Machine Backups/Backups.backupdb/Aksel Christoffersens MacBook/"
ls -t | xargs -I_ rsync -aAXE --ignore-existing "_/Macintosh HD/" "/Volumes/newdisk"
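A slightly more defensive variant of the same idea, as a minimal and equally untested sketch. It assumes the snapshot directories are named YYYY-MM-DD-HHMMSS, so that sorting the names in reverse gives newest-first order without relying on modification times, and it skips anything that doesn't match that pattern (such as the Latest symlink). The function names list_snapshots and merge_snapshots are made up for illustration; the rsync flags are the ones from the command above.

```shell
# Print snapshot directory names, newest first.
# Assumption: names look like 2014-06-30-235959, so reverse lexical
# order is reverse chronological order.
list_snapshots() {
    local machine_dir="$1"
    (
        cd "$machine_dir" || exit 1
        for d in [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9][0-9][0-9]; do
            # Skip non-directories (also covers the case of no match at all).
            if [ -d "$d" ]; then
                printf '%s\n' "$d"
            fi
        done | sort -r
    )
}

# Copy one volume out of every snapshot into a single target tree.
# Newer snapshots are copied first; --ignore-existing then stops older
# snapshots from replacing files that are already in the target.
merge_snapshots() {
    local machine_dir="$1"
    local volume="$2"
    local target="$3"
    local snap
    list_snapshots "$machine_dir" | while IFS= read -r snap; do
        rsync -aAXE --ignore-existing \
            "$machine_dir/$snap/$volume/" "$target/" </dev/null
    done
}

# Example call, using the paths from the question:
# merge_snapshots \
#     "/Volumes/Time Machine Backups/Backups.backupdb/Aksel Christoffersens MacBook" \
#     "Macintosh HD" "/Volumes/newdisk"
```

Running list_snapshots on its own first is a cheap way to confirm the order before any data is copied.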

NB. I skipped the -H because hard links are typically between snapshots; if you actually have lots of hard links within the snapshotted filesystem, you probably do want it.
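If you're not sure whether that applies, one rough way to check is to look for inode numbers that occur more than once inside a single snapshot. This is only a sketch (internal_hardlinks is a made-up helper name, and file names containing newlines could confuse it), but it uses nothing beyond find, ls, awk, sort and uniq:

```shell
# Print each inode number that is used by two or more regular files
# inside the given directory tree. Files whose other links live outside
# the tree (e.g. in other snapshots) show up only once and are not printed.
internal_hardlinks() {
    local tree="$1"
    find "$tree" -type f -links +1 -exec ls -i {} + |
        awk '{ print $1 }' | sort -n | uniq -d
}

# Example call on one snapshot (the snapshot name is a placeholder):
# internal_hardlinks "2014-06-30-235959/Macintosh HD" | wc -l
```

If this prints nothing for a snapshot, leaving out -H should lose nothing for that snapshot.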
