I have a custom nightly backup solution that uses rsync with --link-dest, so that only new or changed files are transferred, and --hard-links, so that hard links in the source are preserved in the backup. Both options also help minimize network traffic and storage space.

Recently I wanted to rearrange hundreds of media files spread across various subdirs of a "media" dir. I figured the best way was to create a parallel dir structure containing hard links to those files and then move the links around as I pleased. I expected this to use no additional space (apart from the dir entries), and eventually I wanted the new dir structure to end up under the existing dir name.

To my surprise, the next backup run started creating new files and transferring lots of data. After taking a closer look and playing with --dry-run, I realized that rsync wanted to first transfer the files in the new dir and then create hard links to them for the old dir (discarding the existing copy from the previous backup), rather than just creating hard links from the new files to the old ones.

What seems to happen is that rsync processes the source in alphabetical order and only then looks at what is in the previous backup, so the names of the dirs matter. When several files share an inode, the first one rsync finds is the one that gets hard-linked to the previous backup, copied, or updated, while the later files with the same inode get linked to that first file's backup copy. If rsync finds the new file first, it copies it; if it finds the old one first, it links it to the previous backup.
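To see which name in each hard-link group comes first, and whether that name already exists in the previous backup, a diagnostic along these lines might help. It is only a sketch: `list_link_groups` is a made-up helper name, it assumes GNU find (for `-printf`) and file names without tabs or newlines, and the byte-wise sort only approximates whatever order rsync really uses.

```shell
# Hypothetical helper: list files under SRC that have more than one link,
# grouped by inode and sorted by path within each group. Each line is
# tagged "in-prev" if the same relative path exists under PREV (the
# previous backup) and "new" otherwise.
list_link_groups() {
    src=$1
    prev=$2
    tab=$(printf '\t')
    # find runs inside SRC so that %P yields paths relative to it.
    ( cd "$src" && find . -type f -links +1 -printf '%i\t%P\n' ) |
        LC_ALL=C sort -t "$tab" -k1,1n -k2 |
        while IFS=$tab read -r inode path; do
            if [ -e "$prev/$path" ]; then
                state=in-prev
            else
                state=new
            fi
            printf '%s\t%s\t%s\n' "$inode" "$state" "$path"
        done
}

# Example call (paths are placeholders):
# list_link_groups /path/to/source /path/to/previous-backup
```

A group whose first line is tagged "new" while a later line is tagged "in-prev" would be a candidate for the copying behaviour described above, if the theory about ordering is right.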

My question is whether there is anything I can do besides making sure the names sort the right way. (rsync could look at all the source files that share an inode and check whether any of them has a corresponding file at the destination, instead of doing this only for the first file.)

A related question is whether I can avoid manually renaming dirs in my backup. (What I really wanted was for the files under MyDir to be moved to different subdirs. I created MyDir2, created hard links in it to the files in MyDir, and then moved things around in MyDir2. Now that I'm done, I'd like the new dir structure to be under MyDir, but making any changes in MyDir would lead to file copying. A workaround is to go to my source and to the last backup, and in both places remove MyDir and rename MyDir2 as MyDir.)
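The manual workaround could be scripted roughly as follows. This is a sketch, not a tested procedure for a real backup: `swap_dir` is a made-up helper, the example paths are placeholders, and it assumes every file in MyDir also has a hard link somewhere in MyDir2, so that removing MyDir loses no data.

```shell
# Hypothetical helper: under ROOT, delete MyDir and rename MyDir2 to MyDir.
# It refuses to do anything unless both directories exist.
swap_dir() {
    root=$1
    [ -d "$root/MyDir" ] && [ -d "$root/MyDir2" ] || return 1
    rm -rf -- "$root/MyDir" || return 1
    mv -- "$root/MyDir2" "$root/MyDir"
}

# Apply the same change to the source and to the most recent backup
# (placeholder paths):
# swap_dir /path/to/source
# swap_dir /path/to/last-backup
```

Doing the same rename on both sides means the next run sees identical relative paths in the source and in the --link-dest directory.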


Edit: I tried to reproduce this with a small example, but in this case it worked as I wanted: when a new hard link was found in the source, it resulted in a hard link at the destination, regardless of sort order. So now I'm quite confused. Could the size of the backup matter? (The hundreds of files that I wanted to move were a small part of my 3TB backup containing 800,000 files in 150,000 dirs.)

What I did:

I started with this:

source
  dir2
    file1
    file2

Then I copied source to dest1.

Then I created dir1 and dir3, and inside them hard links to file1 and file2:

source
  dir1
    file1
  dir2
    file1
    file2
  dir3
    file2

Then I ran:

rsync --archive --link-dest ../dest1 --hard-links --itemize-changes source/ dest2/

created directory dest2
cd..t...... ./
cd+++++++++ dir1/
hf+++++++++ dir1/file1 => dir2/file1
cd+++++++++ dir3/
hf+++++++++ dir3/file2 => dir2/file2

Unlike with my real backup, in this example no new files got created in dest2, only hard links.
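For anyone who wants to repeat the small test, the steps above can be sketched as a script. The function names and file contents are made up, the first copy is assumed to be done with `cp -a` (the post does not say how it was made), and the scratch directory passed in is expected to be empty.

```shell
# Build the example tree under an empty scratch directory WORK.
make_example() {
    work=$1
    mkdir -p "$work/source/dir2" || return 1
    echo one > "$work/source/dir2/file1"
    echo two > "$work/source/dir2/file2"
    # First "backup": a copy that preserves timestamps and permissions.
    cp -a "$work/source" "$work/dest1" || return 1
    # New directories holding hard links to the existing files.
    mkdir "$work/source/dir1" "$work/source/dir3" || return 1
    ln "$work/source/dir2/file1" "$work/source/dir1/file1" || return 1
    ln "$work/source/dir2/file2" "$work/source/dir3/file2"
}

# Run the same rsync command as in the post, from inside WORK.
run_backup() {
    ( cd "$1" &&
      rsync --archive --link-dest ../dest1 --hard-links \
            --itemize-changes source/ dest2/ )
}

# Succeeds if relative path REL in dest2 is the same inode as in dest1,
# i.e. it was hard-linked to the previous backup rather than copied.
same_as_previous() {
    [ "$1/dest2/$2" -ef "$1/dest1/$2" ]
}

# Example (placeholder scratch dir):
# work=$(mktemp -d) && make_example "$work" && run_backup "$work"
# same_as_previous "$work" dir2/file1 && echo "linked to previous backup"
```

The `same_as_previous` check is a way to tell a hard link into the previous backup apart from a fresh copy without relying on the itemized output alone.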

ciobi