My Initial Question:

What does the --dry-run option of the rsync command do? On the network where I am currently working, it seems to produce a very long list of files, which isn't very useful.

Here's some further detail:

I recently (finally) got around to deploying a samba share on a TrueNAS box. It has to be samba because I work with Linux, Windows and occasionally OS X systems, so I want a networked storage location that I can use from all three if necessary.

I am currently moving data from a random array of hard disks to the storage pool on the TrueNAS system. The easiest way I have found of doing this is to run an rsync server on the TrueNAS system and use rsync on my client to send the data.

Some of the drives I am copying data from have multiple copies of the same directory names. I can't guarantee they are identical, however, as some of them were made at a later date.

Caveat: I can't rely on the timestamps of the files. Reason: I moved a significant amount of stuff to a single disk before starting this data migration.

Example: I have one 3TB drive which contains things like

Documents-backup/...
Documents-backup_2/...    # same directory structure as Documents-backup,
                          # may or may not be identical

then another drive which contains

Documents-backup/...    # may or may not be identical to dir on another disk

Some of these folders are significant in size with over 100 GB of content.

All I want to do is use rsync to:

  • First check (with dry run and checksum?) if the folders are identical. If they are, I can discard/delete one of them; there is no need to copy it and maintain 2 copies on the NAS
  • If they are not identical, give a list of files with different checksums

I thought that the following command would do this:

rsync -a -c --progress --dry-run ./local-path user@ipaddress::rsyncservername/remote-path

However, as far as I can tell, all this does is print a list of all the files being checked, not the files with differing checksums.

Research / Partial Answer?

I found this question by doing a search for rsync dry-run. This question mentioned differing permissions. Since I am using the archive switch -a, I think this preserves permissions. My guess would be that samba doesn't support Linux permissions and that this is causing rsync to believe there is a "difference" between the files despite the checksums being the same.

More meaningful recap of changes in rsync dry-run while backing up the Linux home folder

So my question is slightly narrower, but essentially still the same. Given the constraints that I have (server must be samba) how can I do an rsync to check for any differences between the files using a checksum?

user3728501

1 Answer


What does the --dry-run option of the rsync command do?

Exactly what man rsync says it does:

This makes rsync perform a trial run that doesn’t make any changes (and produces mostly the same output as a real run). It is most commonly used in combination with the -v, --verbose and/or -i, --itemize-changes options to see what an rsync command is going to do before one actually runs it.

Your suggested command is going to run very slowly, because --checksum disables rsync's usual quick check (size and modification time) and instead reads and checksums every single file on both sides. You mention that you can't rely on the timestamps, so you should indeed use checksums to verify the initial data. Once the timestamps have been copied across, remove the -c (--checksum) flag from future synchronisation runs.

What you probably want is the --itemize-changes (-i) flag, which shows what needs changing for each file. Without it, a --dry-run (-n) lists every file that needs any change at all, even if the only part that needs changing is metadata such as the file modification time. You will probably also want to consider --delete to identify files that would be removed from the destination because they are no longer in the source.
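As a rough sketch of how the itemized output could be narrowed down to content differences only: the helper below filters for lines whose change code marks a file transfer. The name content_changes and the sample file names are made up for illustration, and the filter assumes the change code is followed by a single space and then the path.

```shell
# content_changes is a hypothetical helper name, not part of rsync.
# It reads `rsync --itemize-changes` output on stdin and prints only the
# paths of regular files whose content would be transferred, i.e. lines
# whose change code starts with '<' or '>' followed by 'f'.
content_changes() {
    # Drop the leading change code and the single space after it.
    # Metadata-only lines ('.'), directories and "*deleting" lines do not match.
    sed -n 's/^[<>]f[^ ]* //p'
}

# Sample input in the format rsync prints (made-up file names)
printf '%s\n' \
    '>fc.t...... report.txt' \
    '.f..t...... notes.txt' \
    '.d..t...... photos/' \
    '>f+++++++++ photos/new image.jpg' |
    content_changes
# prints:
#   report.txt
#   photos/new image.jpg
```

In a real run this might be fed from something like `rsync -rcni ./local-path user@ipaddress::rsyncservername/remote-path | content_changes`. Using -r rather than -a there is my assumption for your case: it stops rsync from comparing permissions, ownership and timestamps, which you say you cannot rely on.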

Consider this example scenario

date >origfile             # Original file
sleep 65
cp origfile copyfile       # Same content, different timestamp
cp -p origfile samefile    # Same content, same timestamp
date >difffile             # Different content, different timestamp

ls -l ????file
-rw-r--r-- 1 roaima roaima 29 May 21 17:03 copyfile
-rw-r--r-- 1 roaima roaima 29 May 21 17:03 difffile
-rw-r--r-- 1 roaima roaima 29 May 21 17:01 origfile
-rw-r--r-- 1 roaima roaima 29 May 21 17:01 samefile

Only use --no-whole-file for this scenario. DO NOT use it in production code.

for file in {copy,diff,same}file
do
    echo "== $file =="
    rsync --dry-run -ai --delete --no-whole-file --checksum origfile "$file"
    echo
done

Output (in stages)

  1. copyfile needs to be updated (timestamps)

     == copyfile ==
     >f..t...... origfile
    
  2. difffile needs to be updated (content and timestamps)

     == difffile ==
     >fc.t...... origfile
    
  3. samefile needs no update (there's no output from rsync)

     == samefile ==
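The change codes in that output can also be read mechanically. Here is a minimal sketch that decodes a few positions of the code, following the YXcstpoguax layout described in man rsync. The function name explain_itemized is hypothetical, and only the update type plus the checksum, size, time and permission positions are handled.

```shell
# explain_itemized is a hypothetical helper name, not part of rsync.
# It decodes part of an --itemize-changes code such as ">fc.t......".
# The body runs in a subshell, so the variables do not leak to the caller.
explain_itemized() (
    code=$1
    # Position 1: update type ('<' or '>' means the file content is transferred)
    case $code in
        '<'*|'>'*) out="content transfer" ;;
        .*)        out="no content transfer" ;;
        *)         out="other update type" ;;
    esac
    # Positions 3 to 6: checksum, size, modification time, permissions
    case $code in ??c*)    out="$out, checksum differs" ;; esac
    case $code in ???s*)   out="$out, size differs" ;; esac
    case $code in ????t*)  out="$out, modification time differs" ;; esac
    case $code in ?????p*) out="$out, permissions differ" ;; esac
    printf '%s\n' "$out"
)

explain_itemized '>f..t......'   # prints: content transfer, modification time differs
explain_itemized '>fc.t......'   # prints: content transfer, checksum differs, modification time differs
```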
    
Chris Davies