Preliminary note
Test the solution on some expendable pair of directories first.
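One way to build such an expendable pair is sketched below. The directory layout under the temporary directory and the file names (same.txt, differs.txt, only-here.txt) are made up for illustration; they cover the three cases that matter: identical content, different content, and a file with no counterpart.

```shell
# Create a throwaway pair of directories (all names here are illustrative).
scratch=$(mktemp -d)
mkdir -p "$scratch/ref/sub" "$scratch/mut/sub"

# Identical in both trees: expected to be deleted from the mutable tree.
printf 'same\n' > "$scratch/ref/sub/same.txt"
printf 'same\n' > "$scratch/mut/sub/same.txt"

# Same relative path, different content: expected to survive.
printf 'old\n' > "$scratch/ref/differs.txt"
printf 'new\n' > "$scratch/mut/differs.txt"

# Present only in the mutable tree: expected to survive.
printf 'extra\n' > "$scratch/mut/only-here.txt"

# Show the two paths to use while testing.
printf 'reference: %s\nmutable:   %s\n' "$scratch/ref" "$scratch/mut"
```

While testing, assign these two paths to the variables introduced below instead of your real directories.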
Solution
This answer uses *nix tools. It should work in Cygwin, that is, in a shell (like bash) provided by Cygwin. (The shell is important, see this question.)
To be DRY, I will use shell variables. If you ever need to apply this answer to other directories then it's enough to change the variables, while commands that follow are static. Use absolute paths. Run this snippet to set the variables:
reference='/cygdrive/c/MyData'
mutable='/cygdrive/c/MyDataBackup'
(Single-quotes are not necessary in this particular case; however users without experience who want to process directories with spaces in names will probably appreciate the quotes being already in the right places.)
You need to cd to the mutable directory. If the below command fails for any reason, abort.
cd -- "$mutable"
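Aborting matters because every later find command starts from . and deletes relative to it; if cd failed, . would be some other directory. If you put the commands in a script, a guard along these lines may help. This is only a sketch, and enter_dir is a hypothetical helper name, not part of any tool.

```shell
# Hypothetical helper: enter the given directory or report failure.
enter_dir() {
    # Refuse an empty path (e.g. an unset variable) instead of guessing.
    [ -n "$1" ] || { echo 'enter_dir: empty path' >&2; return 1; }
    cd -- "$1" || return 1
    # Show where we ended up; pwd -P prints the physical path.
    printf 'now in: %s\n' "$(pwd -P)"
}

# Usage in a script: stop before any find command can run in the wrong place.
# enter_dir "$mutable" || exit 1
```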
This is a command that does the real work:
find . -type f \
    -print \
    -exec test -f "$reference"/{} \; \
    -exec cmp -- {} "$reference"/{} \; \
    -delete
Explanation
. defines our starting point, the current working directory. Thanks to the prior cd this will be the mutable directory. We don't use "$mutable" as the starting point, because we need find to consider relative paths so we can concatenate them with the path to the reference directory later. Our find will try to test all files under (and including) ., descending to subdirectories of any depth.
-type f is a test that checks if the currently considered file is a regular file. The purpose of this test is to avoid giving files of other types to cmp later. E.g. we don't want to use cmp with directories.
-print prints the pathname of the currently considered file. This is only to give indication of progress; you can omit -print if you want.
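-exec test -f "$reference"/{} \; tests if there is a regular file under the same relative path in the reference directory. The manual of GNU find describes -exec … ; as an action, but it is also a test: it succeeds iff the called executable (here test) returns exit status 0, and this is what we rely on here. The test is not only there to keep files of unexpected types away from cmp; it is also meant to:
- avoid giving a nonexistent file to cmp;
- avoid giving a symlink to cmp (see below).
Caveat: test -f follows symlinks, so on its own it still accepts a symlink that resolves to a regular file. If the reference directory may contain symlinks, add -exec test ! -L "$reference"/{} \; right after this test (and the equivalent test ! -L "$reference/$f" in the portable variant).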
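The claim that -exec doubles as a test can be checked in isolation. The sketch below works in a scratch directory and uses true and false as stand-ins for test and cmp; the file names a and b are arbitrary.

```shell
# Demo in a scratch directory: -print only runs when the preceding -exec succeeds.
demo=$(mktemp -d)
: > "$demo/a"
: > "$demo/b"

# "true" exits with status 0, so the -exec test passes and both files are printed.
passed=$(cd -- "$demo" && find . -type f -exec true \; -print | sort)

# "false" exits with a non-zero status, so find never reaches -print.
blocked=$(cd -- "$demo" && find . -type f -exec false \; -print)

printf 'passed:\n%s\nblocked: [%s]\n' "$passed" "$blocked"
rm -r -- "$demo"
```

The same short-circuit logic is what keeps -delete from running when test or cmp fails.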
-exec cmp -- {} "$reference"/{} \; is a test that actually compares the two files. Note that if cmp is given a symlink and the target of that symlink, it will report that the contents are identical. In the context of your question: if foo in the reference directory is a symlink to foo in the mutable directory, cmp will make us think there are two copies, while the only copy is in the mutable directory. If we blindly believe cmp, we will delete it. Not giving symlinks to cmp (see above) solves this problem.
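The symlink pitfall is easy to reproduce. The following sketch builds the situation described above in a scratch directory (the names ref, mut and foo are illustrative) and shows that cmp alone cannot tell a second copy from a symlink, while test -L can.

```shell
demo=$(mktemp -d)
mkdir "$demo/ref" "$demo/mut"
printf 'only copy\n' > "$demo/mut/foo"
# foo in the reference tree is just a symlink to foo in the mutable tree.
ln -s "$demo/mut/foo" "$demo/ref/foo"

# cmp follows the symlink, compares the file with itself and reports success.
if cmp -s -- "$demo/mut/foo" "$demo/ref/foo"; then
    cmp_says=identical
else
    cmp_says=different
fi

# test -L reveals that the reference entry is a symlink, not a second copy.
if test -L "$demo/ref/foo"; then
    ref_kind=symlink
else
    ref_kind=other
fi

printf 'cmp: %s, reference entry: %s\n' "$cmp_says" "$ref_kind"
rm -r -- "$demo"
```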
-delete tries to delete the currently considered file. This action will be performed iff all the previous tests succeeded for the file.
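A cautious first pass is to list what would be deleted without deleting anything. The sketch below assumes GNU find, as the command above does. The helper name would_delete is made up here, -delete is replaced by -print, and -s is added to cmp so that only pathnames are printed.

```shell
# would_delete REFERENCE MUTABLE  (hypothetical helper; REFERENCE must be absolute)
# Prints the files under MUTABLE that have an identical regular file at the
# same relative path under REFERENCE. Nothing is deleted.
would_delete() (
    # The function body is a subshell, so this cd does not affect the caller.
    cd -- "$2" || exit 1
    find . -type f \
        -exec test -f "$1"/{} \; \
        -exec cmp -s -- {} "$1"/{} \; \
        -print
)

# Usage:
# would_delete "$reference" "$mutable"
```

If the list looks right, run the real command.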
Portability
AFAIK find in Cygwin is GNU find; it supports -delete, which is a non-portable extension. GNU find also supports expanding more than one {} in -exec, as well as expanding {} concatenated with some string; these features are not portable. If you ever need a portable solution, use the snippet below. It's an alternative to the above, not an addition.
find . -type f \
    -exec sh -c '
        reference="$1"
        shift
        for f; do
            printf "%s\\n" "$f"
            test -f "$reference/$f" \
            && cmp -- "$f" "$reference/$f" \
            && rm -- "$f"
        done
    ' find-sh "$reference" {} +
Reasonable additions
Next you probably want to delete empty directories from the mutable directory:
find . -type d -empty -delete
-empty and -delete are not portable. It's relatively easy to replace -delete (with -depth + -exec rmdir -- {} \;); replacing -empty is not so easy and I won't elaborate.
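One common workaround, which sidesteps -empty rather than replacing it, is to call rmdir on every directory and let it refuse the non-empty ones. The sketch below takes that approach. remove_empty_dirs is a hypothetical helper name, and the approach spawns one rmdir per directory and hides error messages, so treat it as an illustration only.

```shell
# remove_empty_dirs DIR  (hypothetical helper; a sketch, not a drop-in for -empty)
remove_empty_dirs() (
    # The function body is a subshell, so this cd does not affect the caller.
    cd -- "$1" || exit 1
    # -depth visits children before parents, so a directory that becomes
    # empty once its subdirectories are gone is removed as well.
    # rmdir refuses non-empty directories; 2>/dev/null discards those
    # complaints (and, as a side effect, any other error messages).
    find . -depth -type d ! -name . -exec rmdir -- {} \; 2>/dev/null
)

# Usage:
# remove_empty_dirs "$mutable"
```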
Maybe you also want to delete symlinks and such. The following command tries to delete files, excluding directories and regular files:
find . ! -type d ! -type f -delete
Now the mutable directory (i.e. our current working directory) contains only a minimal directory tree with regular files that are candidates for manual inspection.
Notes
In general there are race conditions (TOCTOU) that may allow a rogue user to make you delete a file in a wrong directory. E.g. see Race Conditions with -exec.
In many places I used --. If the paths in the variables are absolute and the starting point for find is . then -- is not really needed. I decided to use -- in case someone uses this answer as an inspiration and writes code where -- may actually be useful.
find-sh is explained here: What is the second sh in sh -c 'some shell code' sh?
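In short, the first argument after the code string becomes $0 of the inner shell, and only the following ones become positional parameters. A small demonstration, with made-up arguments /ref, ./file, ./first and ./second:

```shell
# With a placeholder name: "find-sh" lands in $0, the real arguments in $1, $2.
out=$(sh -c 'printf "%s|%s|%s\n" "$0" "$1" "$2"' find-sh /ref ./file)
printf '%s\n' "$out"    # prints: find-sh|/ref|./file

# Without a placeholder: the first real argument is swallowed as $0,
# so a loop over the positional parameters silently skips it.
lost=$(sh -c 'for f; do printf "%s\n" "$f"; done' ./first ./second)
printf '%s\n' "$lost"   # prints: ./second
```

Without find-sh in the portable snippet, the value of "$reference" would end up in $0 and the first file would be taken for the reference directory.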