
I have read the FAQ and I know this comes close to being closed as asking for a product recommendation...

I have looked at at least 40 "duplicate files" remover utilities (Windows, OSX and Linux) and none of them has the particular feature that I am looking for.

I need to know if there is anything out there that can do this, or if I will have to write my own tool for it.

Just a "Yes, it exists" answer would be okay with me.
It would mean I just didn't search hard enough.

My required feature: Remove duplicate files across a large folder-structure, but ONLY if the duplicates reside in the SAME folder.
E.g. say I have files A, B and C which are identical. A and C are in the same folder; B is in another folder. Either A or C needs to be removed (no preference), but B should be left alone.

Is there something out there that can do this?
(Preferably Windows, but OS X or Linux is OK too.)

Tonny

3 Answers


You can use fdupes without -r so that it doesn't descend into subdirectories. This prints a list of duplicate files:

find . -type d -exec fdupes -n {} \;

-n ignores empty files. Add -dN (--delete --noprompt) to delete all except the first duplicate file.
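For example, combined into one command (do a dry run with the command above first, since this deletes without prompting):

find . -type d -exec fdupes -n -dN {} \;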

You can install fdupes on OS X with brew install fdupes.

Lri

Well, as I said, I worked up a Python script that does just that.

I've hosted it at Google Code and open-sourced it under GPL v3, so anyone who wants to improve the program can do so.

I've also debugged it somewhat (I created tens of duplicate files on Windows, and the script deleted all of them, leaving only the originals). The code is heavily commented to make clear what it actually does.

I've run it on Python 3.3, but I assume it should also work with the latest Python 2.

Oh, and the best part: it should work on any OS Python supports (Windows, OSX, Linux, ...).
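In outline, the script does something like this (a simplified sketch, not the full program: the SHA-256 hashing, the dry-run default and the --delete flag here are just illustrative choices):

#!/usr/bin/env python3
# Simplified sketch: delete duplicate files, but only when the
# duplicates live in the same folder. Dry run by default.

import hashlib
import os
import sys


def file_hash(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def dedupe_per_folder(root, dry_run=True):
    """Walk root; within each folder keep the first file of every
    content hash and delete (or just report) the rest. Identical
    files in *different* folders are left alone."""
    for dirpath, _dirnames, filenames in os.walk(root):
        seen = {}  # content hash -> first file with that content in this folder
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                digest = file_hash(path)
            except OSError as err:
                print("skipping %s: %s" % (path, err), file=sys.stderr)
                continue
            if digest in seen:
                print("duplicate of %s: %s" % (seen[digest], path))
                if not dry_run:
                    os.remove(path)
            else:
                seen[digest] = path


if __name__ == "__main__":
    # Usage: dedupe.py [folder] [--delete]   (default is a dry run)
    args = sys.argv[1:]
    folders = [a for a in args if not a.startswith("--")] or ["."]
    dedupe_per_folder(folders[0], dry_run="--delete" not in args)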


This is a slow-but-sure and very simple approach that should run on both OSX and Linux. I am assuming that you are interested in duplicate files residing in your $HOME, but you can change that to suit your needs.

The idea is to first get a list of all directories, then compare the files inside each of them and delete any that are identical. As I said, this is very simplistic, so it will just keep the first of any set of identical files and delete the rest without warning.

This will print out the dupes but will not make any changes to your files:

find "$HOME" -mindepth 1 -type d | while read dir; do 
  # hash only the files directly inside this folder (-maxdepth 1)
  find "$dir" -maxdepth 1 -type f -exec md5sum {} \; | sort > md5sums;
  # hashes that appear more than once are duplicates
  gawk '{print $1}' md5sums | sort | uniq -d > dupes;
  while read d; do 
    echo "---"; grep -w "$d" md5sums | cut -d ' ' -f 2-;
  done < dupes
done; rm dupes md5sums

This one will silently delete the duplicate files; only run it if you are sure that is OK:

find "$HOME" -mindepth 1 -type d | 
while read dir; do 
  # again, only look at files directly inside this folder
  find "$dir" -maxdepth 1 -type f -exec md5sum {} \; | sort > md5sums;
  gawk '{print $1}' md5sums | sort | uniq -d |
  # for each duplicated hash, keep the first file and delete the rest
  while read d; do grep -w "$d" md5sums | cut -d ' ' -f 2- | tail -n +2; done |
  xargs rm ;
done; rm md5sums

CAVEATS: This is slow, really slow; it gives no warnings and will delete files silently. On the bright side, it will only do so if those files are in the same directory, which is what you want.

terdon