Deduplication is the process of finding and removing duplicate files, or of keeping both files while having them share the same allocation units on the storage medium.
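To make the "finding" half of that definition concrete, here is a minimal Python sketch. It assumes that "duplicate" means byte-identical content; the function names `sha256_of` and `find_duplicates` are made up for this illustration. Files are grouped by size first so that only files with a matching size are hashed.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return groups (sorted lists of paths) of byte-identical files under root."""
    # Step 1: bucket regular files by size; a unique size cannot be a duplicate.
    by_size = defaultdict(list)
    for folder, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Step 2: within each size bucket, group by content hash.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[sha256_of(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

The sketch only reports duplicates. Deciding which copy to delete, or whether to link the copies instead, is left to the caller.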
Questions tagged [deduplication]
156 questions
58
votes
15 answers
Which duplicate files and folders finders exist for Windows?
I need a free duplicate file finder/remover app with the ability to find duplicate files/folders by name and/or by size, and to remove one of the duplicates.
Andrija
- 2,448
45
votes
5 answers
Is there a way to extract duplicate lines in Sublime Text?
I need to perform 2 operations in Sublime Text: extract unique lines and extract duplicate lines. For example, for the input
a
b
a
Extract duplicates should result in:
a
and Extract unique should result in:
b
Is there a built-in operation or a plugin…
Poma
- 1,896
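The two operations in the question above can be sketched outside Sublime Text in plain Python. This is not a Sublime Text feature or plugin; `split_lines` is a hypothetical helper. It assumes, following the example in the question, that "duplicates" means each repeated line listed once and "unique" means lines that occur exactly once.

```python
from collections import Counter


def split_lines(lines):
    """Return (duplicates, unique) while keeping first-seen order."""
    counts = Counter(lines)
    # dict.fromkeys drops repeats but preserves the order of first appearance.
    duplicates = [line for line in dict.fromkeys(lines) if counts[line] > 1]
    unique = [line for line in lines if counts[line] == 1]
    return duplicates, unique
```

For the input from the question, `split_lines(["a", "b", "a"])` gives `(["a"], ["b"])`.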
41
votes
6 answers
How to replace all duplicate files with hard links?
I have two folders containing various files. Some of the files from the first folder have an exact copy in the second folder. I would like to replace those with a hard link. How can I do that?
qdii
- 1,107
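One way to approach the hard-link question above is sketched below in Python. It assumes both folders are on the same filesystem (hard links cannot cross filesystems, and `os.link` raises `OSError` if they do) and that "exact copy" means byte-identical content. `link_duplicates` and the `.dedup-tmp` suffix are names invented for this sketch.

```python
import filecmp
import os
from collections import defaultdict
from pathlib import Path


def link_duplicates(first, second):
    """Replace files in `second` that duplicate a file in `first` with hard links."""
    # Index the first folder by size so only plausible candidates are compared.
    by_size = defaultdict(list)
    for path in Path(first).rglob("*"):
        if path.is_file() and not path.is_symlink():
            by_size[path.stat().st_size].append(path)

    replaced = []
    # Snapshot the listing first, because the loop creates temporary files.
    for dup in list(Path(second).rglob("*")):
        if not dup.is_file() or dup.is_symlink():
            continue
        for original in by_size.get(dup.stat().st_size, []):
            if os.path.samefile(original, dup):
                break  # already the same inode, nothing to do
            if filecmp.cmp(original, dup, shallow=False):  # byte-for-byte check
                tmp = dup.with_name(dup.name + ".dedup-tmp")
                os.link(original, tmp)  # create the link under a temporary name
                os.replace(tmp, dup)    # then swap it over the duplicate
                replaced.append(dup)
                break
    return replaced
```

Linking under a temporary name and then renaming means the duplicate is never missing if the link step fails. Note that after linking, editing one file in place changes the other as well, since both names point at the same data.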
17
votes
3 answers
How to deduplicate 40TB of data?
I've inherited a research cluster with ~40TB of data across three filesystems. The data stretches back almost 15 years, and there are most likely a good number of duplicates, as researchers copy each other's data for different reasons and then just…
Michael Stauffer
- 283
16
votes
3 answers
Is there a compression or archiver program for Windows that also does deduplication?
I'm looking for an archiver program that can perform deduplication (dedupe) on the files being archived. Upon unpacking the archive, the software would put back any files it removed during the compression process.
So far I've…
Larry Silverman
- 293
14
votes
7 answers
What is the best method to remove duplicate image files from your computer?
I have a lot of duplicate image files on my Windows computer, in different subfolders and with different file names.
What Python script or freeware program would you recommend for removing the duplicates?
(I've read this similar question, but the…
BioGeek
- 602
12
votes
4 answers
How to remove duplicate rows based on some columns
I have an Excel sheet which contains duplicate rows.
I want to remove a row if its values in columns A, C, D, E and F are the same as another row's values in those columns (ignore column B when checking for duplicates, but remove it along with the rest of the row).
At…
user33949
- 429
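The rule described in the question above (compare on columns A, C, D, E and F, ignore B, drop the whole row) can be sketched in Python for rows that have been exported from the sheet, for example as CSV. This is not an Excel feature; `drop_duplicate_rows` is a hypothetical helper, and keeping the first occurrence of each duplicate is an assumption, since the excerpt does not say which row should survive.

```python
def drop_duplicate_rows(rows, key_columns=(0, 2, 3, 4, 5)):
    """Keep the first row for each distinct combination of the key columns.

    Column indices are 0-based, so the default (0, 2, 3, 4, 5) stands for
    columns A, C, D, E and F; column B (index 1) is ignored when comparing.
    """
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[i] for i in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept
```

Rows that differ only in column B collapse to the first one encountered; the surviving row keeps its own column B value.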
11
votes
1 answer
In bash, how to find all copies of a given file in particular directories?
Let's say we have a file /a_long_path_1/foo.doc of size, say, 12345 bytes, and we would like to find all copies of this file in directories /a_long_path_2 and /a_long_path_3 including all their subdirectories recursively. The main parts of the names…
user1757545
11
votes
7 answers
Ways to deduplicate files
I simply want to back up and archive the files on several machines. Unfortunately, the files include some large files that are the same file but stored differently on different machines. For instance, there may be a few hundred photos that were copied…
User1
- 9,701
9
votes
5 answers
Free Duplicate mp3 finder
Some time back I used a duplicate file finder for mp3s that worked by analyzing the content. Unfortunately it was not free, and the shareware version had a lot of limitations.
Are there any freeware/OSS ones to detect and delete duplicate songs?
Quintin Par
- 1,145
8
votes
3 answers
How can I have two files with the same name in a directory when mounted with NFS?
I have a C++ application test that creates 10,000 files in an NFS mounted directory, but my test recently failed once due to one file appearing twice with the same name in that directory with all the other 10,000 files. This can be seen on either…
WilliamKF
- 8,058
8
votes
1 answer
View ZFS deduplication ratio on a dataset
I have a tank consisting of several datasets, only one of which is configured to use deduplication.
How can I see the ratio for this dataset? I get a ratio of 1.00x for the whole pool but I imagine this is just reporting the ratio on what's in the…
deed02392
- 3,132
7
votes
3 answers
Why do I have identical files in one directory, in Windows 7?
I just rebuilt my system after a new power supply fried my CPU, MBD, video card & a Blu-Ray drive.
During all this I had to restore Windows from a backup (sector copy).
All seems well, but today I went into Sample Pictures (by accident) and was…
PuterPro
- 81
7
votes
4 answers
Is there a diff utility that allows you to exclude columns?
For example, I have a text file, each line is a long string. I want to exclude 2 "segments" of this string, say columns 1-7 and 20-22. So the bottom 2 lines below would be a match:
123456789012345678901234567890…
user39160
- 175
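The column-exclusion idea in the question above can be sketched in Python by blanking the excluded columns before comparing. This is not an existing diff utility; `mask_columns` and `lines_match` are hypothetical helpers, and the default ranges are the 1-based, inclusive columns 1-7 and 20-22 mentioned in the question.

```python
def mask_columns(line, ranges=((1, 7), (20, 22)), filler=" "):
    """Overwrite the given 1-based, inclusive column ranges with a filler character."""
    chars = list(line)
    for start, end in ranges:
        # Clamp to the line length so short lines do not raise IndexError.
        for i in range(start - 1, min(end, len(chars))):
            chars[i] = filler
    return "".join(chars)


def lines_match(left, right, ranges=((1, 7), (20, 22))):
    """Compare two lines while ignoring the excluded column ranges."""
    return mask_columns(left, ranges) == mask_columns(right, ranges)
```

Masked copies of two files could also be written out and passed to an ordinary diff tool, which keeps the comparison logic in the tool and the exclusion logic in the preprocessing step.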
6
votes
2 answers
Software to detect mp3 almost-duplicates?
Because of some unfortunate circumstances, I noticed I irreversibly mixed up my sorted and retagged mp3s with an old backup. That means now I have files that are basically duplicates except for the id3 tags and paths. FSlint does a nice job of…
Tobias Kienzler
- 4,629