10

How can I compare directory trees of huge size?

I am thinking that a free tool that makes a snapshot of the filesystem structure (a listing of files and directories with their sizes and timestamps) would be ideal, so that I could compare the snapshot to another one made later.

Treecomp would be great for that, but with a huge tree (I mean really huge!) it crashes because it tries to keep the whole tree in memory (4 GB of memory is not enough)...

I worked around the problem by splitting the snapshots into pieces and comparing these pieces. But that's tedious, and the problem can surely be solved in a better way.

Is there another free (ideally also open source) tool that I can try out? Or is there another way to do this that I am overlooking?

OS can be Linux or Windows.

Excellll
  • 12,847
jdehaan
  • 953

12 Answers

10

You can just run this in the terminal:

du -a

This lists all the files in all subfolders, including their sizes.

To save the data to a text file:

du -a > dump.txt

Then you can use something like diff to compare two such dump files.

This is for Linux :D
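For a listing that also carries timestamps and works on Windows as well, the same idea can be sketched in Python. This is only a rough illustration, not a replacement for du: the function name `write_snapshot` and the tab-separated line format are assumptions made up for this example. It writes each line as soon as the file is seen, so the tree is never held in memory.

```python
import os


def write_snapshot(root, out_path):
    """Write one 'relative/path<TAB>size<TAB>mtime' line per file under root.

    Hypothetical format chosen for this sketch. File names containing tabs
    or newlines would break it.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8", errors="surrogateescape") as out:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # visit subdirectories in a stable order
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                try:
                    st = os.lstat(full)
                except OSError:
                    continue  # file vanished or is unreadable; skip it
                rel = os.path.relpath(full, root).replace(os.sep, "/")
                out.write(f"{rel}\t{st.st_size}\t{int(st.st_mtime)}\n")
                count += 1
    return count
```

Calling `write_snapshot("/some/tree", "dump.txt")` now and again later gives two text files in a stable order, which can then be compared with diff or a similar tool. Keep the output file outside the tree being scanned.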

monkey_p
  • 561
5

This is what I use to compare really big directory trees:

rsync --archive --dry-run --verbose /src/directory/ /dst/directory/
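If rsync is not available (for example on Windows), a similar size-and-timestamp check can be sketched in Python. This is not how rsync works internally, just a minimal illustration of walking both trees in lockstep; the names `compare_trees` and `_listing` are made up for this example. Only one pair of directory listings is held in memory at a time, which should help with very large trees.

```python
import os


def _listing(path):
    """Return {name: DirEntry} for a single directory level."""
    with os.scandir(path) as it:
        return {entry.name: entry for entry in it}


def compare_trees(src, dst):
    """Yield (status, relative_path) for entries that differ between src and dst.

    A directory that exists on one side only is reported once and not descended into.
    """
    stack = [""]  # relative paths of directories present on both sides
    while stack:
        rel = stack.pop()
        left = _listing(os.path.join(src, rel))
        right = _listing(os.path.join(dst, rel))
        for name in sorted(left.keys() | right.keys()):
            child = os.path.join(rel, name)
            a, b = left.get(name), right.get(name)
            if b is None:
                yield ("only in src", child)
            elif a is None:
                yield ("only in dst", child)
            else:
                a_dir = a.is_dir(follow_symlinks=False)
                b_dir = b.is_dir(follow_symlinks=False)
                if a_dir and b_dir:
                    stack.append(child)  # compare the subdirectory later
                elif a_dir != b_dir:
                    yield ("type differs", child)
                else:
                    sa = a.stat(follow_symlinks=False)
                    sb = b.stat(follow_symlinks=False)
                    if sa.st_size != sb.st_size or int(sa.st_mtime) != int(sb.st_mtime):
                        yield ("changed", child)
```

A typical use would be `for status, path in compare_trees("/src/directory", "/dst/directory"): print(status, path)`. File contents are never read, so files with equal size and modification time are assumed to be identical.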
3

I've used MD5 hashes and diff to compare trees in the past. It's slow but will find changed files in cases where the dates are not reliable. It's also portable so you can transfer the index instead of comparing files via the network.

find /path/to/check -type f -print0 | sort -z | xargs -0 md5sum > after.txt

diff before.txt after.txt > diffs.txt
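Where md5sum is not at hand (for example on Windows), the index step could be approximated in Python. This is a sketch under assumptions: the helper names `file_md5` and `write_hash_index` are invented here, and the output only imitates the "hash, two spaces, path" layout of md5sum. Files are read in chunks and lines are written as they are produced, so memory use stays small regardless of tree size.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_hash_index(root, out_path):
    """Write one '<md5>  <relative/path>' line per regular file under root."""
    with open(out_path, "w", encoding="utf-8", errors="surrogateescape") as out:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # stable order so two indexes can be diffed
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                if os.path.islink(full) or not os.path.isfile(full):
                    continue  # regular files only, like 'find -type f'
                try:
                    checksum = file_md5(full)
                except OSError:
                    continue  # unreadable file; skip it
                rel = os.path.relpath(full, root).replace(os.sep, "/")
                out.write(f"{checksum}  {rel}\n")
```

Running `write_hash_index("/path/to/check", "after.txt")` at two points in time gives two index files that can be diffed as above.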
Chris Nava
  • 7,258
2

I'll try to expand a bit on how to do it with Total Commander (I hope I understood what you want to do).

  • Install the DiskDir packer plugin (I put a direct link to the plugin; if you prefer, you can go to the plugins page and look for the DiskDir plugin).
  • After the plugin is installed, "pack" the directory you want to track changes of with Alt+F5 and select "lst" from the drop-down list in the Packer part of the dialog box. This creates a "package" that you can enter by pressing Enter, as you would enter a directory, and it shows the complete contents of the directory.
  • When comparing results, go to the original directory in the left pane and enter the desired snapshot in the right pane.
  • Use the "Synchronize Dirs" function, located in the Commands menu.
  • In the Synchronize directories window, uncheck compare by contents, check Subdirs and Ignore date (or not, if the changed date is important), and run the comparison.
  • The window will show you files that are equal (in this case only by size, not by contents), files that are different, and files missing on the left or right side.

Since the snapshot is a plain text file and you are not comparing by contents, it should be fast, but I have never used it on a really huge directory.

This is useful if you are not making backups but only wish to record what the contents of the directory were at some point. If you do make backups, you can use the same tool (Synchronize dirs) to also compare by contents.

There is also an extended version of the DiskDir plugin; the download link is in the first post. This version lets packages (like zip, 7z...) show up as directories in the snapshot. This would of course increase the time needed to make a snapshot.

T. Kaltnekar
  • 8,484
2

First, take a snapshot (say, one week ago):

rsync --archive /the/source/ /var/snapshot1/

Now take a second snapshot:

rsync --archive /the/source/ /var/snapshot2/

And compare them:

rsync --archive --dry-run --itemize-changes --delete /var/snapshot1/ /var/snapshot2/
Perleone
  • 260
1

You could just use the command prompt to dump the listing:

DIR /S >Listing1.txt

(You can fine-tune the options if you want, but this basic syntax is probably good enough.)

To compare the two listings, use any file comparison tool, such as WinDiff or CompareIt. Wikipedia has a list of such tools here: http://en.wikipedia.org/wiki/Comparison_of_file_comparison_tools
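If the listings are too big for a comparison tool to load, a streaming merge is one possible fallback. The sketch below is only an illustration under stated assumptions: it does not read raw `DIR /S` output, but a hypothetical flat format with one `path<TAB>size<TAB>mtime` line per file, sorted by path in plain code-point order. The names `read_records` and `compare_listings` are invented for this example. It reads both files line by line, so memory use does not grow with the size of the listings.

```python
def read_records(path):
    """Yield (file_path, metadata) from a 'path<TAB>size<TAB>mtime' listing."""
    with open(path, encoding="utf-8", errors="surrogateescape") as f:
        for line in f:
            rel, _, meta = line.rstrip("\n").partition("\t")
            yield rel, meta


def compare_listings(old_path, new_path):
    """Yield (status, file_path) by merging two listings sorted by path."""
    old, new = read_records(old_path), read_records(new_path)
    a, b = next(old, None), next(new, None)
    while a is not None or b is not None:
        if b is None or (a is not None and a[0] < b[0]):
            yield ("removed", a[0])  # present only in the old listing
            a = next(old, None)
        elif a is None or b[0] < a[0]:
            yield ("added", b[0])  # present only in the new listing
            b = next(new, None)
        else:
            if a[1] != b[1]:
                yield ("changed", a[0])  # same path, different size or mtime
            a, b = next(old, None), next(new, None)
```

For example, `for status, path in compare_listings("Listing1.txt", "Listing2.txt"): print(status, path)` would print only the differences. If the inputs are not sorted the same way, the result will be wrong, so sort them first.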

1

Have you tried meld? I have no idea if it's any good for huge trees, but you can always give it a try :)

Meld is a visual diff and merge tool targeted at developers. Meld helps you compare files, directories, and version controlled projects. It provides two- and three-way comparison of both files and directories, and has support for many popular version control systems.

Meld helps you review code changes and understand patches. It might even help you to figure out what is going on in that merge you keep avoiding.

Peltier
  • 6,504
0

Have you tried Back In Time?

It's a GNU/Linux tool that makes a snapshot of a filesystem by using hard links or physical copies of files and directories.

It's very configurable and has daemon and GUI parts that run separately.

slhck
  • 235,242
atrent
  • 83
0

I did this in Total Commander, using the Synchronize Dirs feature, with 1.2 TB of data across two drives.

0

FreeCommander has an option to compare two different folders.

0

You may also try:

Karen's Directory Printer

Karen's Directory Printer can print the name of every file on a drive, along with the file's size, date and time of last modification, and attributes (Read-Only, Hidden, System and Archive)! And now, the list of files can be sorted by name, size, date created, date last modified, or date of last access.

File List Generator

FLG is a free File List Generator. It searches the directory tree for the files with the requested criteria and produces a list in HTML format.

harrymc
  • 498,455
0

You could check Beyond Compare.

It is not free, but you can test it for 30 days (days of actual use, not days since installation). Perhaps that's enough time to complete your task.

knut
  • 101