20

On a Win7 NTFS volume, I'm using cwrsync, which supports --link-dest correctly, to create "snapshot"-type backups. So I have:

z:\backups\2010-11-28\cygdrive\c\Users\...
z:\backups\2010-12-02\cygdrive\c\Users\...

The content of 2010-12-02 is mostly hardlinks back to files in the 2010-11-28 directory, but there are a few new or changed files only in 2010-12-02. On Linux, the du utility will tell me the actual size taken by each incremental snapshot. On Windows, Explorer and du under Cygwin are both fooled by hardlinks and show 2010-12-02 taking up a little more space than 2010-11-28.
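
The snapshots are created with something along these lines (illustrative only; -R/--relative is what produces the \cygdrive\c\... layout under each dated folder):

rsync -aR --link-dest=/cygdrive/z/backups/2010-11-28 /cygdrive/c/Users /cygdrive/z/backups/2010-12-02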

Is there a Windows utility that will show the correct space actually used?

kbyrd
  • 2,256

8 Answers

15

Try using Sysinternals Disk Usage (otherwise known as du); specifically, the -u and -v flags will count only unique occurrences and will show the usage of each folder as it goes along.

As far as I know, the file system doesn't distinguish between the original file and a hard link (that is really the point of a hard link), so you can't discount them on a folder-by-folder basis; you need to do this comparatively.

To test, I created a random folder with 6 files in it and cloned the whole thing. Then I created several hard and soft links inside the first folder referencing other files in the first folder, and also some in the second.

Running du -u -v testFld results in (note the values next to the folders are in KiB):

       104  <path>\testFld\A
        54  <path>\testFld\B
       149  <path>\testFld

Totals:
Files:        12
Directories:  2
Size:         162,794 bytes
Size on disk: 162,794 bytes

Running du -u -v testFld\a results in:

104  <path>\testFld\a
...

Running du -u -v testFld\b results in:

74   <path>\testFld\b
...

Notice the mismatch?
The hard links in A that refer to files in B are counted only against A during the "full" run, and B only returns 54 (even though the files were originally in B and hard-linked from A). When you measure B separately (or if you don't use the -u unique flag) it counts its "full" measure of 74.

DMA57361
  • 18,793
5

PowerShell 5 may be an option. It is available for Windows 7, but I have only tested this on Server 2012 R2 with the April 2015 preview.

The filesystem provider in PowerShell 5 has two new properties LinkType and Target:

ls taskmgr.exe | fl LinkType,Target

this returns:

LinkType : HardLink
Target   : C:\Windows\WinSxS\amd64_microsoft-windows-advancedtaskmanager_..._6.3.9600.17..2\Taskmgr.exe

So now I can show only the files in System32 that are not hardlinks:

cd $env:SystemRoot\System32
ls -Recurse -File -force -ErrorAction SilentlyContinue | ? LinkType -ne HardLink | Measure-Object -Property Length -Sum

this returns:

Count    : 844
Sum      : 502,486,831

you can compare that with all files:

ls -Recurse -File -force -ErrorAction SilentlyContinue | Measure-Object -Property Length -Sum

Count    : 14092
Sum      : 2,538,256,262

So over 13,000 files, totalling more than 2 GB, are hardlinks.
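
If you want the space that the hardlinked files account for, a minimal follow-up sketch is to subtract the two sums (same commands as above, still run from the System32 folder):

# Sum of all file sizes, hardlinked or not...
$all = ls -Recurse -File -Force -ErrorAction SilentlyContinue |
       Measure-Object -Property Length -Sum
# ...and the sum over files that are not hardlinks.
$unique = ls -Recurse -File -Force -ErrorAction SilentlyContinue |
          Where-Object LinkType -ne HardLink |
          Measure-Object -Property Length -Sum
# The difference is the size reported for the hardlinked files.
'{0:N0} bytes are reported for hardlinked files' -f ($all.Sum - $unique.Sum)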

3

You can use ln.exe to show the "true size" of a directory tree:

ln.exe --truesize z:\backups\.

It will only detect hardlinks below that starting folder.

Limer
  • 471
1

TreeSize Professional (~$55, 30-day trial) claims to distinguish NTFS hardlink disk space. A quick trial seems to bear this out.

Hardlink support is not turned on out of the box: enable hardlink tracking under Tools > Options > Scan, re-scan, then use Ctrl-1 and Ctrl-2 to switch between Size and Allocated space. Allocated is the actual space used, while Size is the statistic normally reported by other programs.

There is a performance penalty for turning on hardlink support (and symlinks and mounts too, if you want those as well). The colour palette is too garish for my taste, but that seems to be par for the course in this genre. Also be careful when clicking around in the box chart area: it's easy to accidentally move a folder with a mistaken drag-and-drop when you only meant to expand it.

matt wilkie
  • 5,324
1

I also did some research on this question. Here are the results I discovered.

The size of a folder containing hardlinked files on NTFS can be understood in three different ways:

  1. Size including the sizes of all hardlinked files (which is what Windows Explorer shows).
  2. Size counting unique files only, within the current folder.
  3. Size counting unique files only, across the whole disk.

Number 2 is what TreeSize Professional shows in the Details tab, Allocated column, if the option "Track NTFS hardlinks" is enabled.

Here is an example for the winsxs folder (7.5 GB as opposed to 10 GB):

[screenshot of TreeSize showing the winsxs folder]

Obtaining the number 3 value is still an open question for me, although I was able to get a lower bound by using Total Commander with the NL_Info plugin. What I got is the size occupied by files which have only one hardlink (unique files); it was about 5 GB for the example given.
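
For what it's worth, here is a rough PowerShell sketch of that same lower bound (my own approach, not Total Commander's): sum only the files that have exactly one hardlink. It needs an elevated prompt and is slow on a folder as large as winsxs.

# fsutil hardlink list prints one path per link of the given file,
# so a single line of output means the file is not shared with any other path.
$sum = 0
Get-ChildItem C:\Windows\winsxs -Recurse -File -Force -ErrorAction SilentlyContinue |
    ForEach-Object {
        $links = @(fsutil hardlink list $_.FullName 2>$null)
        if ($links.Count -eq 1) { $sum += $_.Length }
    }
'{0:N0} bytes in files with exactly one link' -f $sum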

So this is an attempt to expand on harrymc's answer, or to put it in other words.

fixer1234
  • 28,064
0

The key here is to find the change in available space left on your partition. If your snapshotting is working correctly, then running rsync (or cwrsync) with --link-dest (the docs for which specifically say it will use hard links for identical files) against a directory where some of the files haven't changed will eat up less of the available partition space than a plain copy would.

A couple of the answers inspired me to find the solution that works for me. The chosen answer by DMA57361 didn't seem to work that clearly for me. By the way, at the time of writing (2021) there is already a du command in W10, but it is not the one used in DMA57361's answer; that answer has a link where you can get hold of the right du.exe (or du64.exe).

I'm actually using rsync in WSL, so I naturally used the Bash df command to find out how my operations were affecting the available space. But if you're doing something in the normal W10 command prompt you can use dir, which also shows the total disk space available on the current partition at the end:

08/10/2019  16:57    <DIR>          software projects
19/02/2021  08:04    <DIR>          Videos
               6 File(s)      3,695,481 bytes
              35 Dir(s)  38,383,546,368 bytes free

It works just as you might hope: say you have 20 files in a directory. You do one snapshot using rsync, then you modify one of those 20 files, do another rsync referencing the first snapshot directory with --link-dest, and check the change in available partition space. What you find is that the space has declined by much less than the volume of all 20 files. It will in fact have declined by more than the size of the one modified file, and the reason for this must be (I assume) that the newly created hard links (19 of them) take up non-zero disk space.
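
If you would rather capture this from PowerShell than eyeball the dir output, a minimal sketch of the same before/after measurement looks like this (the Z: drive letter is taken from the question and is just an assumption):

# Free space on the backup drive before the snapshot...
$before = (Get-PSDrive Z).Free
# ...run the rsync/cwrsync --link-dest snapshot at this point...
$after = (Get-PSDrive Z).Free
# ...and the difference is what this snapshot actually consumed.
'{0:N0} bytes consumed by this snapshot' -f ($before - $after)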

You can confirm that there are then hard links present by running the ln.exe command from Limer's answer: this will show that there are 19 of them.

0

Besides Sysinternals du, one can use du from Cygwin.

Modern Cygwin understands many kinds of links on Windows (WSL links, hardlinks, etc.).

GNU du is much more powerful than Sysinternals'.
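
For example, passing both snapshot directories to a single invocation lets GNU du count each hardlinked file only once (paths taken from the question; -s summarises, -h prints human-readable sizes):

du -sh /cygdrive/z/backups/2010-11-28 /cygdrive/z/backups/2010-12-02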

gavenkoa
  • 2,154
0

I think some facts need to be set right here.

Windows cannot "detect" hardlinks, since every file is actually a hardlink to a bunch of bytes on the disk.

The du tool detects duplicates, but that is not the whole picture either: if folder A contains files and B contains only hardlinks to the files in A, then du of A and du of B will return the same answer, namely the size of the files originally coming from A, which are now also in B.

This is actually correct, since, for example, if you deleted A, its files would not be deleted from the disk, because they are still referenced by B. With hard links, which file is the source and which one is the hard link is quite arbitrary and meaningless.

Products such as du will list a directory while discounting duplicates, but this only works if all the files and hard links are contained within the one directory tree being listed. Many folder-listing products do that.

Conclusion: With hard-links, the question of "the actual size used in an NTFS directory" is meaningless.

harrymc
  • 498,455