3

I was using 7-Zip to compute the hash of a folder (with subfolders). It reports two values: one for the data alone and one for the data and the file names.

However, the hash feature is not implemented in the Linux version of 7-Zip. I tried several ways to reproduce the result, but none of them gives the same value on Linux and Windows.

Example results:

"7za.exe h -scrcsha1 myfolder" on Windows gives:

SHA1   for data:              D54D3168B16BFEE600C3A77E848A2A1C1DBCBC59
SHA1   for data and names:    BCE55085200581AD1774CC25AE065DE7DE60077D

whereas on Linux I get:

find . -type f -exec sha1sum "$PWD"/{} \; | sha1sum
ee44137f2462bdfea87ec824dab514f288ae3e6c  -

or

find . -type f | xargs sha1sum | sha1sum
8f971311a28bcdee36fab0ce87a892564622db40  -

So I can't use the result from one platform on another.

(I did verify that the result for a single file is the same for both platforms.)

Nyny

3 Answers

3

Simply running the following command won't necessarily work:

find . -type f | xargs sha512sum | sha512sum

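The problem is that the order in which find reports files can differ from system to system, or even between two copies of the same directory. Because the final hash is computed over the list of per-file hashes, a different order gives a different result.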

Instead, try running the following:

find . -type f | sort | xargs sha512sum | sha512sum

Feel free to swap sha512sum for another tool, e.g. md5sum, sha1sum or sha256sum, depending on your requirements.

Note that this may get slow for large directory trees, in which case you may prefer a more complex script to traverse the hierarchy.
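As a rough sketch of the sorted approach above, the pipeline can be wrapped in a small function so that two copies of a tree are hashed the same way. hash_tree is a hypothetical name, and the sketch assumes GNU find, sort and xargs (for -print0, -z and -r0). It also forces LC_ALL=C and changes into the directory first, which goes beyond the one-liner: the hashed lines contain the file names exactly as printed, so a differently rooted path would change the result.

```shell
# hash_tree DIR  (hypothetical helper, not part of any tool)
# Prints one SHA-512 covering the relative names and the contents of all
# regular files under DIR.
hash_tree() {
    (
        # Work inside DIR so the recorded names are relative (./sub/file)
        # and do not depend on where the tree is stored.
        cd "$1" || exit 1
        # NUL-delimited names survive spaces; LC_ALL=C gives byte-wise order.
        find . -type f -print0 |
            LC_ALL=C sort -z |
            xargs -r0 sha512sum |
            sha512sum |
            cut -d ' ' -f 1
    )
}
```

Calling hash_tree on the original folder and on its copy should then print the same value when names and contents match.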


Example:

$ find . -type f | xargs sha512sum | sha512sum
097e56f6b751c1da15ce5b9dce853ffcc89e06e9cbe10a8dc0894dedb834d40dc4228c65e48bd53f136dd6a7700b0ab07e8e12e7100956db00b0d1b9ef0b9956  -

This includes file names and content in the final hash, but not metadata such as modification times or permissions.


Note that you can use these utilities on Windows by using "Windows Subsystem for Linux". I've just installed it, which was a painless experience, and which also made me realise the issue with find's reported ordering.

Also watch out for how symbolic links are handled in your tree on Linux vs. Windows.
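One way to check for this before comparing hashes: find . -type f does not match symbolic links, so anything reachable only through a link is silently left out. The sketch below just lists the links in a tree so you can decide how to treat them; list_symlinks is a hypothetical helper name.

```shell
# list_symlinks DIR  (hypothetical helper)
# Prints every symbolic link under DIR, relative to DIR, in byte-wise order.
list_symlinks() {
    ( cd "$1" && find . -type l | LC_ALL=C sort )
}
```

If the list is empty on both sides, links cannot be the reason for a mismatch. If it is not, running find with -L makes it follow the links instead of skipping them.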

Attie
2

Unfortunately, it seems to be impossible to reproduce the hash of a folder generated by 7-Zip.

This is because 7z uses the FindNextFileW() function to enumerate directories (7z-1900src/CPP/Windows/FileFind.cpp, line 198).

The order in which this function returns files is not guaranteed and can depend on the file system (according to https://docs.microsoft.com/zh-cn/windows/win32/api/fileapi/nf-fileapi-findnextfilew).

So if you want to implement a platform-independent directory hashing function, you should use a single, well-defined sort order on every platform.
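A minimal sketch of that idea, assuming GNU find, sort and xargs: enumerate the files in byte-wise order and hash their concatenated contents. hash_tree_data is a hypothetical name. The result is a data-only hash in the sense that the names are not hashed, although they still decide the order of concatenation. It is not expected to match the value printed by 7-Zip.

```shell
# hash_tree_data DIR  (hypothetical helper)
# Prints the SHA-1 of the contents of all regular files under DIR,
# concatenated in byte-wise order of their relative paths.
hash_tree_data() {
    (
        cd "$1" || exit 1
        find . -type f -print0 |
            LC_ALL=C sort -z |
            xargs -r0 cat |
            sha1sum |
            cut -d ' ' -f 1
    )
}
```

Because the order is fixed by LC_ALL=C sort rather than by the file system, the same tree should give the same value on any platform where these tools are available.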

仕刀_
1

Since Linux cannot reproduce the 7-Zip checksum and I do not have Node.js, I installed "Windows Subsystem for Linux" (WSL) to verify a folder copied from a Windows computer to a Synology NAS. Installing WSL was quite painless if you follow the docs.

For a command that actually generates the same hash on both Windows and Linux, I mainly relied on "How can I calculate an MD5 checksum of a directory?", which explains how to sort results consistently between Windows and Linux and how to avoid ignoring empty directories. Consistent sorting is achieved with LC_ALL=C:

find . -type f -print0 | LC_ALL=C sort -z | xargs -r0  sha512sum | sha512sum

That command does not account for empty directories, so here is a more complete one copied from the other answer. It omits -print0 to keep things simple. Windows does not permit newlines and similar special characters in file or folder names anyway, so this is not a problem here.

dir=<mydir>; (find "$dir" -type f -exec md5sum {} +; find "$dir" -type d) | LC_ALL=C sort | md5sum

Lastly, the Synology NAS creates extra files and folders for indexing, so I also had to exclude them with -not -path. This was my final command, which generated the same checksum in WSL for my Windows folder and over SSH on the Synology for the copied folder:

dir=.; (find "$dir" -type f -not -path '*@eaDir*' -exec sha512sum {} +; find "$dir" -type d -not -path '*@eaDir*') | LC_ALL=C sort | sha512sum
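The final command can be sketched as a reusable function. hash_tree_full and its optional second argument are hypothetical; the default exclusion pattern is the one used above, and -not is assumed to be supported by the local find (it is in GNU find). The function changes into the folder first, which corresponds to running the command with dir=. from inside it.

```shell
# hash_tree_full DIR [PATTERN]  (hypothetical helper)
# Hashes file names, file contents and the directory list under DIR,
# skipping every path that matches PATTERN (default: Synology index folders).
hash_tree_full() {
    (
        pattern='*@eaDir*'
        if [ "$#" -ge 2 ]; then pattern=$2; fi
        cd "$1" || exit 1
        {
            # One "hash  name" line per regular file.
            find . -type f -not -path "$pattern" -exec sha512sum {} +
            # One line per directory, so empty directories count too.
            find . -type d -not -path "$pattern"
        } | LC_ALL=C sort | sha512sum | cut -d ' ' -f 1
    )
}
```

Running it on the Windows side under WSL and on the NAS over SSH, each time with the respective folder as DIR, should print the same value if the copy is complete.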