I'd like to improve on @Fred Foo's answer by providing a modified version of his script that does not store the files and directories in the repository as a side effect of computing their hashes: http://pastebin.com/BSNGqsqC
Unfortunately, I am not aware of any way to stop git mktree from creating a tree object in the repository, so the script has to generate the binary representation of the tree itself and pass it to git hash-object -t tree.
The script is also based on the answers to "What is the internal format of a git tree object?"
The general idea is to use git hash-object -- data.txt to get the hash of a file, and git hash-object --stdin -t tree < TreeDescription to get the hash of a directory, where:
- TreeDescription is a concatenation of entries of the form "mode name\0hash", where
  - mode is "100644" for regular files and "40000" for directories (note that the directory mode has no leading zero),
  - mode and name are separated by a single space,
  - name and hash are separated by a single \0 byte,
  - hash is the 20-byte binary representation of the object's hash, not the 40-character hex string;
- entries are sorted by name. This does not seem strictly necessary for creating a tree object, but it is what lets you decide whether two directories are equivalent by comparing their hashes. To get the same hashes as Git itself, compare names byte by byte (so non-ASCII characters need no special handling) and treat a directory's name as if it ended with "/".
Also note that this binary format differs a little bit from the way a tree object is stored in the repository in that it lacks the "tree SIZE\0" header.
Obviously you have to compute this bottom-up, starting from the deepest files, because you need the hashes of all children before you can compute the hash of their parent.