1

I am looking for a program that is source-portable across Windows and Linux (e.g. written in ANSI C) and that generates cryptographic hashes such as MD5 and SHA, as well as CRC32, for a file or list of files passed to it.

I will be running this executable on terabytes of files, generating their SHA, MD5 and CRC32 ( and more in the future ) signatures, so speed is important.

What I had in mind is exactly what ReHash is.

Unfortunately, many users have complained about errors in its implementation of the hashes, as well as errors in the way padding (for block-based algorithms) has been implemented.

I am no expert in cryptography; I am just looking for a black-box solution that gives me the hashes I want without requiring anything more of me than compiling some code. Is there anything better?

I could, of course, write a glue program in Python that uses the crypto modules to generate what I want, but I would prefer a compiled binary written in a language like C.

I will be handling all of this from Python code, so something compatible with Python would be preferred, but not at the expense of C-like speed.

PoorLuzer
  • 628

2 Answers

2

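Python's hashing operations (for example hashlib, and zlib.crc32 for CRC32) are implemented in native code compiled from C, so the hashing itself does not run at interpreter speed. Since you want the values in a Python program anyway, using them will be simpler.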
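As a rough illustration of that approach, here is a minimal sketch that reads each file once, in chunks, and feeds every chunk to several hashers plus CRC32. The function name `hash_file`, the default algorithm list and the 1 MiB chunk size are my own choices for the example, not anything prescribed.

```python
import hashlib
import zlib


def hash_file(path, algorithms=("md5", "sha1", "sha256"), chunk_size=1 << 20):
    """Return hex digests (plus CRC32) for one file, reading it only once.

    hash_file, the default algorithms and chunk_size are illustrative choices.
    """
    hashers = {name: hashlib.new(name) for name in algorithms}
    crc = 0
    # Binary mode: checksums are defined over bytes, not decoded text.
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            for h in hashers.values():
                h.update(chunk)
            # zlib.crc32 accepts the running value as its second argument.
            crc = zlib.crc32(chunk, crc)
    result = {name: h.hexdigest() for name, h in hashers.items()}
    result["crc32"] = format(crc & 0xFFFFFFFF, "08x")
    return result
```

Calling `hash_file("some_file.bin")` returns a dict keyed by algorithm name. Because the file is read only once, memory use stays bounded by the chunk size no matter how large the file is, and adding an algorithm later is just another name in the tuple, provided your hashlib build supports it.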

Linux comes with utilities for checksum calculations (cksum, md5sum, sha1sum, ...). So do most other unices. There are several Windows ports of the GNU utilities (which is what you get on Linux): Cygwin, GnuWin32, MSYS, ... You'll need recent enough versions of the utilities if you want SHA-256 and SHA-512.

There are several ANSI C implementations with very liberal licenses of various cryptographic algorithms floating around, often not collected in a single distribution. You could search for them and test them on small input to check their reliability.
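One way to do that small-input check is to run a candidate against published test vectors, and then against a trusted implementation for input lengths around the 64-byte block boundary, which is where padding bugs tend to show up. The sketch below assumes you can wrap the candidate as a function taking `(algorithm, data)` and returning a hex string; `reference_digest`, `failed_vectors` and `failed_lengths` are names made up for the example, and hashlib/zlib stand in as the trusted reference.

```python
import hashlib
import zlib

# Published test vectors: (algorithm, input bytes, expected hex digest).
KNOWN_VECTORS = [
    ("md5", b"", "d41d8cd98f00b204e9800998ecf8427e"),
    ("md5", b"abc", "900150983cd24fb0d6963f7d28e17f72"),
    ("sha1", b"", "da39a3ee5e6b4b0d3255bfef95601890afd80709"),
    ("sha1", b"abc", "a9993e364706816aba3e25717850c26c9cd0d89d"),
    ("sha256", b"abc",
     "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"),
    ("crc32", b"123456789", "cbf43926"),
]


def reference_digest(algorithm, data):
    """Trusted reference used for comparison (hashlib and zlib here)."""
    if algorithm == "crc32":
        return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")
    return hashlib.new(algorithm, data).hexdigest()


def failed_vectors(digest_fn=reference_digest):
    """Return the (algorithm, input) pairs where digest_fn gives a wrong answer."""
    return [(alg, data) for alg, data, expected in KNOWN_VECTORS
            if digest_fn(alg, data).lower() != expected]


def failed_lengths(digest_fn, algorithm="md5",
                   lengths=(55, 56, 57, 63, 64, 65, 119, 120)):
    """Return input lengths near block boundaries where digest_fn disagrees
    with the reference; a non-empty result suggests a padding bug."""
    return [n for n in lengths
            if digest_fn(algorithm, b"a" * n).lower()
            != reference_digest(algorithm, b"a" * n)]
```

To test a third-party C implementation, `digest_fn` could be a small wrapper that runs the compiled binary on a temporary file and returns its output. An empty list from both functions is a reasonable first sign that the implementation is sane, though not a proof of correctness.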

Under Windows, make sure you're correctly treating files as binary or text as desired, since the checksums are defined over byte streams, not line streams. (Normally you'd want to open the files in binary mode, but if you have a text file that got transcoded to Windows line endings, you'll need to open it as text to reverse the effect.) Under any OS, make sure you don't do any encoding translation when opening the file.
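To make the binary-versus-text point concrete, here is a small sketch that hashes a file as if its CRLF line endings were converted back to LF, without relying on the platform's text mode. The function name `md5_with_lf_endings` is hypothetical, and MD5 is used only as an example; the same pattern works with any hashlib algorithm.

```python
import hashlib


def md5_with_lf_endings(path, chunk_size=1 << 20):
    """MD5 of the file's bytes with every CRLF pair replaced by LF.

    The file is still opened in binary mode; the conversion is explicit,
    so the result does not depend on the OS the script runs on.
    """
    h = hashlib.md5()
    pending_cr = False  # True when the previous chunk ended with a CR
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            if pending_cr:
                chunk = b"\r" + chunk
            # Hold back a trailing CR: its LF may be in the next chunk.
            pending_cr = chunk.endswith(b"\r")
            if pending_cr:
                chunk = chunk[:-1]
            h.update(chunk.replace(b"\r\n", b"\n"))
    if pending_cr:
        h.update(b"\r")  # a lone CR at end of file is kept as-is
    return h.hexdigest()
```

For everything that is not a transcoded text file, a plain `open(path, "rb")` with no conversion is what you want.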

Since speed is very important to you, gather all the implementations you can find and benchmark them on moderate-size input (a few megabytes). Different implementations may give better speed on different architectures. 64-bit implementations are likely to be faster where you can run them at all.
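A benchmark along those lines can be quite small. The sketch below times a few hashlib algorithms and zlib.crc32 on an in-memory buffer and reports throughput; the function name `benchmark`, the 8 MB default and the best-of-three policy are arbitrary choices for illustration. It measures hashing only, not disk reads, so treat the numbers as an upper bound on what you will see on real files.

```python
import hashlib
import os
import time
import zlib


def benchmark(size_mb=8, algorithms=("md5", "sha1", "sha256"), repeats=3):
    """Return approximate throughput in MB/s per algorithm (best of `repeats`)."""
    data = os.urandom(size_mb * 1024 * 1024)
    # Bind each name at definition time so every lambda uses its own algorithm.
    candidates = {name: (lambda d, n=name: hashlib.new(n, d).digest())
                  for name in algorithms}
    candidates["crc32"] = zlib.crc32
    results = {}
    for name, fn in candidates.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn(data)
            best = min(best, time.perf_counter() - start)
        results[name] = size_mb / best
    return results
```

To compare other implementations, add more entries to `candidates`, for example a wrapper around an external binary, keeping in mind that process start-up time will then be part of what you measure.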

2

OpenSSL has command-line tools that compute hashes (openssl dgst). The Cygwin project (http://www.cygwin.com/) ships the OpenSSL tools. They will be a bit slower than a pure Windows app because of the Cygwin layer, but you'll also get an environment where you can script your hash generation.

Rich Homolka
  • 32,350