192

I sometimes get files from my clients that have the wrong file extension. For example, the name is image.jpg but the file is actually a TIFF image. In many cases I can clarify it by opening the file in a text editor, looking at the first few bytes, then deducing which file type it is.

This works for me with JPEG, TIFF, GIF and PDF files. However there are many more file types out there.

Is it possible to automate identification of the correct file type by analyzing the containing data?

Stevoisiak
  • 16,075
Martin
  • 4,012

7 Answers

183

You can use the TrID tool, which identifies files using a growing library of file type definitions.

Wildcards are supported, so in your example you could put all the images to be examined in one folder, e.g. C:\verifyimages, and then run:

trid C:\verifyimages\*

This will examine all files in the verifyimages folder.


There is also a GUI version available, TrIDNet.

There is documentation available on how you can integrate TrID or TrIDNet into Windows Explorer and Total Commander:

Windows Explorer

Total Commander

Gareth
  • 19,080
58

file

File tests each argument in an attempt to classify it. There are three sets of tests, performed in this order: filesystem tests, magic number tests, and language tests. The first test that succeeds causes the file type to be printed.

The type printed will usually contain one of the words text (the file contains only printing characters and a few common control characters and is probably safe to read on an ASCII terminal), executable (the file contains the result of compiling a program in a form understandable to some UNIX kernel or another), or data meaning anything else (data is usually “binary” or non-printable). Exceptions are well-known file formats (core files, tar archives) that are known to contain binary data.
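As a rough illustration of what a magic number test does, here is a minimal Python sketch that checks the leading bytes of a file against the well-known signatures of the four formats mentioned in the question (JPEG, TIFF, GIF, PDF). The names `SIGNATURES`, `sniff_type` and `sniff_file` are made up for this example; real tools such as file or TrID use far larger signature databases and more elaborate rules.

```python
# Hypothetical helper names; this is only a tiny subset of the checks
# that tools such as file or TrID perform.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG"),             # JPEG start-of-image marker
    (b"II*\x00", "TIFF (little-endian)"),  # bytes 49 49 2A 00
    (b"MM\x00*", "TIFF (big-endian)"),     # bytes 4D 4D 00 2A
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"%PDF-", "PDF"),
]


def sniff_type(data: bytes) -> str:
    """Return a format name for the leading bytes, or 'unknown'."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first few bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return sniff_type(handle.read(8))
```

Calling `sniff_file("image.jpg")` on the mislabelled file from the question would then report a TIFF variant rather than JPEG, provided the file really starts with a TIFF signature.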

16

I used to work for the French National Library, building a digital archive system that contains not only digitized books but also millions of digital artefacts with all kinds of strange file types. We used JHOVE to recognize file formats.

JHOVE is open source and is maintained by JSTOR and the Harvard University Library. It is rather simple to use.

Nicolas Raoul
  • 11,561
13

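A modern approach that may appeal is to use Git for Windows. Start git-bash.exe and run the command file "path\to\file" (quote the path or use forward slashes, because Bash treats an unquoted backslash as an escape character). An example output might be: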

TestFile.ico: MS Windows icon resource - 1 icon, 128x128, 32 bits/pixel

Alternatively, use the command file -i "path\to\file", which might give:

TestFile.ico: image/vnd.microsoft.icon; charset=binary
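To automate the comparison between what the extension claims and what the content says, the MIME output can be checked from a script. The Python sketch below is one possible approach, assuming the file command is on the PATH (for example inside Git Bash); the function names `detect_mime` and `extension_matches` are hypothetical, and the extension lookup relies on Python's standard `mimetypes` table, which may not know every extension.

```python
import mimetypes
import subprocess


def detect_mime(path: str) -> str:
    """Ask the file command for the MIME type of path.

    Assumes a file executable is available on the PATH.
    """
    result = subprocess.run(
        ["file", "--brief", "--mime-type", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def extension_matches(path: str, detected_mime: str) -> bool:
    """Compare the MIME type implied by the extension with the detected one."""
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed == detected_mime
```

A caller could then flag suspicious files with something like `extension_matches(path, detect_mime(path))` and inspect only those that return `False`.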
AlainD
  • 5,158
3

You can check the file type from any computer, including Windows, at

http://www.checkfiletype.com

2

I use Oracle's Outside In libraries in my programs. They are not free, but they work well, especially for images. The marketing material says they support over 500 file types.

0

file from https://darwinsys.com/file/ can do it.

To get a precompiled binary of it, you can use Git for Windows, as answered by AlainD, or MSYS2 (the environment used by Git for Windows). Both include file without additional configuration or installation.

Git for Windows also has a portable version that you can use without installing anything, so I prefer it over MSYS2.