As a possible solution, you can use any tool to convert the files into an uncompressed stream (pcm, wav) without metadata and then compare those streams. For the conversion you can use any software you have, like ffmpeg, sox, or avidemux.
For example, here is how I do that with ffmpeg.
Say for this example I have two files with different metadata:
$ diff Original.mp3 Possible-dup.mp3 ; echo $?
Binary files Original.mp3 and Possible-dup.mp3 differ
The brute-force byte comparison reports that they differ.
Then we just convert both files and diff the audio bodies:
$ diff <( ffmpeg -loglevel 8 -i Original.mp3 -map_metadata -1 -f wav - ) <( ffmpeg -loglevel 8 -i Possible-dup.mp3 -map_metadata -1 -f wav - ) ; echo $?
0
Of course, the ; echo $? part is there just for demonstration purposes, to show the return code.
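The same check works with sox from the list above, decoding to headerless raw PCM so that no metadata can leak into the output. This is just a sketch, assuming your sox build has MP3 support and that both files share the same sample rate and channel layout:
$ diff <( sox Original.mp3 -t raw -e signed -b 16 - ) <( sox Possible-dup.mp3 -t raw -e signed -b 16 - ) ; echo $?
Raw output is convenient here precisely because it has no header at all, so there is no place for tags to hide.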
Processing multiple files (traverse directories)
If you want to look for duplicates across a collection, it is worth calculating checksums (crc, md5, sha256, anything of the kind) of the audio data and then just finding collisions among them.
Although it is out of the scope of the question, I would offer some simple suggestions for finding duplicate files in a directory based only on their contents, without considering metadata.
- First, calculate a hash of the audio data in each file and write it into a file for further processing (a directory-traversing variant is sketched after the sample output below):
for file in *.mp3; do printf "%s:%s\n" "$( ffmpeg -loglevel 8 -i "$file" -map_metadata -1 -f wav - | sha256sum | cut -d' ' -f1 )" "$file"; done > mp3data.hashes
The file will look like this:
$ cat mp3data.hashes
ad48913a11de29ad4639253f2f06d8480b73d48a5f1d0aaa24271c0ba3998d02:file1.mp3
54320b708cea0771a8cf71fac24196a070836376dd83eedd619f247c2ece7480:file2.mp3
1d8627a21bdbf74cc5c7bc9451f7db264c167f7df4cbad7d8db80bc2f347110f:Original.mp3
8918674499b90ace36bcfb94d0d8ca1bc9f8bb391b166f899779b373905ddbc1:Other-dup.mp3
8918674499b90ace36bcfb94d0d8ca1bc9f8bb391b166f899779b373905ddbc1:Other.mp3
1d8627a21bdbf74cc5c7bc9451f7db264c167f7df4cbad7d8db80bc2f347110f:Possible-dup.mp3
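The glob in the loop above only covers the current directory. To actually traverse directories, as the section title promises, the same hashing can be driven by find. A sketch, assuming bash; note ffmpeg's -nostdin option, without which ffmpeg would consume the file list arriving on stdin:
$ find . -name '*.mp3' -print0 |
  while IFS= read -r -d '' file; do
      # -nostdin keeps ffmpeg from eating the NUL-separated file list
      printf "%s:%s\n" "$( ffmpeg -nostdin -loglevel 8 -i "$file" -map_metadata -1 -f wav - | sha256sum | cut -d' ' -f1 )" "$file"
  done > mp3data.hashes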
Any RDBMS would be very helpful there to aggregate, count, and select such data.
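For instance, a throwaway in-memory sqlite3 session can do the aggregation directly from the hash file. A sketch, assuming the sqlite3 shell is installed and that no filename contains a colon:
$ sqlite3 :memory: <<'SQL'
CREATE TABLE hashes(hash TEXT, name TEXT);
.separator :
.import mp3data.hashes hashes
SELECT hash, count(*) AS n FROM hashes GROUP BY hash HAVING n > 1;
SQL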
But continuing with a pure command-line solution, you may want to follow simple steps like the ones below.
- See the duplicated hashes, if any (an extra step to show how it works; not needed for finding dupes; a standard-tools equivalent follows the output):
$ count.by.regexp.awk '([0-9a-f]+):' mp3data.hashes
[1:54320b708cea0771a8cf71fac24196a070836376dd83eedd619f247c2ece7480]=1
[1:1d8627a21bdbf74cc5c7bc9451f7db264c167f7df4cbad7d8db80bc2f347110f]=2
[1:ad48913a11de29ad4639253f2f06d8480b73d48a5f1d0aaa24271c0ba3998d02]=1
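If you don't have that awk helper at hand, equivalent counts come from standard tools alone; a sketch:
$ cut -d: -f1 mp3data.hashes | sort | uniq -c
Each output line is the number of occurrences followed by the hash; anything with a count above 1 is a duplicate.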
- And, putting it all together, list the files duplicated by content:
$ grep -f <( count.by.regexp.awk '([0-9a-f]+):' mp3data.hashes | grep -oP '(?<=\[1:).{64}(?!]=1$)' ) mp3data.hashes | sort
1d8627a21bdbf74cc5c7bc9451f7db264c167f7df4cbad7d8db80bc2f347110f:Original.mp3
1d8627a21bdbf74cc5c7bc9451f7db264c167f7df4cbad7d8db80bc2f347110f:Possible-dup.mp3
8918674499b90ace36bcfb94d0d8ca1bc9f8bb391b166f899779b373905ddbc1:Other-dup.mp3
8918674499b90ace36bcfb94d0d8ca1bc9f8bb391b166f899779b373905ddbc1:Other.mp3
count.by.regexp.awk is a simple awk script that counts regexp pattern matches.
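For a one-shot alternative that skips the helper script entirely, sorting and comparing only the digest prefix also works. A sketch, assuming GNU uniq (the -w and -D options are GNU extensions) and the 64-character sha256 hex digests used above:
$ sort mp3data.hashes | uniq -w 64 -D
This prints every line whose first 64 characters (the digest) repeat, which is exactly the duplicate listing shown above.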