There are two sparse files. diff proves them identical, but the comparison took 20 minutes, which is too long. I am thinking of tarring them into tiny archives to speed up the comparison, but they tar into different outputs.
They are huge 512 GB sparse files, each with only around 40K of meaningful data.
% ls -l sparse_file_one/
total 40
-rw-r--r-- 1 midnite midnite 512711720960 Mar 4 23:12 sdd.img
% ls -l sparse_file_two/
total 48
-rw-r--r-- 1 midnite midnite 512711720960 Mar 4 23:13 sdd.img
% du sparse_file_one/sdd.img
40 sparse_file_one/sdd.img
% du sparse_file_two/sdd.img
48 sparse_file_two/sdd.img
The diff comparison takes 20 minutes and reports that the files are identical.
% diff -qs --speed-large-files sparse_file_one/sdd.img sparse_file_two/sdd.img | pv
68.0 B 0:20:57 [55.4miB/s] [ <=> ]
Files sparse_file_one/sdd.img and sparse_file_two/sdd.img are identical
As their du disk usages differ, I looked at both files with filefrag, which confirms that their on-disk layouts (the allocated extents) differ.
% filefrag -v sparse_file_one/sdd.img
Filesystem type is: ef53
File size of sparse_file_one/sdd.img is 512711720960 (125173760 blocks of 4096 bytes)
ext: logical_offset: physical_offset: length: expected: flags:
0: 0.. 0: 6866944.. 6866944: 1:
1: 8192.. 8194: 6852608.. 6852610: 3: 6875136:
2: 12288.. 12288: 6854656.. 6854656: 1: 6856704:
3: 16384.. 16384: 6868992.. 6868992: 1: 6858752:
4: 16448.. 16449: 6869056.. 6869057: 2:
5: 16512.. 16512: 6869120.. 6869120: 1: last
sparse_file_one/sdd.img: 4 extents found
% filefrag -v sparse_file_two/sdd.img
Filesystem type is: ef53
File size of sparse_file_two/sdd.img is 512711720960 (125173760 blocks of 4096 bytes)
ext: logical_offset: physical_offset: length: expected: flags:
0: 0.. 0: 6871040.. 6871040: 1:
1: 8192.. 8195: 6856704.. 6856707: 4: 6879232:
2: 12288.. 12288: 6858752.. 6858752: 1: 6860800:
3: 16384.. 16384: 6860800.. 6860800: 1: 6862848:
4: 16448.. 16449: 6860864.. 6860865: 2:
5: 16512.. 16512: 6860928.. 6860928: 1:
6: 125173759..125173759: 132128862.. 132128862: 1: 132018175: last,eof
sparse_file_two/sdd.img: 5 extents found
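One plausible way for two byte-identical files to end up with different extent maps is that zeros were explicitly written to one of them where the other has a hole. A minimal sketch of that situation with small scratch files follows. The names one.img and two.img and the temporary directory are made up for illustration, and whether du really shows different numbers depends on the filesystem.

```shell
# Build two byte-identical files whose block allocation may differ.
dir=$(mktemp -d)
truncate -s 1M "$dir/one.img" "$dir/two.img"   # 1 MiB of holes each

# Same small piece of real data at the start of both files.
printf 'data' | dd of="$dir/one.img" conv=notrunc 2>/dev/null
printf 'data' | dd of="$dir/two.img" conv=notrunc 2>/dev/null

# Explicitly write one block of zeros at the end of the second file only.
dd if=/dev/zero of="$dir/two.img" bs=4096 seek=255 count=1 conv=notrunc 2>/dev/null

# The content is still the same, because a hole reads back as zeros.
cmp -s "$dir/one.img" "$dir/two.img" && echo "same content"

# The allocated sizes may now differ, like the 40 vs 48 above.
du -k "$dir/one.img" "$dir/two.img"
```

The last extent of sparse_file_two (block 125173759, flagged eof) looks like this kind of explicitly written block, though I have not confirmed how it got there.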
tar completes promptly; it takes practically no time. But the two tar outputs differ in size, so it is no wonder that they do not compare as identical.
% cd ../sparse_file_one/
sparse_file_one % tar -cvSf sdd.img.tar --mtime=@0 sdd.img | pv
tar: Option --mtime: Treating date '@0' as 1970-01-01 08:00:00
sdd.img
8.00 B 0:00:00 [26.2KiB/s] [ <=> ]
sparse_file_one % ls -l
total 80
-rw-r--r-- 1 midnite midnite 512711720960 Mar 4 23:12 sdd.img
-rw-r--r-- 1 midnite midnite 40960 Mar 5 00:22 sdd.img.tar
% cd ../sparse_file_two
sparse_file_two % tar -cvSf sdd.img.tar --mtime=@0 sdd.img | pv
tar: Option --mtime: Treating date '@0' as 1970-01-01 08:00:00
sdd.img
8.00 B 0:00:00 [ 520KiB/s] [ <=> ]
sparse_file_two % ls -l
total 100
-rw-r--r-- 1 midnite midnite 512711720960 Mar 4 23:13 sdd.img
-rw-r--r-- 1 midnite midnite 51200 Mar 5 00:23 sdd.img.tar
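The two archive sizes are at least consistent with the extent lists from filefrag. Here is a rough back-of-the-envelope check. It assumes that tar -S stores only the allocated extents, that headers plus the end-of-archive marker take about 2048 bytes (my guess, not measured), and that the archive is padded to a whole number of 10240-byte records. The function name estimate_tar_size is made up for this sketch.

```shell
# Estimate the size of a sparse tar archive from the number of allocated
# 4096-byte blocks. The 2048-byte overhead is an assumed figure.
estimate_tar_size() {
    blocks=$1
    block_size=4096   # filesystem block size reported by filefrag
    overhead=2048     # assumed: headers, sparse map, end-of-archive marker
    record=10240      # assumed record size: 20 blocks of 512 bytes
    raw=$((blocks * block_size + overhead))
    # Round up to a whole number of records.
    echo $(( (raw + record - 1) / record * record ))
}

estimate_tar_size 9    # blocks in sparse_file_one: 1+3+1+1+2+1
estimate_tar_size 11   # blocks in sparse_file_two: 1+4+1+1+2+1+1
```

This prints 40960 and 51200, matching the ls -l sizes above, which suggests that the extra allocated blocks in sparse_file_two are what end up in the larger archive.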
(With reference to this post, setting the mtime to a fixed value should make the tar archives identical. I could make identical archives from other pairs of identical files, sparse or non-sparse. But this behaviour is apparently not guaranteed.)
(According to this post, if I could extract the content of a sparse file in less than 10 minutes, that would be a faster way to verify that the files are identical. But I do not know Python. It would be nice if some native Linux program could do it.)
PS - I would prefer diff to cmp, because diff can compare directories recursively.
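To make the idea of "comparing only the meaningful data" concrete, here is a minimal sketch of what I have in mind. It uses cmp rather than diff, because GNU cmp can skip to an offset (-i) and limit the number of bytes compared (-n). The helper name cmp_ranges and its input format ("first_block block_count" pairs on stdin, in 4096-byte blocks) are made up for illustration; this is not an existing tool.

```shell
# Compare two files only within the listed block ranges.
# Reads "first_block block_count" pairs from stdin (4096-byte blocks).
# Hypothetical helper; relies on GNU cmp (-i, -n) and GNU stat (-c).
cmp_ranges() {
    file_a=$1
    file_b=$2
    bs=4096
    # Files of different size can never be identical.
    [ "$(stat -c %s -- "$file_a")" = "$(stat -c %s -- "$file_b")" ] || return 1
    while read -r first count; do
        # -i skips to the start of the range, -n limits the bytes compared.
        cmp -s -i "$((first * bs))" -n "$((count * bs))" -- "$file_a" "$file_b" || return 1
    done
    return 0
}
```

For the two images here, the ranges would be the union of both extent lists from filefrag, fed in as: printf '%s\n' '0 1' '8192 4' '12288 1' '16384 1' '16448 2' '16512 1' '125173759 1' | cmp_ranges sparse_file_one/sdd.img sparse_file_two/sdd.img. The result is only meaningful if every block outside the listed ranges is a hole in both files, since holes read back as zeros.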