Following How do I calculate the MD5 checksum of a file in Python?, I wrote a script that removes duplicate files in the folder dst_dir by comparing MD5 hashes. However, for many files (.jpg and .mp4), the MD5 comparison failed to detect the duplicates, so they were not removed. I tried the methods mentioned in Python 3 same text but different md5 hashes, but they did not help. I suspect the cause might be metadata attached to the image files (the "modification date", etc.) that has changed.
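One way to test this suspicion is to change a file's modification time and hash it again. `hashlib` only sees the bytes read from the file, and the filesystem's modification date is stored outside those bytes, so the digest should stay the same. Metadata embedded inside the file (for example EXIF data in a JPEG) is part of the bytes and does change the hash. A minimal standard-library sketch; the helper names `md5_of_file` and `mtime_changes_md5` are illustrative, not from the original script:

```python
import hashlib
import os
import tempfile


def md5_of_file(path):
    """Return the hex MD5 digest of the file's contents, read in 1 MiB chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            md5.update(chunk)
    return md5.hexdigest()


def mtime_changes_md5():
    """Hash a temporary file, change its modification time, and hash it again."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.bin")
        with open(path, "wb") as f:
            f.write(b"same bytes in both cases")
        before = md5_of_file(path)
        # set the access and modification times to the Unix epoch
        os.utime(path, (0, 0))
        after = md5_of_file(path)
        return before != after


print(mtime_changes_md5())  # False
```

If this prints `False`, a changed modification date alone cannot explain the different hashes; the differing bytes must be inside the files themselves.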
```python
import hashlib
import os

dst_dir = "/"  # folder to deduplicate (os.walk also descends into its subfolders)

# maps each MD5 digest to the path of the file kept for that content
kept_files = {}

for root, subdirectories, files in os.walk(dst_dir):
    # skip Tresorit's internal folders
    if ".tresorit" in root:
        continue
    for file in files:
        file_path = os.path.abspath(os.path.join(root, file))
        print(file_path)
        # hash the contents in 1 MiB chunks so large .mp4 files are not read into memory at once
        md5 = hashlib.md5()
        with open(file_path, "rb") as file_to_check:
            for chunk in iter(lambda: file_to_check.read(1 << 20), b""):
                md5.update(chunk)
        md5_returned = md5.hexdigest()

        if md5_returned not in kept_files:
            kept_files[md5_returned] = file_path
            continue

        # a file with identical contents has already been seen
        print(["Duplicate file", file_path, md5_returned])
        kept_path = kept_files[md5_returned]
        if "-" not in file:
            # current name has no "-": remove it and keep the earlier file
            os.remove(file_path)
            print("Duplicate file removed 01")
        elif "-" not in os.path.basename(kept_path):
            # current name has "-" but the kept one does not: keep the current file instead
            os.remove(kept_path)
            kept_files[md5_returned] = file_path
            print("Duplicate file removed 02")
        else:
            # both names contain "-": remove the current file
            os.remove(file_path)
            print("Duplicate file removed 03")
```
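Before changing how the hash is computed, it can help to confirm whether two files that look like duplicates are really byte-for-byte identical. The sketch below (`first_difference` and `demo` are hypothetical helper names) reports the offset of the first differing byte, or `None` if the files match. In a JPEG, a difference close to the start of the file tends to fall in the header segments, where EXIF and other metadata are stored.

```python
import os
import tempfile


def first_difference(path_a, path_b, chunk_size=1 << 16):
    """Return the byte offset where two files first differ, or None if identical.

    If one file is a prefix of the other, the offset is the length of the shorter file.
    """
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        offset = 0
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a == b:
                if not a:  # both files ended at the same point
                    return None
                offset += len(a)
                continue
            # locate the first mismatching byte inside this chunk
            for i, (x, y) in enumerate(zip(a, b)):
                if x != y:
                    return offset + i
            return offset + min(len(a), len(b))


def demo():
    """Write two small files that differ only at byte 4 and compare them."""
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "a.bin")
        b = os.path.join(tmp, "b.bin")
        with open(a, "wb") as f:
            f.write(b"\xff\xd8\xff\xe1AAAA")
        with open(b, "wb") as f:
            f.write(b"\xff\xd8\xff\xe1BAAA")
        return first_difference(a, b)


print(demo())  # 4
```

For real files, something like `print(first_difference("photo.jpg", "photo-1.jpg"))` (hypothetical file names) shows where a suspected pair diverges.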
How can I fix the MD5 calculation in Python so that identical image files produce the same MD5 value?
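If the differences turn out to be in embedded metadata, one possible approach for JPEGs is to hash the file while skipping the metadata segments. This is a minimal sketch, not a general solution, and it rests on several assumptions. It assumes a well-formed JPEG (SOI marker, then length-prefixed header segments, then SOS followed by the image data). It treats every APP0 to APP15 segment (EXIF lives in APP1) and every COM segment as metadata, which is a simplification. It only skips segments that appear before the first scan. `jpeg_md5_without_metadata` and `jpeg_file_md5` are hypothetical names. Images that were re-encoded, rotated, or recompressed will still hash differently, and .mp4 files, which have their own container metadata, are not handled.

```python
import hashlib


def jpeg_md5_without_metadata(data):
    """MD5 of JPEG bytes with APP0-APP15 and COM segments left out (sketch)."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    md5 = hashlib.md5(data[:2])
    i = 2
    while i < len(data):
        if data[i] != 0xFF:
            raise ValueError(f"expected a marker at offset {i}")
        # skip optional 0xFF fill bytes before the marker code
        while i + 1 < len(data) and data[i + 1] == 0xFF:
            i += 1
        if i + 1 >= len(data):
            raise ValueError("truncated marker")
        marker = data[i + 1]
        if marker == 0xDA:
            # start of scan: hash everything from here on (image data, later scans, EOI)
            md5.update(data[i:])
            break
        if marker == 0xD9:
            # end of image reached without a scan
            md5.update(data[i:i + 2])
            break
        if 0xD0 <= marker <= 0xD7 or marker == 0x01:
            # standalone markers have no length field
            md5.update(data[i:i + 2])
            i += 2
            continue
        if i + 4 > len(data):
            raise ValueError("truncated segment header")
        # the 2-byte big-endian length counts itself but not the marker
        length = int.from_bytes(data[i + 2:i + 4], "big")
        end = i + 2 + length
        is_metadata = 0xE0 <= marker <= 0xEF or marker == 0xFE
        if not is_metadata:
            md5.update(data[i:end])
        i = end
    return md5.hexdigest()


def jpeg_file_md5(path):
    """Read a whole JPEG file and hash it without its metadata segments."""
    with open(path, "rb") as f:
        return jpeg_md5_without_metadata(f.read())
```

In the deduplication loop, `jpeg_file_md5(file_path)` could replace the plain MD5 for files ending in `.jpg`. Two copies of the same photo that differ only in EXIF data would then produce the same digest.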
