WHOLE PROCESS:
Your goal is to detect (and perhaps store information about) duplicate files.
1. First, iterate recursively through the directories and their files;
see this:
list all files from directories and subdirectories in Java
2. For each file, load its contents into a byte array;
see this:
Reading a binary input stream into a single byte array in Java
3. Compute the file's MD5 hash (implementing MD5 is your own project);
4. Store this information.
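A minimal Java sketch of steps 1 to 4, assuming Java 8+ and `java.nio.file`. Since writing MD5 is your own project, it uses the JDK's `MessageDigest.getInstance("MD5")` as a stand-in (it is also handy for checking your own implementation). The names `DuplicateScan`, `md5Hex` and `hashAll` are hypothetical. `Files.readAllBytes` loads each whole file into memory, which is fine for moderate file sizes, and `Files.walk` does not follow symbolic links by default.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DuplicateScan {

    // Step 3: MD5 of a byte array as a lowercase hex string.
    // Uses the JDK's MessageDigest as a stand-in for your own MD5 code.
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Standard JDKs ship an MD5 provider, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Steps 1, 2 and 4: walk the tree, read each regular file, record file => hash.
    static Map<Path, String> hashAll(Path root) throws IOException {
        Map<Path, String> fileToHash = new HashMap<>();
        try (Stream<Path> paths = Files.walk(root)) {          // step 1 (recursive)
            List<Path> files = paths.filter(Files::isRegularFile)
                                    .collect(Collectors.toList());
            for (Path file : files) {
                byte[] content = Files.readAllBytes(file);     // step 2
                fileToHash.put(file, md5Hex(content));         // steps 3 and 4
            }
        }
        return fileToHash;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        hashAll(root).forEach((file, hash) -> System.out.println(hash + "  " + file));
    }
}
```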
You can use a Set to detect duplicates (a Set contains only unique elements):
Set<String> fileHashes = new HashSet<>(); // each String is the hex representation of an MD5
if (fileHashes.contains(md5)) { /* you already have it */ } else { fileHashes.add(md5); }
or a Map:
Map<String, String> fileToHash = new HashMap<>(); // each entry is file => hash
// to know whether you already have a hash, you must search the values
// (fileToHash.containsValue(md5) is a linear scan), or also keep a Set
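A small sketch of the Set strategy applied to a file => hash map, assuming Java 8+. `Set.add` returns `false` when the element is already present, so a single call both checks and records the hash. The class name, method name and sample file names/hashes are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class HashDedup {

    // Returns the files whose hash was already seen earlier in iteration order.
    static List<String> findDuplicates(Map<String, String> fileToHash) {
        Set<String> seenHashes = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (Map.Entry<String, String> entry : fileToHash.entrySet()) {
            // add() returns false if this hash was already in the set
            if (!seenHashes.add(entry.getValue())) {
                duplicates.add(entry.getKey());
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output is deterministic.
        Map<String, String> fileToHash = new LinkedHashMap<>();
        fileToHash.put("a.txt", "900150983cd24fb0d6963f7d28e17f72");
        fileToHash.put("b.txt", "d41d8cd98f00b204e9800998ecf8427e");
        fileToHash.put("c.txt", "900150983cd24fb0d6963f7d28e17f72");
        System.out.println(findDuplicates(fileToHash)); // prints [c.txt]
    }
}
```

With a plain `HashMap` the order of iteration is unspecified, so which file of a pair is reported as "the duplicate" can vary.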
ANSWER FOR MD5:
Read about the algorithm:
https://en.wikipedia.org/wiki/MD5
RFC: https://www.ietf.org/rfc/rfc1321.txt
With some googling, you can also find this step-by-step presentation:
http://infohost.nmt.edu/~sfs/Students/HarleyKozushko/Presentations/MD5.pdf
Or try to reproduce an existing C (or Java) implementation; RFC 1321 itself includes a reference implementation in C.
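For orientation, here is a compact Java sketch of MD5 following the steps described in RFC 1321: padding, four rounds of 16 operations per 64-byte block, and little-endian output. It is written for readability rather than speed, and the class and method names (`Md5`, `digest`, `toHex`) are hypothetical. The constant table is computed from the sine formula the RFC defines. You can check your own version against the RFC's test vectors, or against `MessageDigest.getInstance("MD5")`.

```java
import java.nio.charset.StandardCharsets;

class Md5 {
    // Per-step left-rotation amounts for the four rounds (RFC 1321, section 3.4).
    private static final int[] S = {
        7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22,
        5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20,
        4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23,
        6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21
    };

    // T[i] = floor(2^32 * |sin(i + 1)|), the constant table defined in the RFC.
    private static final int[] T = new int[64];
    static {
        for (int i = 0; i < 64; i++) {
            T[i] = (int) (long) (4294967296.0 * Math.abs(StrictMath.sin(i + 1)));
        }
    }

    static byte[] digest(byte[] message) {
        // Padding: append 0x80, then zeros up to 56 mod 64 bytes,
        // then the message length in bits as a 64-bit little-endian value.
        int paddedLength = ((message.length + 8) / 64 + 1) * 64;
        byte[] padded = new byte[paddedLength];
        System.arraycopy(message, 0, padded, 0, message.length);
        padded[message.length] = (byte) 0x80;
        long bitLength = (long) message.length * 8;
        for (int i = 0; i < 8; i++) {
            padded[paddedLength - 8 + i] = (byte) (bitLength >>> (8 * i));
        }

        // Initial values of the buffers A, B, C, D.
        int a0 = 0x67452301, b0 = 0xefcdab89, c0 = 0x98badcfe, d0 = 0x10325476;
        int[] m = new int[16];

        for (int block = 0; block < paddedLength; block += 64) {
            // Decode the 64-byte block into sixteen little-endian 32-bit words.
            for (int j = 0; j < 16; j++) {
                int off = block + 4 * j;
                m[j] = (padded[off] & 0xff)
                        | (padded[off + 1] & 0xff) << 8
                        | (padded[off + 2] & 0xff) << 16
                        | (padded[off + 3] & 0xff) << 24;
            }

            int a = a0, b = b0, c = c0, d = d0;
            for (int i = 0; i < 64; i++) {
                int f, g;
                if (i < 16) {          // round 1: F(b, c, d)
                    f = (b & c) | (~b & d);
                    g = i;
                } else if (i < 32) {   // round 2: G(b, c, d)
                    f = (b & d) | (c & ~d);
                    g = (5 * i + 1) % 16;
                } else if (i < 48) {   // round 3: H(b, c, d)
                    f = b ^ c ^ d;
                    g = (3 * i + 5) % 16;
                } else {               // round 4: I(b, c, d)
                    f = c ^ (b | ~d);
                    g = (7 * i) % 16;
                }
                int temp = d;
                d = c;
                c = b;
                b = b + Integer.rotateLeft(a + f + T[i] + m[g], S[i]);
                a = temp;
            }
            // Add this block's result to the running state.
            a0 += a; b0 += b;
            c0 += c; d0 += d;
        }

        // Output A, B, C, D as 16 bytes, each word little-endian.
        byte[] out = new byte[16];
        int[] words = {a0, b0, c0, d0};
        for (int w = 0; w < 4; w++) {
            for (int k = 0; k < 4; k++) {
                out[4 * w + k] = (byte) (words[w] >>> (8 * k));
            }
        }
        return out;
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte x : bytes) {
            sb.append(String.format("%02x", x));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Test vectors from RFC 1321, appendix A.5.
        System.out.println(toHex(digest("".getBytes(StandardCharsets.US_ASCII))));
        System.out.println(toHex(digest("abc".getBytes(StandardCharsets.US_ASCII))));
    }
}
```

Running `main` should print the RFC values `d41d8cd98f00b204e9800998ecf8427e` and `900150983cd24fb0d6963f7d28e17f72`.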
OVERALL STRATEGY
To save time and make the process faster, you should also think about how your function will be used:
- If you use it once, for a single file, reduce the work by first selecting the other files by size: only files with the same size as yours can be duplicates of it.
- If you use it regularly (and want it to be fast), periodically scan new files in the background to keep a hash database up to date. Detecting new files is straightforward.
- If you want to find all duplicated files, it is better to scan everything and use the Set strategy as well.
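The size filter and the Set-style grouping can be combined in a sketch like the following, assuming Java 8+ (the names `SizeFirstDuplicates`, `md5Hex` and `findDuplicateGroups` are hypothetical). Files are first grouped by `Files.size`, which only reads metadata; only files that share their size with at least one other file are read and hashed. A matching MD5 is strong evidence of identical content, though not strict proof, so you could still compare the bytes of each final group if you need certainty.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SizeFirstDuplicates {

    // MD5 of a whole file as lowercase hex (JDK MessageDigest as a stand-in).
    static String md5Hex(Path file) throws IOException {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(Files.readAllBytes(file));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns groups of files that have the same size and the same MD5.
    static List<List<Path>> findDuplicateGroups(Path root) throws IOException {
        // 1. Group by size: cheap, no file contents are read.
        Map<Long, List<Path>> bySize = new HashMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            for (Path p : paths.filter(Files::isRegularFile).collect(Collectors.toList())) {
                bySize.computeIfAbsent(Files.size(p), k -> new ArrayList<>()).add(p);
            }
        }
        // 2. Hash only files whose size is shared with at least one other file.
        List<List<Path>> groups = new ArrayList<>();
        for (List<Path> sameSize : bySize.values()) {
            if (sameSize.size() < 2) {
                continue; // a file with a unique size cannot have a duplicate
            }
            Map<String, List<Path>> byHash = new HashMap<>();
            for (Path p : sameSize) {
                byHash.computeIfAbsent(md5Hex(p), k -> new ArrayList<>()).add(p);
            }
            for (List<Path> sameHash : byHash.values()) {
                if (sameHash.size() > 1) {
                    groups.add(sameHash);
                }
            }
        }
        return groups;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (List<Path> group : findDuplicateGroups(root)) {
            System.out.println(group);
        }
    }
}
```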
Hope this helps