I have a lot of strings that i want to match for similarity(each string is 30 characters on average). I found difflib's SequenceMatcher great for this task as it was simple and found the results good. But if i compare hellboy and hell-boy like this
>>> sm=SequenceMatcher(lambda x:x=='-','hellboy','hell-boy')
>>> sm.ratio()
0: 0.93333333333333335
I want such words to give a 100 percent match i.e ratio of 1.0. I understand that the junk character specified in the function above are not used for comparison but finding longest contiguous matching subsequence. Is there some way i can make SequenceMatcher to ignore some "junk" characters for comparison purpose?
 
     
     
     
    