I have two lists of objects; call them a and b. For our purposes, the objects are defined as:
from fuzzywuzzy import fuzz

class MyObj:
    def __init__(self, string: str, integer: int):
        self.string = string
        self.integer = integer

    def __eq__(self, other):
        if self.integer != other.integer:
            return False
        # fuzzywuzzy checks whether the strings are "similar enough"
        return fuzz.ratio(self.string, other.string) > 90
What I want is to check which objects in list a are "in" list b (i.e., compare equal under == to at least one object in b).
Currently I'm just looping through them as follows:
for obj in a:
    for other_obj in b:
        if obj == other_obj:
            <do something>
            break
I strongly suspect there is a faster way to implement this. The lists are long (up to about 100,000 objects each), so this is a major bottleneck in my code.
I looked at this answer, Fastest way to search a list in python, which suggests that sets are much faster. A few things confuse me, though:
- How significant is the "removal of duplicates" speedup? I don't expect many duplicates in my lists.
- Can a set remove duplicates and hash properly when I have defined __eq__ the way I have?
- How would this compare with pre-sorting the list and using something like binary search? A set is unordered, after all...
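To make the hashing question concrete, this is the kind of dict-based bucketing I imagine a set/hash approach would reduce to. It is only a sketch: difflib's SequenceMatcher (scaled to 0-100) stands in for fuzz.ratio, and the objects are keyed by integer alone, since equal integers are a precondition for == in my class.

```python
from collections import defaultdict
from difflib import SequenceMatcher

def ratio(a: str, b: str) -> float:
    # Stand-in for fuzz.ratio: similarity score on a 0-100 scale.
    return SequenceMatcher(None, a, b).ratio() * 100

class MyObj:
    def __init__(self, string: str, integer: int):
        self.string = string
        self.integer = integer

a_list = [MyObj("apple", 1), MyObj("grape", 2)]
b_list = [MyObj("apples", 1), MyObj("melon", 2)]

# Bucket b by integer, so each object in a is fuzzy-compared
# only against candidates that share its integer.
buckets = defaultdict(list)
for obj in b_list:
    buckets[obj.integer].append(obj)

matches = [
    obj for obj in a_list
    if any(ratio(obj.string, cand.string) > 90 for cand in buckets[obj.integer])
]
```

Here "apple" matches "apples" (same integer, very similar strings) while "grape" finds no sufficiently similar candidate in its bucket. This turns the quadratic scan into one fuzzy comparison per candidate sharing an integer, but I'm unsure whether a set can express this directly given my __eq__.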
So what is the best approach here? Please include implementation guidelines in your answer as well.