I have data, and each entry needs to be an instance of a class. I'm expecting to encounter many duplicate entries in my data. I essentially want to end up with a set of all the unique entries (i.e. discard any duplicates). However, instantiating the whole lot and putting them into a set after the fact (a rough sketch of that follows the list below) is not optimal because...
- I have many entries,
- the proportion of duplicated entries is expected to be rather high,
- my __init__() method is doing quite a lot of costly computation for each unique entry, so I want to avoid redoing these computations unnecessarily.
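To make that concrete, here is a minimal sketch of the naive approach I'm trying to avoid; the class name, the attributes, and the "expensive" computation are all invented for illustration:

```python
# Naive approach: instantiate every raw entry, then deduplicate with a set.
# The costly work in __init__() runs once per raw entry, including duplicates.
class Entry:
    def __init__(self, key):
        self.key = key
        self.result = self._expensive_computation(key)  # slow, done for every entry

    def _expensive_computation(self, key):
        # stand-in for the real, costly computation
        return sum(ord(c) for c in key) ** 2

    # __hash__/__eq__ so duplicates collapse when the instances go into a set
    def __hash__(self):
        return hash(self.key)

    def __eq__(self, other):
        return isinstance(other, Entry) and self.key == other.key


raw_data = ["a", "b", "a", "c", "b", "a"]            # lots of duplicates
unique_entries = {Entry(item) for item in raw_data}  # __init__ runs 6 times for 3 unique keys
```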
I recognize that this is basically the same question asked here but...
- the accepted answer doesn't actually solve the problem. If you make __new__() return an existing instance, it doesn't technically make a new instance, but it still calls __init__(), which redoes all the work you've already done and makes overriding __new__() completely pointless. (This is easily demonstrated by inserting print statements inside __new__() and __init__() so you can see when they run; a quick demonstration is sketched right after this list.)
- the other answer requires calling a class method instead of calling the class itself when you want a new instance (e.g. x = MyClass.make_new() instead of x = MyClass()). This works, but it isn't ideal IMHO, since it is not the normal way one would think to make a new instance (a sketch of that pattern also follows the list).
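For the first point, here is the kind of print-statement demonstration I mean (names are invented for illustration):

```python
# Overriding __new__() to hand back a cached instance does NOT stop Python
# from calling __init__() on whatever __new__() returns.
class Entry:
    _cache = {}

    def __new__(cls, key):
        print(f"__new__ called for {key!r}")
        if key in cls._cache:
            return cls._cache[key]       # return the existing instance
        instance = super().__new__(cls)
        cls._cache[key] = instance
        return instance

    def __init__(self, key):
        print(f"__init__ called for {key!r}")  # this runs every time regardless
        self.key = key
        # ...the costly computation would be redone here...


a1 = Entry("a")
a2 = Entry("a")    # __new__ returns the cached object, but __init__ still runs
print(a1 is a2)    # True, yet the expensive work happened twice
```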
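And for the second point, this is roughly what I understand that class-method pattern to look like (the cache and the key argument are my own placeholders):

```python
# A class-method factory that caches instances by key, so __init__()
# only runs once per unique key -- but only if callers remember to use it.
class MyClass:
    _cache = {}  # hypothetical cache, keyed on the entry's data

    def __init__(self, key):
        self.key = key
        # ...costly computation would happen here, once per unique key...

    @classmethod
    def make_new(cls, key):
        if key not in cls._cache:
            cls._cache[key] = cls(key)   # pay the __init__() cost only the first time
        return cls._cache[key]


x = MyClass.make_new("a")   # __init__ runs
y = MyClass.make_new("a")   # cached instance returned, __init__ skipped
print(x is y)               # True
# but a plain MyClass("a") still bypasses the cache and redoes the work
```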
Can __new__() be overridden so that it will return an existing instance without running __init__() on it again? If this isn't possible, is there maybe another way to go about this?