I am curious why removing a line in my code results in a significant increase in performance. The function itself takes a dictionary and removes all keys which are substrings of other keys.
The line which slows my code down is:
if sub in reduced_dict and sub2 in reduced_dict:
Here's my function:
from collections import defaultdict
import time

def reduced(dictionary):
    reduced_dict = dictionary.copy()
    len_dict = defaultdict(list)
    for key in dictionary:
        len_dict[len(key)].append(key)
    start_time = time.time()
    for key, subs in len_dict.items():
        for key2, subs2 in len_dict.items():
            if key2 > key:
                for sub in subs:
                    for sub2 in subs2:
                        if sub in reduced_dict and sub2 in reduced_dict: # Removing this line gives a significant performance boost
                            if sub in sub2:
                                reduced_dict.pop(sub, 0)
    print time.time() - start_time
    return reduced_dict
The function performs the sub in sub2 substring check many times. I assumed that by first checking whether both keys were still present in the dictionary, I would skip comparisons that no longer matter and save time. That doesn't seem to be the case. Why is the constant-time dictionary lookup slowing me down?
I am a beginner, so I'm interested in the concepts.
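To see the raw cost of the two operations side by side, here's a quick micro-benchmark sketch (Python 3 syntax; the strings and counts are illustrative, not my original measurements):

```python
import timeit

# Illustrative micro-benchmark: compare the cost of a dict membership
# test against a substring search on a short key.
d = {"abcdef": 1}
dict_time = timeit.timeit("'abcdef' in d", globals={"d": d}, number=100000)
substr_time = timeit.timeit("'abcdef' in 'xxabcdefxx'", number=100000)
print(dict_time, substr_time)
```

Both are fast per call, so the question is really about how millions of extra lookups add up in the inner loop.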
To confirm that the line in question actually returns False sometimes (i.e. that it isn't dead weight), I tested with the following:
from collections import defaultdict
import time

def reduced(dictionary):
    reduced_dict = dictionary.copy()
    len_dict = defaultdict(list)
    for key in dictionary:
        len_dict[len(key)].append(key)
    start_time = time.time()
    for key, subs in len_dict.items():
        for key2, subs2 in len_dict.items():
            if key2 > key:
                for sub in subs:
                    for sub2 in subs2:
                        if sub not in reduced_dict or sub2 not in reduced_dict:
                            print 'not present' # This line prints many thousands of times
                        if sub in sub2:
                            reduced_dict.pop(sub, 0)
    print time.time() - start_time
    return reduced_dict
For 14,805 keys in the function's input dictionary:

- 19.6360001564 sec. without the line
- 33.1449999809 sec. with the line

Here are three example dictionaries: the biggest sample dictionary with 14,805 keys, a medium sample dictionary, and a smaller sample dictionary.
I have graphed time in seconds (Y) vs. input size in number of keys (X) for the first 14,000 keys of the biggest example dictionary. It appears all these functions have worse-than-linear, possibly exponential, complexity.
- John Zwinck: answer to this question
- Matt: my algorithm from this question without the dictionary comparison
- Matt exponential: my first attempt at this problem (this took 76 s)
- Matt compare: the algorithm in this question with the dict-comparison line
- tdelaney: solution for this question (algorithms 1 & 2, in order)
- georg: solution from a related question I asked
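For reference, the timings behind the graph can be gathered with a small harness along these lines (Python 3 syntax; time_for_first_n is a hypothetical helper name I'm using for illustration, not code from any of the answers):

```python
import time

def time_for_first_n(func, dictionary, n):
    # Hypothetical helper: run func on a sub-dictionary built from the
    # first n keys and return the elapsed wall-clock time in seconds.
    sample = dict(list(dictionary.items())[:n])
    start = time.time()
    func(sample)
    return time.time() - start
```

Calling this for n = 1000, 2000, ..., 14000 and plotting the results gives the curves described above.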
 

The accepted answer executes in apparently linear time.
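For comparison, one way to get roughly linear behaviour in practice (a sketch of the general idea only, not necessarily what the accepted answer does; it assumes no key contains a newline) is to join all keys into a single haystack and keep each key only if it occurs exactly once:

```python
def reduced_fast(dictionary):
    # Join every key into one big string. A key that is a substring of a
    # longer key appears in the haystack at least twice: once as itself
    # and once inside the longer key. Assumes '\n' never occurs in a key.
    haystack = '\n'.join(dictionary)
    return {k: v for k, v in dictionary.items()
            if haystack.count(k) == 1}
```

str.count scans in C, so even though this is still quadratic in the worst case, it looks linear for inputs like these.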

I'm surprised to find that a "magic ratio" of input size exists where the run time of a dict lookup equals that of a string search.