I have a file with a large number of lines (0.5-1.5 million), each of which is a filename about 50-100 characters long. What I need is a fast search through these lines for a given query. My current code looks like this:
import re

def similarity(haystack, needle):
    words = re.findall(r'\w+', haystack.lower())  # replacing this with str.split() on separators saves about 4 seconds
    # exact word match
    for word in words:
        if word == needle:
            return 10
    # prefix match, scored by how much of the word the needle covers
    for word in words:
        if word.startswith(needle):
            return 10 ** (len(needle) / len(word))
    # substring match anywhere in the line
    if needle in haystack:
        return 1
    return 0
def search(text):
    text = text.lower()
    # lines is the global list of filenames read from the file
    scored = [(similarity(x, text), x) for x in lines]
    return [x[1] for x in sorted(scored, reverse=True)[:15]]
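For reference, lines is a global list that I read from the file once at startup, roughly like this (the file name here is just a placeholder):

with open('filenames.txt') as f:
    lines = [line.rstrip('\n') for line in f]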
It takes about 15 seconds on an example file on my PC (almost all of the time is spent in the similarity() function), and I want it to run almost immediately, within a couple of seconds. How can this be done?
I think that indexing may help, but I am not sure what its structure should look like. And, if possible, I want the search to be "more fuzzy", e.g. with N-grams or something like that. But the main concern right now is speed.
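The best I have come up with so far is a rough sketch of an inverted index like the one below (all names are just illustrative, and it only covers exact word hits, not prefix or substring matches):

from collections import defaultdict
import re

def build_index(lines):
    # map each word to the set of line numbers containing it,
    # so repeated queries only have to score candidate lines
    index = defaultdict(set)
    for i, line in enumerate(lines):
        for word in re.findall(r'\w+', line.lower()):
            index[word].add(i)
    return index

def indexed_search(index, lines, needle):
    needle = needle.lower()
    candidates = index.get(needle, set())
    scored = [(similarity(lines[i], needle), lines[i]) for i in candidates]
    return [x[1] for x in sorted(scored, reverse=True)[:15]]

Building the index once should pay off over repeated searches, but I don't see how to extend it to prefix or fuzzy matches.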
UPD:
The same lines are searched through multiple times.
needle is always a single word.
"More fuzzy" means that it should find lines even if needle is a bit mistyped.