I want to write a function that efficiently returns a list of all possible substrings of a string according to a minimum and maximum length of the substrings. (The strings contain uppercase letters only.)
For example, for a String 'THISISASTRING', for min_length=3 and max_length=4, it should return:
['THI', 'THIS', 'HIS', 'HISI', 'ISI', 'ISIS', 'SIS', 'SISA', 'ISA',
 'ISAS', 'SAS', 'SAST', 'AST', 'ASTR', 'STR', 'STRI', 'TRI', 'TRIN',
 'RIN', 'RING', 'ING']
I'm looking for a solution that is much faster than my current one:
import cProfile
random_english_text = \
    'AHOUSEISABUILDINGTHATISMADEFORPEOPLETOLIVEINITISAPERMANENTBUILDINGTHATISMEANTTOSTAYSTANDINGITISNOTEASILYPACKEDU' \
    'PANDCARRIEDAWAYLIKEATENTORMOVEDLIKEACARAVANIFPEOPLELIVEINTHESAMEHOUSEFORMORETHANASHORTSTAYTHENTHEYCALLITTHEIRHO' \
    'MEBEINGWITHOUTAHOMEISCALLEDHOMELESSNESSHOUSESCOMEINMANYDIFFERENTSHAPESANDSIZESTHEYMAYBEASSMALLASJUSTONEROOMORTH' \
    'EYMAYHAVEHUNDREDSOFROOMSTHEYALSOAREMADEMANYDIFFERENTSHAPESANDMAYHAVEJUSTONELEVELORSEVERALDIFFERENTLEVELSAHOUSEI' \
    'SSOMETIMESJOINEDTOOTHERHOUSESATTHESIDESTOMAKEATERRACEORROWHOUSEACONNECTEDROWOFHOUSES'
def assemble_substrings(textstring, length_min, length_max):
    str_len = len(textstring)
    subStringList = []
    idx = 0
    while idx <= str_len - length_min:
        max_depth = min(length_max, str_len - idx)
        for i in list(range(length_min, max_depth + 1)):
            subString = textstring[idx:idx + i]
            subStringList.append(subString)
        idx += 1
    return subStringList
pr = cProfile.Profile()
pr.enable()
for i in range(0, 1000):
    list_of_substrings = assemble_substrings(textstring=random_english_text, length_min=4, length_max=10)
pr.disable()
pr.print_stats(sort='cumtime')
which yields me:
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1000    1.332    0.001    1.672    0.002 <input>:11(assemble_substrings)
  3654000    0.227    0.000    0.227    0.000 {method 'append' of 'list' objects}
   525000    0.112    0.000    0.112    0.000 {built-in method builtins.min}
     1000    0.000    0.000    0.000    0.000 {built-in method builtins.len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
Now, from the output of the profiler, I don't get much insight on how to speed this function up.
What is the best way to make this function as fast as possible? Should I use a different data structure than a list? Use Cython? Or write this code in an external C/C++ shared object?
Would be highly appreciative for inputs, also generally on how to efficiently deal with strings and operations similar to the one above on them in Python.
 
     
     
     
    