I'm trying to speed up some code that calls api_caller(), a generator that yields results as you iterate over it.
My synchronous code looks something like this:
def process_comment_tree(p):
    # time consuming breadth first search that makes another api call...
    return
def process_post(p):
    process_comment_tree(p)
def process_posts(kw):
    for p in api_caller(query=kw):  # possibly 1000s of results
        process_post(p)
    
def process_kws(kws):
    for kw in kws:
        process_posts(kw)
process_kws(kws=['python', 'threads', 'music'])
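In case it's relevant, api_caller is (roughly) a plain blocking generator that pages through an HTTP API. A simplified stand-in, with made-up endpoint and field names:
import requests  # the real client library may differ; this is just a stand-in

def api_caller(query):
    after = None
    while True:
        # each page of results is a blocking network request
        resp = requests.get(
            'https://api.example.com/search',    # placeholder endpoint
            params={'q': query, 'after': after},
        ).json()
        for item in resp['results']:
            yield item
        after = resp.get('after')
        if after is None:
            return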
When I run this code on a long list of kws, it takes around 18 minutes to complete.
When I use threads:
import concurrent.futures

KWS = ['python', 'threads', 'music']
with concurrent.futures.ThreadPoolExecutor(max_workers=len(KWS)) as pool:
    for result in pool.map(process_posts, KWS):
        print(f'result: {result}')
the code completes in around 3 minutes.
Now, I'm trying to use Trio for the first time, but I'm having trouble.
import trio

async def process_comment_tree(p):
    # same as before...
    return
async def process_post(p):
    await process_comment_tree(p)
async def process_posts(kw):
    async with trio.open_nursery() as nursery:
        for p in api_caller(query=kw):  # same generator as before
            nursery.start_soon(process_post, p)
    
async def process_kws(kws):
    async with trio.open_nursery() as nursery:
        for kw in kws:
            nursery.start_soon(process_posts, kw)
trio.run(process_kws, ['python', 'threads', 'music'])
This still takes around 18 minutes to execute. Am I doing something wrong here, or is something like trio/async not appropriate for my problem setup?
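One thing I'm wondering: since api_caller and process_comment_tree are ordinary blocking functions, maybe nothing ever actually yields to the Trio scheduler. Would pushing the blocking calls onto worker threads with trio.to_thread.run_sync be the right direction? An untested sketch of what I mean, where sync_process_comment_tree stands for the original synchronous version:
async def process_post(p):
    # run the blocking comment-tree work in a worker thread
    await trio.to_thread.run_sync(sync_process_comment_tree, p)

async def process_posts(kw):
    # pull the whole result set in a thread so iterating the blocking
    # generator doesn't stall the event loop
    posts = await trio.to_thread.run_sync(lambda: list(api_caller(query=kw)))
    async with trio.open_nursery() as nursery:
        for p in posts:
            nursery.start_soon(process_post, p)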