I want to scrape a set of categories, each of which has its own list of URLs. So I decided to call a function once per category from the main function, and inside that function there is a non-blocking call for each URL.
Here is the code:
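import asyncio

def main():
    loop = asyncio.get_event_loop()
    # One task per category; the task names must not shadow the coroutine function f.
    p_task = loop.create_task(f("p", all_p_list))
    f_task = loop.create_task(f("f", all_f_list))
    loop.run_until_complete(asyncio.gather(p_task, f_task))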
This should run the two calls to f concurrently.
But f also has to run the loop itself, because inside it another function is called concurrently, once per URL.
async def f(category, total):
    urls = [urls_template[category].format(t) for t in t_list]
    soups_coro = map(parseURL_async, urls)
    loop = asyncio.get_event_loop()
    # This is the call that fails: the loop is already running the tasks from main().
    result = await loop.run_until_complete(asyncio.gather(*soups_coro))
But when I run the script, I get a "This event loop is already running" error, and I found that it happens because I call loop.run_until_complete() in both the inner and the outer function.
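The error can be reproduced in isolation with a minimal sketch like the one below. The coroutine name inner is hypothetical and asyncio.sleep(0) only stands in for the real scraping work; the point is just that run_until_complete() is called on a loop that is already running.

```python
import asyncio

async def inner():
    # Hypothetical stand-in for f(): it runs inside a loop that is already running.
    loop = asyncio.get_running_loop()
    coro = asyncio.sleep(0)  # placeholder for the real per-URL work
    try:
        # Asking the running loop to run again is what raises the error.
        loop.run_until_complete(coro)
    except RuntimeError as exc:
        coro.close()  # the coroutine never ran; close it to avoid a "never awaited" warning
        return str(exc)
    return "no error"

message = asyncio.run(inner())
print(message)
```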
However, when I remove the run_until_complete() call and just call f() in main(), the call returns immediately and does not wait for the inner function to finish. So it seems unavoidable to run the loop in main(). But that appears to be incompatible with the inner function, which also needs to run it.
How can I deal with this problem and run the loop? The original code was all in a single main() and it worked, but I want to make it cleaner if possible.
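For reference, the structure being aimed at might look like the sketch below, assuming the per-URL function is a coroutine. The names parse_url, scrape_category and scrape_all and the sample URL strings are all hypothetical stand-ins, not the real code. The inner function awaits asyncio.gather() directly instead of touching the loop, and the loop is started in only one place.

```python
import asyncio

async def parse_url(url):
    # Hypothetical stand-in for parseURL_async; real code would fetch and parse the page.
    await asyncio.sleep(0)
    return f"soup:{url}"

async def scrape_category(category, urls):
    # Await the gather directly; the loop that is already running drives it.
    soups = await asyncio.gather(*(parse_url(u) for u in urls))
    return category, soups

async def scrape_all():
    # Both categories run concurrently, and each fans out over its own URLs.
    return await asyncio.gather(
        scrape_category("p", ["p1", "p2"]),
        scrape_category("f", ["f1"]),
    )

# The event loop is started exactly once, at the outermost level.
results = asyncio.run(scrape_all())
print(results)
```

Here asyncio.run() takes the place of get_event_loop() plus run_until_complete() in main(), so no coroutine needs a reference to the loop at all.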