In Python, is there an async equivalent to multiprocessing or concurrent.futures?

Question

Basically, I'm looking for something that offers a parallel map using python3 coroutines as the backend instead of threads or processes. I believe there should be less overhead when performing highly parallel IO work.

Surely something similar already exists, be it in the standard library or some widely used package?

This [thread](https://mail.python.org/pipermail/python-list/2014-August/thread.html#676571) discusses basically the same idea. Not so much useful information though. — Yaroslav Admin, Oct 12 '15 at 09:56

score 36 · Accepted Answer · edited May 23 '17 at 12:02

DISCLAIMER PEP 0492 defines only syntax and usage for coroutines. They require an event loop to run, which is most likely asyncio's event loop.

Asynchronous map

I don't know any implementation of map based on coroutines. However it's trivial to implement basic map functionality using asyncio.gather():

def async_map(coroutine_func, iterable):
    loop = asyncio.get_event_loop()
    future = asyncio.gather(*(coroutine_func(param) for param in iterable))
    return loop.run_until_complete(future)

This implementation is really simple. It creates a coroutine for each item in the iterable, joins them into single coroutine and executes joined coroutine on event loop.

Provided implementation covers part of the cases. However it has a problem. With long iterable you would probably want to limit amount of coroutines running in parallel. I can't come up with simple implementation, which is efficient and preserves order at the same time, so I will leave it as an exercise for a reader.

Performance

You claimed:

I believe there should be less overhead when performing highly parallel IO work.

It requires proof, so here is a comparison of multiprocessing implementation, gevent implementation by a p and my implementation based on coroutines. All tests were performed on Python 3.5.

Implementation using multiprocessing:

from multiprocessing import Pool
import time


def async_map(f, iterable):
    with Pool(len(iterable)) as p:  # run one process per item to measure overhead only
        return p.map(f, iterable)

def func(val):
    time.sleep(1)
    return val * val

Implementation using gevent:

import gevent
from gevent.pool import Group


def async_map(f, iterable):
    group = Group()
    return group.map(f, iterable)

def func(val):
    gevent.sleep(1)
    return val * val

Implementation using asyncio:

import asyncio


def async_map(f, iterable):
    loop = asyncio.get_event_loop()
    future = asyncio.gather(*(f(param) for param in iterable))
    return loop.run_until_complete(future)

async def func(val):
    await asyncio.sleep(1)
    return val * val

Testing program is usual timeit:

$ python3 -m timeit -s 'from perf.map_mp import async_map, func' -n 1 'async_map(func, list(range(10)))'

Results:

Iterable of 10 items:
- multiprocessing - 1.05 sec
- gevent - 1 sec
- asyncio - 1 sec
Iterable of 100 items:
- multiprocessing - 1.16 sec
- gevent - 1.01 sec
- asyncio - 1.01 sec
Iterable of 500 items:
- multiprocessing - 2.31 sec
- gevent - 1.02 sec
- asyncio - 1.03 sec
Iterable of 5000 items:
- multiprocessing - failed (spawning 5k processes is not so good idea!)
- gevent - 1.12 sec
- asyncio - 1.22 sec
Iterable of 50000 items:
- gevent - 2.2 sec
- asyncio - 3.25 sec

Conclusions

Concurrency based on event loop works faster, when program do mostly I/O, not computations. Keep in mind, that difference will be smaller, when there are less I/O and more computations are involved.

Overhead introduced by spawning processes is significantly bigger, than overhead introduced by event loop based concurrency. It means that your assumption is correct.

Comparing asyncio and gevent we can say, that asyncio has 33-45% bigger overhead. It means that creation of greenlets is cheaper, than creation of coroutines.

As a final conclusion: gevent has better performance, but asyncio is part of the standard library. Difference in performance (absolute numbers) isn't very significant. gevent is quite mature library, while asyncio is relatively new, but it advances quickly.

So I guess nothing prevents adding a CoroutineExecutor to concurrent.futures? Do you agree it would be a good idea? — static_rtti, Oct 12 '15 at 08:07
@static_rtti It would be nice to have it. One small thing, which worries me is that this `CoroutineExecutor` expects coroutine function, while executors based on processes and threads expect normal function. Anyway it's much better to start such discussion in the [IRC](https://www.python.org/community/irc/) channel or [mailing list](https://www.python.org/community/lists/) to get feedback for your idea. I'm not involved in the Python development, so can't advice you much on where to start. — Yaroslav Admin, Oct 12 '15 at 09:31
You don't need to do the iteration on `future = asyncio.gather(*(f(param) for param in iterable))` you can simply use `map`. `future = asyncio.gather(*map(f, iterable))` — styvane, Sep 16 '16 at 10:04
You can use an asyncio.Semaphore to limit the number of concurrent requests: https://pawelmhm.github.io/asyncio/python/aiohttp/2016/04/22/asyncio-aiohttp.html — static_rtti, Nov 23 '16 at 14:00
I think you shouldn't use that code to test multiprocessing, it's unfair. You should use `Pool(CPU_CORE_NUMBER+1)` and `imap_unordered` . — Mithril, Jan 10 '18 at 09:35
I guess [uvloop](https://github.com/MagicStack/uvloop) would improve results for the asyncio test case quite a bit. — Rotareti, Mar 25 '18 at 04:22

a p · Answer 2 · 2015-10-15T07:35:24.967

You could use greenlets (lightweight threads, basically coroutines) for this, or the somewhat higher-level gevent lib built on top of them:

(from the docs)

import gevent
from gevent import getcurrent
from gevent.pool import Group

group = Group()

def hello_from(n):
    print('Size of group %s' % len(group))
    print('Hello from Greenlet %s' % id(getcurrent()))

group.map(hello_from, xrange(3))

def intensive(n):
    gevent.sleep(3 - n)
    return 'task', n

print('Ordered')

ogroup = Group()
for i in ogroup.imap(intensive, xrange(3)):
    print(i)

print('Unordered')

igroup = Group()
for i in igroup.imap_unordered(intensive, xrange(3)):
    print(i)

Yields output:

Size of group 3
Hello from Greenlet 31904464
Size of group 3
Hello from Greenlet 31904944
Size of group 3
Hello from Greenlet 31905904
Ordered
('task', 0)
('task', 1)
('task', 2)
Unordered
('task', 2)
('task', 1)
('task', 0)

The standard constraints of lightweight-vs-proper-multicore-usage apply to greenlets vs threads. That is, they're concurrent but not necessarily parallel.

Quick edit for people who see this in the future, since Yaroslav has done a great job of outlining some differences between Python's asyncio and gevent:

Why gevent over async/await? (these are all super subjective but have applied to me in the past)
- Not portable/easily accesible (not just 2.X, but 3.5 brought new keywords)
- async and await have a tendency to spread and infect codebases - when someone else has encapsulated this for you, it's super duper nice in terms of development and readability/maintainability
- In addition to above, I (personally) feel like the high-level interface of gevent is very "pythonic".
- Less rope to hang yourself with. In simple examples the two seem similar, but the more you want to do with async calls, the more chance you have to fuck up something basic and create race conditions, locks, unexpected behaviors. No need to reinvent the noose imho.
- Gevent's performance scales past trivial examples and is used and tested in lots of production environments. If you don't know much about asynchronous programming, it's a good place to start.

Why asyncio and not Gevent?
- If you can guarantee a version of Python and don't have access to 3rd party packages/pip, it gives you out of the box support.
- Similar to above, if you don't want to be tied in to a project that's been slow to adopt Py3k, rolling your own small toolset is a good option.
- If you want to fine tune things, you're in charge!

Thanks! Isn't gevent somewhat deprecated now that coroutines are integrated in the main python distribution? Or do they work well together? — static_rtti, Oct 08 '15 at 15:35
It isn't deprecated (as far as I know!) but it's not Guido's Favorite Way either, so what you heard may have been more to do with that. Still, it's performant, well-tested, easy as pie, and supports 3.3+ and 2.7+. — a p, Oct 08 '15 at 16:02

In Python, is there an async equivalent to multiprocessing or concurrent.futures?

2 Answers2

Asynchronous map

Performance

Conclusions

Linked