In the following code, I time how long it takes to pass a large array (8 MB) to a child process using the args keyword when forking the process, versus passing it through a pipe.
Does anyone have any insight into why it is so much faster to pass data using an argument than using a pipe?
Below, each code block is a cell in a Jupyter notebook.
import multiprocessing as mp
import random
N = 2**20
x = [random.random() for _ in range(N)]
Time the call to sum in the parent process (for comparison only): 
%%timeit -n 5 -r 10 -p 8 -o -q
pass
y = sum(x)/N
t_sum = _
Time calling sum in a child process, using the args keyword to pass the list x to the child.
def mean(x,q):
    q.put(sum(x))
%%timeit -n 5 -r 10 -p 8 -o -q
pass
q = mp.Queue()
p = mp.Process(target=mean,args=(x,q))
p.start()
p.join()
s = q.get()
m = s/N
t_mean = _
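One thing that may matter for the args timing is which start method multiprocessing is using. As far as I understand it, with the "fork" start method the child inherits the parent's memory, so the list passed through args is not serialized, whereas with "spawn" or "forkserver" the Process object and its args are pickled and sent to the child. The following is only a small check under that assumption; the helper name describe_start_method is made up for illustration.

```python
import multiprocessing as mp


def describe_start_method():
    """Return the active start method and whether args would be pickled.

    Assumption: only the "fork" start method hands args to the child
    without pickling them; "spawn" and "forkserver" serialize them.
    """
    method = mp.get_start_method()
    args_are_pickled = method != "fork"
    return method, args_are_pickled


if __name__ == "__main__":
    method, args_are_pickled = describe_start_method()
    print("start method:", method)
    print("args pickled for the child:", args_are_pickled)
```

If this reports something other than "fork", the args timing above would include serialization and would probably look more like the pipe timing.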
Time using a pipe to pass the data to the child process.
def mean_pipe(cp,q):
    x = cp.recv()
    q.put(sum(x))
%%timeit -n 5 -r 10 -p 8 -o -q
pass
q = mp.Queue()
pipe0,pipe1 = mp.Pipe()
p = mp.Process(target=mean_pipe,args=[pipe0,q])
p.start()
pipe1.send(x)
p.join()
s = q.get()
m = s/N
t_mean_pipe = _
(ADDED in response to comment)  Use mp.Array shared memory feature (very slow!)
def mean_pipe_shared(xs,q):
    q.put(sum(xs))
%%timeit -n 5 -r 10 -p 8 -o -q
xs = mp.Array('d',x)
q = mp.Queue()
p = mp.Process(target=mean_pipe_shared,args=[xs,q])
p.start()
p.join()
s = q.get()
m = s/N
t_mean_shared = _
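A possible reason for the mp.Array timing, which I have not confirmed: mp.Array returns a synchronized wrapper by default (lock=True), and my understanding is that every element read through the wrapper acquires the lock. The sketch below compares summing through the wrapper with summing the underlying ctypes array returned by get_obj(). It runs in a single process and uses a smaller n than the N above, so it only isolates the per-element access cost; the function name compare_array_sums is made up for illustration.

```python
import math
import multiprocessing as mp
import random
import time


def compare_array_sums(n=2**16):
    """Sum an mp.Array through its wrapper and through get_obj(), timing both."""
    data = [random.random() for _ in range(n)]
    xs = mp.Array('d', data)  # lock=True by default, so xs is a synchronized wrapper

    t0 = time.perf_counter()
    wrapped_total = sum(xs)  # each element is fetched through the wrapper
    t1 = time.perf_counter()
    raw_total = sum(xs.get_obj())  # iterate the underlying ctypes array directly
    t2 = time.perf_counter()

    return {
        "expected": sum(data),
        "wrapped_total": wrapped_total,
        "raw_total": raw_total,
        "wrapped_seconds": t1 - t0,
        "raw_seconds": t2 - t1,
    }


if __name__ == "__main__":
    result = compare_array_sums()
    print("wrapper: {:.4f} s".format(result["wrapped_seconds"]))
    print("get_obj: {:.4f} s".format(result["raw_seconds"]))
```

If the wrapper turns out to be the bottleneck, passing lock=False to mp.Array (which returns the raw ctypes array) or calling get_obj() in the child might be worth timing in the notebook as well.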
Print out results (ms)
print("{:>20s} {:12.4f}".format("MB",8*N/1024**2))
print("{:>20s} {:12.4f}".format("mean (main)",1000*t_sum.best))
print("{:>20s} {:12.4f}".format("mean (args)",1000*t_mean.best))
print("{:>20s} {:12.4f}".format("mean (pipe)",1000*t_mean_pipe.best))
print("{:>20s} {:12.4f}".format("mean (shared)",1000*t_mean_shared.best))
              MB       8.0000
     mean (main)       7.1931
     mean (args)      38.5217
     mean (pipe)     136.5020
   mean (shared)    4195.0568
Using the pipe is about 3.5 times slower than passing the list through args (136.5 ms vs. 38.5 ms). And unless I am doing something very wrong, mp.Array is a non-starter at roughly 4.2 s.
Why is the pipe so much slower than passing directly to the subprocess (using args)? And why is the shared-memory version so slow?
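To narrow down the pipe case, it may help to time serialization on its own, since Connection.send pickles the object before writing it and recv unpickles it on the other side. The sketch below measures a pickle round trip of the same kind of list without any process or pipe involved. It assumes the default pickle protocol is representative of what the pipe uses, and the helper name pickle_round_trip is made up for illustration.

```python
import pickle
import random
import time


def pickle_round_trip(obj):
    """Pickle and unpickle obj, returning payload size and both timings."""
    t0 = time.perf_counter()
    payload = pickle.dumps(obj)  # assumption: default protocol, as a stand-in for the pipe
    t1 = time.perf_counter()
    restored = pickle.loads(payload)
    t2 = time.perf_counter()
    return {
        "payload_bytes": len(payload),
        "dumps_seconds": t1 - t0,
        "loads_seconds": t2 - t1,
        "restored": restored,
    }


if __name__ == "__main__":
    N = 2**20
    x = [random.random() for _ in range(N)]
    r = pickle_round_trip(x)
    print("payload MB: {:.2f}".format(r["payload_bytes"] / 1024**2))
    print("dumps ms:   {:.2f}".format(1000 * r["dumps_seconds"]))
    print("loads ms:   {:.2f}".format(1000 * r["loads_seconds"]))
```

If dumps plus loads accounts for most of the gap between the args and pipe timings, the extra cost would be serialization rather than the pipe transfer itself.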
