Introduction
I have a function func which is vectorizable, and I vectorize it using np.frompyfunc. Rather than using a nested for loop, I want to call it only once, and thus I'll need to pad the inputs with np.newaxis's.
My goal is to get rid of the two nested for loops and use the numpy.array broadcasting feature instead.
Here is the MWE for loops (I want to get rid of the for loops and instead pad the variables c_0, c_1, rn_1, rn_2, and factor when calling myfunc.
MWE of the problem of interest
for i, b_0 in enumerate(b_00):
for j, b_1 in enumerate(b_11):
factor = b_0 + b_1
rn = (b_0 * coord + b_1 * coord2) / factor
rn_1 = coord - rn
rn_2 = coord2 - rn
results = np.add( results,np.prod(myfunc(c_0[:,None,:], c_1[None,:,:], rn_1, rn_2, factor), axis=2) )
The above explicit for loop is correct and is included in the block code for quality assurance.
my current efforts
factor = b_00[:, None] + b_11[None, :]
rn = np.add( (b_00[:,None] * coord[None,:])[:, None, :], (b_11[:,None] * coord2[None,:])[None, :, :] ) / factor[:,:,None]
rn_1 = coord - rn
rn_2 = coord2 - rn
results2 = np.prod(myfunc(c_0[:,None,:, None, None], c_1[None,:,:, None, None], rn_1[:,:,:],rn_2[:,:, :], factor[None,:, :, None]), axis=2)
results2 = np.squeeze(results2)
results2 = np.sum(results2, axis=(2,3))
Block of monolithic code
import numpy as np
myfunc = np.frompyfunc(func,5,1)
################################################################
# Prep arrays needed for MWE
################################################################
results = np.zeros((5,3))
coord = np.array([1,1,2])
coord2 = np.array([3,3,3])
c_0 = np.array([[0,0,2],[0,2,0],[2,0,0],[1,1,0], [1,0,1]])
c_1 = np.array([[0,0,2],[0,2,0],[2,0,0]])
b_00 = np.array([2.2, 1.1, 4.4]) # np.array([2.2, 3.3, 40.4])
b_11 = np.array([1.2, 3.3]) # np.array([1.2, 5.3])
################################################################
# This is only for comparison. `results` is the correct answer
################################################################
for i, b_0 in enumerate(b_00):
for j, b_1 in enumerate(b_11):
factor = b_0 + b_1
rn = (b_0 * coord + b_1 * coord2) / factor
rn_1 = coord - rn
rn_2 = coord2 - rn
results = np.add( results,np.prod(myfunc(c_0[:,None,:], c_1[None,:,:], rn_1, rn_2, factor), axis=2) )
################################################################
# Prep for broadcasting (My attempt)
################################################################
factor = b_00[:, None] + b_11[None, :]
rn = np.add( (b_00[:,None] * coord[None,:])[:, None, :], (b_11[:,None] * coord2[None,:])[None, :, :] ) / factor[:,:,None]
rn_1 = coord - rn
rn_2 = coord2 - rn
# The following all get the same *wrong* answer
# results2 = np.prod(myfunc(c_0[:,None,:,None, None], c_1[None,:,:, None, None], rn_1[:, None, None],rn_2[:,None, None], factor[None,:, :]), axis=2)
# results2 = np.prod(myfunc(c_0[:,None,:, None, None, None], c_1[None,:,:, None, None, None], rn_1[None, None,:,:,:, None],rn_2[None, None,:,:, :, None], factor[None,:, :, None, None]), axis=2)
# results2 = np.prod(myfunc(c_0[:,None,:, None, None], c_1[None,:,:, None, None], rn_1[None, None,:,:,:],rn_2[None, None,:,:, :], factor[None,:, :, None]), axis=2)
# this should be the only line needing work!
results2 = np.prod(myfunc(c_0[:,None,:, None, None], c_1[None,:,:, None, None], rn_1[:,:,:],rn_2[:,:, :], factor[None,:, :, None]), axis=2)
results2 = np.squeeze(results2)
results2 = np.sum(results2, axis=(2,3))
assert np.allclose(results, results2)
# Vectorized function to be sent broadcasted arrays
def func(r0, r1, x, y, at):
val = 0.0
for i in range(r0+1):
for j in range(r1+1):
val += x + i*j + at * y
return val
Problems
- With the code above I get the right shape of the result array (
results2is my attempt at broadcasting, andresultsis the slow for loop that gives the correct answer), but it has the wrong values. - As
@hpauljpointed out, if I change the dimensions ofb_00to be length 4 (or anything larger as well), my solution ceases to even get the correct shape.
Update
Please make sure that it works for both the current b_00 = np.array([2.2, 1.1, 4.4]) as well as a more general b_00 = np.array([2.2, 1.1, 4.4, 5.1, 6.2]). I would like a broadcasted solution, but will accept a solution that is simply faster than the for loops.