A broadcasting issue involving where to put the padding

Question

Introduction

I have a function func which is vectorizable, and I vectorize it using np.frompyfunc. Rather than using a nested for loop, I want to call it only once, and thus I'll need to pad the inputs with np.newaxis's.

My goal is to get rid of the two nested for loops and use the numpy.array broadcasting feature instead.

Here is the MWE for loops (I want to get rid of the for loops and instead pad the variables c_0, c_1, rn_1, rn_2, and factor when calling myfunc.

MWE of the problem of interest

for i, b_0 in enumerate(b_00):
    for j, b_1 in enumerate(b_11):
        factor = b_0 + b_1
        rn = (b_0 * coord + b_1 * coord2) / factor
        rn_1 = coord - rn
        rn_2 = coord2 - rn
        results = np.add( results,np.prod(myfunc(c_0[:,None,:], c_1[None,:,:], rn_1, rn_2, factor), axis=2) )

The above explicit for loop is correct and is included in the block code for quality assurance.

my current efforts

factor = b_00[:, None] + b_11[None, :]
rn = np.add( (b_00[:,None] * coord[None,:])[:, None, :],  (b_11[:,None] * coord2[None,:])[None, :, :] ) / factor[:,:,None]
rn_1 = coord - rn
rn_2 = coord2 - rn

results2 = np.prod(myfunc(c_0[:,None,:, None, None], c_1[None,:,:, None, None], rn_1[:,:,:],rn_2[:,:, :], factor[None,:, :, None]), axis=2)
results2 = np.squeeze(results2)
results2 = np.sum(results2, axis=(2,3))

Block of monolithic code

    import numpy as np 

myfunc = np.frompyfunc(func,5,1)
################################################################
# Prep arrays needed for MWE
################################################################    
results = np.zeros((5,3))
coord = np.array([1,1,2])
coord2 = np.array([3,3,3])
c_0 = np.array([[0,0,2],[0,2,0],[2,0,0],[1,1,0], [1,0,1]])
c_1 = np.array([[0,0,2],[0,2,0],[2,0,0]])
b_00 = np.array([2.2, 1.1, 4.4]) # np.array([2.2, 3.3, 40.4])
b_11 = np.array([1.2, 3.3]) # np.array([1.2, 5.3])

################################################################
# This is only for comparison. `results` is the correct answer
################################################################
for i, b_0 in enumerate(b_00):
    for j, b_1 in enumerate(b_11):
        factor = b_0 + b_1
        rn = (b_0 * coord + b_1 * coord2) / factor
        rn_1 = coord - rn
        rn_2 = coord2 - rn
        results = np.add( results,np.prod(myfunc(c_0[:,None,:], c_1[None,:,:], rn_1, rn_2, factor), axis=2) )

################################################################
# Prep for broadcasting   (My attempt)
################################################################
factor = b_00[:, None] + b_11[None, :]
rn = np.add( (b_00[:,None] * coord[None,:])[:, None, :],  (b_11[:,None] * coord2[None,:])[None, :, :] ) / factor[:,:,None]
rn_1 = coord - rn
rn_2 = coord2 - rn

# The following all get the same *wrong* answer
# results2 = np.prod(myfunc(c_0[:,None,:,None, None], c_1[None,:,:, None, None], rn_1[:, None, None],rn_2[:,None, None], factor[None,:, :]), axis=2) 
# results2 = np.prod(myfunc(c_0[:,None,:, None, None, None], c_1[None,:,:, None, None, None], rn_1[None, None,:,:,:, None],rn_2[None, None,:,:, :, None], factor[None,:, :, None, None]), axis=2) 
# results2 = np.prod(myfunc(c_0[:,None,:, None, None], c_1[None,:,:, None, None], rn_1[None, None,:,:,:],rn_2[None, None,:,:, :], factor[None,:, :, None]), axis=2)

# this should be the only line needing work!
results2 = np.prod(myfunc(c_0[:,None,:, None, None], c_1[None,:,:, None, None], rn_1[:,:,:],rn_2[:,:, :], factor[None,:, :, None]), axis=2)

results2 = np.squeeze(results2)
results2 = np.sum(results2, axis=(2,3))

assert np.allclose(results, results2)

# Vectorized function to be sent broadcasted arrays
def func(r0, r1, x, y, at):
    val = 0.0 
    for i in range(r0+1):
        for j in range(r1+1):
            val += x + i*j + at * y
    return val

Problems

With the code above I get the right shape of the result array (results2 is my attempt at broadcasting, and results is the slow for loop that gives the correct answer), but it has the wrong values.
As @hpaulj pointed out, if I change the dimensions of b_00 to be length 4 (or anything larger as well), my solution ceases to even get the correct shape.

Update

Please make sure that it works for both the current b_00 = np.array([2.2, 1.1, 4.4]) as well as a more general b_00 = np.array([2.2, 1.1, 4.4, 5.1, 6.2]). I would like a broadcasted solution, but will accept a solution that is simply faster than the for loops.

I like to use different dimensions where possible. It makes it easier the detect flaws. eg `arr=np.arange(24).reshape(2,3,4)` — hpaulj, Jul 20 '20 at 07:29
@hpaulj that is a good call. I have been looking at this all night. It gets an error when I make b_00 length 4. I will follow this lead up. Thanks. — George K, Jul 20 '20 at 07:39
sorry, i read your post put I don't see the question. what is the error and what are you trying to do? — Dorian, Jul 20 '20 at 09:26
@hpaulj only to make the code more readable... I started thinking the problem looked too intimidating if all the stuff was before the actual problem. It used to be defined as the first thing. I moved it to the bottom and did not check to make sure everything still worked. That was probably a poor assumption. I don't blame people for not bothering with questions that look to have a lot of code. — George K, Jul 24 '20 at 23:11
Interesting problem you have got here. Could you kinda walk through what you are trying to do with the myfunc? in terms of a 5x3, 3x3, 3x1, 3x1 and 1x1 ? That would help me get to the solution. — Akshay Sehgal, Jul 25 '20 at 06:19
@AkshaySehgal the final shape should be (len(c_0), len(c_1)). In the for loop example a matrix is created in each loop that is this shape, and, added to the previous iteration, element by element. The `func` being called however, is vectorized, and does not need to be looped over. It can be broadcasted to, with the hope of being faster. But, the arrays must be broadcasted correctly, and will have 2 additional dimensions - one for each for loop being "broadcasted out". The solution should look like the myfunc call, with 2 more dimensions included to account for the for loops. — George K, Jul 25 '20 at 09:22
The broadcasted solution really just needs to account for a larger rn_1 and rn_2 since they have two more dimensions, as well as a broadcasted `factor` since it also has two more dimensions — George K, Jul 25 '20 at 09:24

Akshay Sehgal · Accepted Answer · 2020-07-25T23:37:48.773

This should solve your problem!

#Definitions
coord = np.array([1,1,2])
coord2 = np.array([3,3,3])
c_0 = np.array([[0,0,2],[0,2,0],[2,0,0],[1,1,0], [1,0,1]])
c_1 = np.array([[0,0,2],[0,2,0],[2,0,0]])
b_00 = np.array([2.2, 1.1, 4.4]) # np.array([2.2, 3.3, 40.4])
b_11 = np.array([1.2, 3.3]) # np.array([1.2, 5.3])

#Vectorized code for prep 
b_0 = np.outer(b_00, coord)
b_1 = np.outer(b_11, coord2)
factor = (b_00+b_11.reshape(-1,1)).T[:,:,None]
rn = np.divide((b_0[:,None]+b_1), factor)
rn_1 = coord-rn
rn_2 = coord2-rn

#THIS IS YOUR INTERESTING PART
results = np.prod(myfunc(c_0[:,None,:,None,None], c_1[None,:,:,None,None], rn_1.transpose(2,0,1), rn_2.transpose(2,0,1), factor.transpose(2,0,1)).transpose(3,4,0,1,2), axis=4)
results = np.add.reduce(results, axis=(0,1))
results

array([[6707.793061061082, 5277.838468285241, 5277.838468285241],
       [5277.838468285241, 5992.8157646731615, 5277.838468285241],
       [5277.838468285241, 5277.838468285241, 5992.8157646731615],
       [7037.117957713655, 7513.7694886389345, 7513.7694886389345],
       [7990.421019564216, 7037.117957713655, 7513.7694886389345]],
      dtype=object)

Just for understanding purposes, since in the old loop solution, myfunc operates on rn_1 and rn_2's first axis, the broadcasting channels need to be the last 2 instead of the first. So, I add 2 channels at the end of the c_0 and c_1 and then I bring the last axis to the front making the (3,2,3) rn_1 into (3,3,2). Similarly for Factor. So now the myfunc can operate on tensors with broadcasting channels highlighted - (5,1,3,1,1), (1,3,3,1,1), (3,3,2), (3,3,2), (1,3,2)

Finally, I transpose again to bring the broadcasted channels to the front, so that we have a (3,2,5,3,3) shaped tensor where first 2 channels are broadcasted versions of the 3,2 nested loops and the 5,3,3 is the matrix that you now need to np.prod over axis = 4 instead of axis = 2.

Post that, it was a simple matter of using np.add.reduce to take sum over the 0,1 axis bring the result to a 5,3 matrix. The final result should be exactly the same as the result of the loop.

I kinda took the liberty to modify the first part as well since that was more comfortable to my eyes, but feel free to ignore that part.

PS. Checked that it works seamlessly for the example you mentioned

b_00 = np.array([2.2, 1.1, 4.4, 5.1, 6.2])