Matrix multiplication on 4D numpy arrays

Question

I need to perform matrix multiplication on two 4D arrays (m & n) with dimensions of 2x2x2x2 and 2x3x2x2 for m & n respectively, which should result in a 2x3x2x2 array. After a lot of research (mostly on this site) it seems this can be done efficiently with either np.einsum or np.tensordot, but I am unable to replicate the answer I am getting from Matlab (verified by hand). I understand how these methods (einsum and tensordot) work when performing matrix multiplication on 2D arrays (clearly explained here), but I cannot get the axes indexes correct for the 4D arrays. Clearly I’m missing something! My actual problem deals with two 23x23x3x3 arrays of complex numbers but my test arrays are:

a = np.array([[1, 7], [4, 3]]) 
b = np.array([[2, 9], [4, 5]]) 
c = np.array([[3, 6], [1, 0]]) 
d = np.array([[2, 8], [1, 2]]) 
e = np.array([[0, 0], [1, 2]])
f = np.array([[2, 8], [1, 0]])

m = np.array([[a, b], [c, d]])              # (2,2,2,2)
n = np.array([[e, f, a], [b, d, c]])        # (2,3,2,2)

I realise the complex numbers may present further issues, but for now, I am just trying to understand how the indexxing works with einsum & tensordot. The answer I’m chasing is this 2x3x2x2 array:

+----+-----------+-----------+-----------+
|    | 0         | 1         | 2         |
+====+===========+===========+===========+
|  0 | [[47 77]  | [[22 42]  | [[44 40]  |
|    |  [31 67]] |  [27 74]] |  [33 61]] |
+----+-----------+-----------+-----------+
|  1 | [[42 70]  | [[24 56]  | [[41 51]  |
|    |  [10 19]] |  [ 6 20]] |  [ 6 13]] |
+----+-----------+-----------+-----------+

and my closest attempt is by using np.tensordot:

mn = np.tensordot(m,n, axes=([1,3],[0,2]))

which gives me a 2x2x3x2 array with correct numbers but not in the right order:

+----+-----------+-----------+
|    | 0         | 1         |
+====+===========+===========+
|  0 | [[47 77]  | [[31 67]  |
|    |  [22 42]  |  [24 74]  |
|    |  [44 40]] |  [33 61]] |
+----+-----------+-----------+
|  1 | [[42 70]  | [[10 19]  |
|    |  [24 56]  |  [ 6 20]  |
|    |  [41 51]] |  [ 6 13]] |
+----+-----------+-----------+

I’ve also tried to implement some of the solutions from here but have not had any luck.
Any ideas on how I might improve this would be greatly appreciated, thanks

In APL (A Programming Language), the matrix multiply like operator is generalized for high dimension objects. The multiply / add occurs across the last dimension of the left object and the first dimension of the second object. In the case of a 2x2x2x2 +.x 2x3x2x2, the "inner" (...x2 , 2x...) dimensions are dropped, and the result would be a 2x2x2x3x2x2 object. I'm wondering if Matlab follows this same rule. — rcgldr, Dec 11 '17 at 12:21

score 3 · Answer 1 · answered Dec 11 '17 at 12:07

3

Your best bet, since your reduction dimensions neither match (which would allow broadcasting) nor are the "inner" dimensions (which would work natively with np.tensordot) is to use np.einsum

np.einsum('ijkl,jmln->imkn', m, n)

array([[[[47, 77],
         [31, 67]],

        [[22, 42],
         [24, 74]],

        [[44, 40],
         [33, 61]]],


       [[[42, 70],
         [10, 19]],

        [[24, 56],
         [ 6, 20]],

        [[41, 51],
         [ 6, 13]]]])

answered Dec 11 '17 at 12:07

Daniel F

13,620
2
29
55

I think this method is the easiest to apply/understand but I'm surprised by how much slower it is compared to @Divakar's tensordot solution. All the info I saw suggested einsum would be the most efficient method.. – 4bears Dec 11 '17 at 15:10
@AndrewForbes Just the BLAS based `tensordot` being too efficient. – Divakar Dec 11 '17 at 15:50
Yeah, unfortunately `np.einsum`, while very clear for people who know Einstein Summation Notation, is not as optimized as the various `*dot` operators. But since your toy problem didn't seem to have the same form as your actual problem, I didn't think the premature optimization was worth it. – Daniel F Dec 11 '17 at 17:18

Divakar · Accepted Answer · 2017-12-11T12:34:17.970

You could simply swap the axes on the tensordot result, so that we would still leverage BLAS based sum-reductions with tensordot -

np.tensordot(m,n, axes=((1,3),(0,2))).swapaxes(1,2)

Alternatively, we could swap the positions of m and n in the tensordot call and transpose to re-arrange all the axes -

np.tensordot(n,m, axes=((0,2),(1,3))).transpose(2,0,3,1)

Using the manually labor of reshaping and swapping axes, we can bring in 2D matrix multiplication with np.dot as well, like so -

m0,m1,m2,m3 = m.shape
n0,n1,n2,n3 = n.shape
m2D = m.swapaxes(1,2).reshape(-1,m1*m3)
n2D = n.swapaxes(1,2).reshape(n0*n2,-1)
out = m2D.dot(n2D).reshape(m0,m2,n1,n3).swapaxes(1,2)

Runtime test -

Scaling the input arrays to 10x shapes :

In [85]: m = np.random.rand(20,20,20,20)

In [86]: n = np.random.rand(20,30,20,20)

# @Daniel F's soln with einsum
In [87]: %timeit np.einsum('ijkl,jmln->imkn', m, n)
10 loops, best of 3: 136 ms per loop

In [126]: %timeit np.tensordot(m,n, axes=((1,3),(0,2))).swapaxes(1,2)
100 loops, best of 3: 2.31 ms per loop

In [127]: %timeit np.tensordot(n,m, axes=((0,2),(1,3))).transpose(2,0,3,1)
100 loops, best of 3: 2.37 ms per loop

In [128]: %%timeit
     ...: m0,m1,m2,m3 = m.shape
     ...: n0,n1,n2,n3 = n.shape
     ...: m2D = m.swapaxes(1,2).reshape(-1,m1*m3)
     ...: n2D = n.swapaxes(1,2).reshape(n0*n2,-1)
     ...: out = m2D.dot(n2D).reshape(m0,m2,n1,n3).swapaxes(1,2)
100 loops, best of 3: 2.36 ms per loop

I could see that an axis swap of some sort might do it but I didn't go down that path as I thought tensordot could do it without rearranging. Why such a big difference in speed between einsum & tensordot? — 4bears, Dec 11 '17 at 14:52
@AndrewForbes To see how `tensordot` "spreads out" the leftover axes, here's a relevant post - https://stackoverflow.com/a/41871402/. — Divakar, Dec 11 '17 at 14:53

score 2 · Answer 3 · answered Dec 11 '17 at 12:19

2

Just to demonstrate that broadcasting also works:

(m[:, :, None, :, :, None] * n[None, :, :, None, :, :]).sum(axis=(1,4))

But the other solutions posted are probably faster, at least for large arrays.

answered Dec 11 '17 at 12:19

Paul Panzer

51,835
3
54
99

Matrix multiplication on 4D numpy arrays

3 Answers3