Related question: How to speed up reshape in higher rank tensor contraction by BLAS in Fortran?
Suppose I have a contraction
A[a,b] * B[c,b,d] = C[a,c,d] (the dummy index b is in the middle of tensor B, which looks tricky)
and I would like to use DGEMM for it.
What I can do is:
- reshape B[c,b,d] into B2[b,c,d]
- use the method in "How to speed up reshape in higher rank tensor contraction by BLAS in Fortran?" to evaluate A[a,b] * B2[b,c,d] = C[a,c,d]
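
For concreteness, here is a minimal sketch of those two steps. The dimension names (na, nb, nc, nd), their sizes, and the loop-based reference check are assumptions made for illustration only. The sketch assumes a BLAS library is linked (for example with -lblas) and uses the intrinsic reshape with order=[2,1,3] to do the index permutation.

```fortran
program contract_middle_index
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! Hypothetical small sizes, chosen only for this illustration.
  integer, parameter :: na = 3, nb = 4, nc = 5, nd = 2
  real(dp) :: A(na, nb), B(nc, nb, nd), B2(nb, nc, nd)
  real(dp) :: C(na, nc, nd), Cref(na, nc, nd)
  integer :: ia, ib, ic, id
  external :: dgemm

  call random_number(A)
  call random_number(B)

  ! Step 1: permute the first two indices so that B2(b,c,d) = B(c,b,d).
  B2 = reshape(B, shape=[nb, nc, nd], order=[2, 1, 3])

  ! Step 2: a single DGEMM. B2 is treated as an nb x (nc*nd) matrix and
  ! C as an na x (nc*nd) matrix, which works because both are contiguous
  ! in column-major storage.
  call dgemm('N', 'N', na, nc*nd, nb, 1.0_dp, A, na, B2, nb, 0.0_dp, C, na)

  ! Reference result from explicit loops, used only to check the sketch.
  Cref = 0.0_dp
  do id = 1, nd
    do ic = 1, nc
      do ib = 1, nb
        do ia = 1, na
          Cref(ia, ic, id) = Cref(ia, ic, id) + A(ia, ib) * B(ic, ib, id)
        end do
      end do
    end do
  end do

  print *, 'max abs difference:', maxval(abs(C - Cref))
end program contract_middle_index
```

The reshape in step 1 makes a full copy of B, and that copy is the cost in question.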
The reshape can take some time. (If the target were C[c,a,d], I would need one more reshape.) Is there a more efficient approach?