I have two np.ndarrays:
a is an array of shape (13000, 8, 315000) and type uint8
b is an array of shape (8,) and type float32
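For scale, here is a rough sketch of the memory these shapes imply. The variable names are illustrative only. It also shows why converting a to float32 up front is unlikely to be practical.

```python
import math
import numpy as np

# Shape of `a` as stated above.
shape_a = (13000, 8, 315000)

n_elements = math.prod(shape_a)

# uint8 uses 1 byte per element, float32 uses 4.
bytes_a = n_elements * np.dtype(np.uint8).itemsize
bytes_a_f32 = n_elements * np.dtype(np.float32).itemsize

print(f"a as uint8:   {bytes_a / 1e9:.2f} GB")
print(f"a as float32: {bytes_a_f32 / 1e9:.2f} GB")
```

So any approach that materialises a full float32 copy of a needs four times the memory of the uint8 original.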
I want to multiply each slice along the second dimension (8) by the corresponding element in b and sum along that dimension (i.e. a dot product along the second axis). The result will be of shape (13000, 315000)
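To make the intended operation concrete, here is a minimal sketch on tiny stand-in arrays (a_small and b_small are hypothetical names, with the same dtypes as the real data but much smaller shapes). It spells the weighted sum out as an explicit loop over the second axis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny stand-ins for the real arrays: same dtypes, much smaller shapes.
a_small = rng.integers(0, 256, size=(4, 8, 5), dtype=np.uint8)
b_small = rng.random(8, dtype=np.float32)

# Weight each slice a_small[:, j, :] by b_small[j], then sum over j.
expected = np.zeros((4, 5), dtype=np.float32)
for j in range(a_small.shape[1]):
    expected += a_small[:, j, :].astype(np.float32) * b_small[j]

print(expected.shape)
```

The second axis (length 8) disappears, and the first and third axes are kept.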
I have devised two ways of doing this:
np.einsum('ijk,j->ik', a, b): using %timeit it gives 49 s ± 12.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
np.dot(a.transpose(0, 2, 1), b): using %timeit it gives 1min 8s ± 3.54 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
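For anyone who wants to reproduce this without the full-size data, here is a scaled-down sketch of the comparison. The reduced shape (130, 8, 3150), the random contents and the helper names with_einsum and with_dot are assumptions for illustration. Timings at this size will not match the figures above.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)

# Scaled-down stand-ins: same dtypes as the real arrays, smaller shape.
a_small = rng.integers(0, 256, size=(130, 8, 3150), dtype=np.uint8)
b = rng.random(8, dtype=np.float32)


def with_einsum(a, b):
    # Contract the second axis of a against b.
    return np.einsum('ijk,j->ik', a, b)


def with_dot(a, b):
    # Move the length-8 axis last so np.dot sums over it.
    return np.dot(a.transpose(0, 2, 1), b)


res_einsum = with_einsum(a_small, b)
res_dot = with_dot(a_small, b)

# Both approaches should agree up to float32 rounding.
print(res_einsum.shape, res_einsum.dtype)
print(np.allclose(res_einsum, res_dot, rtol=1e-4))

# Best-of-3 wall-clock time for each approach.
for fn in (with_einsum, with_dot):
    best = min(timeit.repeat(lambda: fn(a_small, b), number=1, repeat=3))
    print(f"{fn.__name__}: {best:.4f} s")
```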
Are there faster alternatives?
Complementary information
np.show_config() returns:
blas_mkl_info:
    NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    language = c
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
lapack_mkl_info:
    NOT AVAILABLE
openblas_info:
    libraries = ['openblas', 'openblas']
    language = c
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
blis_info:
    NOT AVAILABLE
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    language = c
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    libraries = ['openblas', 'openblas']
    language = c
    library_dirs = ['/usr/local/lib']
    define_macros = [('HAVE_CBLAS', None)]
a.flags:
    C_CONTIGUOUS : True
    F_CONTIGUOUS : False
    OWNDATA : True
    WRITEABLE : True
    ALIGNED : True
    WRITEBACKIFCOPY : False
    UPDATEIFCOPY : False
b.flags:
    C_CONTIGUOUS : True
    F_CONTIGUOUS : True
    OWNDATA : True
    WRITEABLE : True
    ALIGNED : True
    WRITEBACKIFCOPY : False
    UPDATEIFCOPY : False