I have converted this Python einsum expression

    psi_p = np.einsum('ij...,j...->i...', exp_p, psi_p)

to C++ like this:

    int io = 0;  // running index into the flattened m_opK
    for (int i = 0; i < 4; i++) {
        int ikauxop = i * nd;  // offset of row i in m_auxop
        for (int j = 0; j < 4; j++) {
            int jkpsi = nd * j;  // offset of row j in m_wf
            for (int k = 0; k < m_N; k++) {
                m_auxop[ikauxop + k] += m_opK[io++] * m_wf[jkpsi + k];
            }
        }
    }

But the Python version is 2 times faster than the C++ one.
m_auxop and m_wf are 2D arrays flattened to 1D, and m_opK is a 3D array flattened to 1D. How can I speed this up in C++?
The array elements are std::complex, and I get the same timing whether or not the arrays are flattened.
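
To make the array layout concrete, here is a minimal self-contained sketch of the same contraction. It rests on assumptions that are not stated above: the arrays are flattened in row-major (C) order, `nd` and `m_N` are the same length (called `n` here), the element type is `std::complex<double>`, and the output starts at zero. The function name `contract` and the alias `cplx` are hypothetical and used only for illustration.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;  // assumed element type

// Hypothetical helper mirroring the einsum 'ij...,j...->i...':
//   out[i*n + k] = sum over j of op[(i*4 + j)*n + k] * wf[j*n + k]
// with the trailing axes flattened to length n (row-major layout assumed).
std::vector<cplx> contract(const std::vector<cplx>& op,
                           const std::vector<cplx>& wf,
                           std::size_t n) {
    constexpr std::size_t dim = 4;                    // leading 4x4 axes
    std::vector<cplx> out(dim * n, cplx(0.0, 0.0));   // zero-initialised result
    for (std::size_t i = 0; i < dim; ++i) {
        cplx* out_row = out.data() + i * n;
        for (std::size_t j = 0; j < dim; ++j) {
            // Explicit offsets instead of a running counter, so the
            // indexing does not depend on the loop order.
            const cplx* op_row = op.data() + (i * dim + j) * n;
            const cplx* wf_row = wf.data() + j * n;
            for (std::size_t k = 0; k < n; ++k) {
                out_row[k] += op_row[k] * wf_row[k];
            }
        }
    }
    return out;
}
```

Called as `contract(opK, wf, n)` with `opK.size() == 16 * n` and `wf.size() == 4 * n`, it returns a vector of `4 * n` elements laid out like m_auxop.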
