I have a 512x512 image array and I want to perform operations on 8x8 blocks. At the moment I have something like this:
output = np.zeros(512, 512)
for i in range(0, 512, 8):
    for j in rangerange(0, 512, 8):
        a = input[i:i+8, j:j+8]
        b = some_other_array[i:i+8, j:j+8]
        output[i:i+8, j:j+8] = np.dot(a, b)
where a & b are 8x8 blocks derived from the original array. I would like to speed up this code by using vectorised operations. I have reshaped my inputs like this:
input = input.reshape(64, 8, 64, 8)
some_other_array = some_other_array.reshape(64, 8, 64, 8)
How could I perform a dot product on only axes 1 & 3 to output an array of shape (64, 8, 64, 8)?
I have tried np.tensordot(input, some_other_array, axes=([0, 1], [2, 3])) which gives the correct output shape, but the values do not match the output from the loop above. I've also looked at np.einsum but I haven't come across a simple example with what I'm trying to achieve.