With np.einsum('ikl,kjl->ijl', A, B), the axis labeled l is aligned across both inputs and the output. Because of that alignment requirement, using np.tensordot might not necessarily result in a performance improvement, but since the question specifically asked for it, let's suggest it anyway. np.tensordot spreads out the axes that don't take part in the sum-reduction as separate axes, so the two l axes end up as a separate (N, N) pair in the result. To get to the final output, we then need to extract the elements along the diagonal of those spread-out axes.
Here's what a solution with np.tensordot would look like -
# Boolean diagonal mask to pick matching l indices from the spread-out pair
mask = np.eye(N, dtype=bool)
out = np.tensordot(A, B, axes=((1,), (0,))).swapaxes(1, 2)[:, :, mask]
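To see that the two approaches agree, here's a minimal sketch with illustrative shapes (assuming A is (N, 4, N) and B is (4, N, N), with the contracted axis k of length 4 as stated later in the answer):

```python
import numpy as np

N = 5
rng = np.random.default_rng(0)
A = rng.standard_normal((N, 4, N))  # axes: i, k, l
B = rng.standard_normal((4, N, N))  # axes: k, j, l

# Reference result with einsum
ref = np.einsum('ikl,kjl->ijl', A, B)

# tensordot contracts k, leaving axes (i, l, j, l'); swap to (i, j, l, l')
# and pull the diagonal l == l' with a boolean mask
mask = np.eye(N, dtype=bool)
out = np.tensordot(A, B, axes=((1,), (0,))).swapaxes(1, 2)[:, :, mask]

print(np.allclose(ref, out))  # True
```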
There would be cases where np.dot/np.tensordot come out as the winner, but that requires the sum-reduction axes to have decent lengths. See this post for a more detailed analysis.
For the given problem that's not the case, as the length of the sum-reduction axis is just 4. So, I would think einsum would be best here!
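If you want to check this on your own machine, a rough timing sketch (shapes are illustrative, with the short contracted axis of length 4):

```python
import numpy as np
from timeit import timeit

N = 100
rng = np.random.default_rng(1)
A = rng.standard_normal((N, 4, N))
B = rng.standard_normal((4, N, N))
mask = np.eye(N, dtype=bool)

# Time both formulations over a few repetitions
t_einsum = timeit(lambda: np.einsum('ikl,kjl->ijl', A, B), number=20)
t_tdot = timeit(
    lambda: np.tensordot(A, B, axes=((1,), (0,))).swapaxes(1, 2)[:, :, mask],
    number=20)

print(f"einsum:    {t_einsum:.4f}s")
print(f"tensordot: {t_tdot:.4f}s")
```

With such a short reduction axis, the tensordot version also pays for materializing the full (N, N, N, N) intermediate before the diagonal is extracted, which is part of why einsum tends to win here.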