I read "Why does pickle take so much longer than np.save?" before posting this question.
The answers there suggest that NumPy should be faster than pickle when working with ndarrays.
But my experiments show the opposite for loading.
Setup and the functions under test:
import numpy as np
import pickle as pkl

# Save the same 1000x5 array once in each format.
a = np.random.randn(1000, 5)
with open("test.npy", "wb") as f:
    np.save(f, a)
with open("test.pkl", "wb") as f:
    pkl.dump(a, f)

def load_with_numpy(name):
    # Load the .npy file 1000 times.
    for _ in range(1000):
        with open(name, "rb") as f:
            np.load(f)

def load_with_pickle(name):
    # Load the pickle file 1000 times.
    for _ in range(1000):
        with open(name, "rb") as f:
            pkl.load(f)
Experiment results (each call performs 1000 loads):
%timeit load_with_numpy("test.npy")
296 ms ± 1.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit load_with_pickle("test.pkl")
28.2 ms ± 994 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Why is np.load roughly ten times slower than pickle.load here?
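To narrow this down, here is a rough sketch of a follow-up experiment that takes the disk out of the picture and compares a small array with a larger one. The idea is to see whether the gap looks like a fixed per-call cost or scales with the data. The helper names `serialize` and `time_loaders` and the larger shape are my own choices for illustration, not part of the measurements above.

```python
import io
import pickle
import timeit

import numpy as np


def serialize(a):
    """Return (npy_bytes, pkl_bytes) for the array `a` (hypothetical helper)."""
    npy_buf = io.BytesIO()
    np.save(npy_buf, a)
    # Default pickle protocol, matching pkl.dump(a, f) in the question.
    pkl_bytes = pickle.dumps(a)
    return npy_buf.getvalue(), pkl_bytes


def time_loaders(shape, repeats=200):
    """Time np.load and pickle.load on in-memory copies of one random array.

    Returns the average number of seconds per load for each loader.
    """
    a = np.random.randn(*shape)
    npy_bytes, pkl_bytes = serialize(a)

    # Sanity check: both formats round-trip to the same data.
    assert np.array_equal(np.load(io.BytesIO(npy_bytes)), a)
    assert np.array_equal(pickle.loads(pkl_bytes), a)

    # A fresh BytesIO per call mimics reopening the file each time.
    t_npy = timeit.timeit(
        lambda: np.load(io.BytesIO(npy_bytes)), number=repeats
    )
    t_pkl = timeit.timeit(
        lambda: pickle.load(io.BytesIO(pkl_bytes)), number=repeats
    )
    return {"numpy": t_npy / repeats, "pickle": t_pkl / repeats}


if __name__ == "__main__":
    # The first shape is the one from the question; the second is an
    # arbitrary larger array chosen only for comparison.
    for shape in [(1000, 5), (2000, 500)]:
        result = time_loaders(shape, repeats=20)
        print(shape, {k: f"{v * 1e6:.1f} us" for k, v in result.items()})
```

If the ratio between the two loaders shrinks for the larger array, that would point to per-call overhead rather than the cost of reading the data itself, but I have not drawn any conclusion from this yet.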