I was curious about how much difference it really makes to use np.empty instead of np.zeros, and also how np.ones compares. I ran this small script to benchmark the time each of these takes to create a large array:
import numpy as np
from timeit import timeit
N = 10_000_000
dtypes = [np.int8, np.int16, np.int32, np.int64,
          np.uint8, np.uint16, np.uint32, np.uint64,
          np.float16, np.float32, np.float64]
rep = 100
print(f'{"DType":8s} {"Empty":>10s} {"Zeros":>10s} {"Ones":>10s}')
for dtype in dtypes:
    name = dtype.__name__
    time_empty = timeit(lambda: np.empty(N, dtype=dtype), number=rep) / rep
    time_zeros = timeit(lambda: np.zeros(N, dtype=dtype), number=rep) / rep
    time_ones = timeit(lambda: np.ones(N, dtype=dtype), number=rep) / rep
    print(f'{name:8s} {time_empty:10.2e} {time_zeros:10.2e} {time_ones:10.2e}')
And obtained the following table as a result:
DType         Empty      Zeros       Ones
int8       1.39e-04   1.76e-04   5.27e-03
int16      3.72e-04   3.59e-04   1.09e-02
int32      5.85e-04   5.81e-04   2.16e-02
int64      1.28e-03   1.13e-03   3.98e-02
uint8      1.66e-04   1.62e-04   5.22e-03
uint16     2.79e-04   2.82e-04   9.49e-03
uint32     5.65e-04   5.20e-04   1.99e-02
uint64     1.16e-03   1.24e-03   4.18e-02
float16    3.21e-04   2.95e-04   1.06e-02
float32    6.31e-04   6.06e-04   2.32e-02
float64    1.18e-03   1.16e-03   4.85e-02
From this I extract two somewhat surprising conclusions:
- There is virtually no difference between the performance of np.empty and np.zeros, except maybe a small one for int8. I don't understand why this is the case. Creating an empty array is supposed to be faster, and I have actually seen reports of that (e.g. Speed of np.empty vs np.zeros).
- There is a great difference between np.zeros and np.ones. I suspect this has to do with high-performance mechanisms for zeroing memory that do not apply to filling a memory area with an arbitrary constant, but I don't really know how or at what level that works.
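One way to probe the second point is to check whether the zeroing cost has merely been deferred rather than eliminated. The sketch below (my own addition, not part of the original benchmark) compares np.ones against np.zeros followed by an explicit fill; if np.zeros obtains lazily zeroed pages from the OS (e.g. via calloc), the cost should appear on the first write rather than at allocation time, and the two variants should end up in the same ballpark:

```python
import numpy as np
from timeit import timeit

N = 10_000_000
rep = 10

def zeros_then_fill():
    # Allocation is cheap if the pages are only zeroed lazily by the OS;
    # the fill then forces every page to be materialized (first touch).
    a = np.zeros(N, dtype=np.float64)
    a.fill(1.0)
    return a

def ones_direct():
    # np.ones allocates and writes the constant into every element.
    return np.ones(N, dtype=np.float64)

t_zf = timeit(zeros_then_fill, number=rep) / rep
t_o = timeit(ones_direct, number=rep) / rep
print(f'zeros+fill: {t_zf:.2e}s  ones: {t_o:.2e}s')
```

If both variants take roughly the same time, that would support the idea that the apparent speed of np.zeros comes from deferring the page-zeroing work, not avoiding it.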
What is the explanation for these results?
I am using NumPy 1.15.4 and Python 3.6 Anaconda on Windows 10 (with MKL), and I have an Intel Core i7-7700K CPU.
EDIT: As per a suggestion in the comments, I tried running the benchmark interleaving the individual trials and averaging at the end, but I couldn't see a significant difference in the results. On a related note, though, I don't know whether NumPy has any mechanism to reuse the memory of a just-deleted array, which would make the measurements unrealistic (although the times do seem to grow with the data type size even for empty arrays).
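The memory-reuse question can be checked directly, at least at the level of addresses: the buffer address of an array is exposed through its `__array_interface__`. This is a sketch, comparing the address of a freshly allocated array with that of an identically sized array created right after the first is deleted; equal addresses would suggest the allocator handed the same block back:

```python
import numpy as np

N = 10_000_000

a = np.empty(N, dtype=np.float64)
addr_a = a.__array_interface__['data'][0]  # base address of the buffer
del a  # release the buffer back to the allocator

b = np.empty(N, dtype=np.float64)
addr_b = b.__array_interface__['data'][0]

print('same address reused:', addr_a == addr_b)
```

Note that even when the address matches, the OS may have unmapped and remapped the pages in between (large allocations typically go straight to the OS), so identical addresses don't necessarily mean the physical memory was kept warm.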