IEEE floating point operations are deterministic, but see How can floating point calculations be made deterministic? for one way that an overall floating point computation can be non-deterministic:
... parallel computations are non-deterministic in terms of the order in which floating-point computations are performed, which can result in non-bit-exact results across runs.
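The underlying reason evaluation order matters is that floating-point addition is not associative: regrouping the same operands can change the rounded result in the last bit, and a parallel `sum()` or `dot()` just regroups additions like this at scale. A minimal illustration:

```python
# Floating-point addition is not associative: the same three operands,
# grouped differently, round to results that differ by one ulp.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # 0.6000000000000001
right = a + (b + c)   # 0.6
print(left == right)  # False
print(left - right)   # ~1.1e-16
```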
Two-part question:
- How else can an overall floating point computation be non-deterministic, yielding results that are not exactly equal?
- Consider a single-threaded Python program that calls NumPy, CVXOPT, and SciPy subroutines such as `scipy.optimize.fsolve()`, which in turn call native libraries like MINPACK and GLPK and optimized linear algebra subroutines like BLAS, ATLAS, and MKL. "If your numpy/scipy is compiled using one of these, then dot() will be computed in parallel (if this is faster) without you doing anything." Do these native libraries ever parallelize in a way that introduces non-deterministic results? (See the thread-pinning sketch just below.)
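One way to take BLAS-level threading out of play (a sketch, assuming an OpenBLAS-, MKL-, or OpenMP-backed NumPy; each environment variable is only honored by its particular backend) is to pin the thread pools to a single thread before importing numpy:

```python
# Pin the common BLAS/OpenMP thread pools to one thread *before*
# importing numpy, so dot() and friends run serially and the
# reduction order is fixed from run to run.
import os
os.environ["OMP_NUM_THREADS"] = "1"        # OpenMP (MKL, OpenMP-built OpenBLAS)
os.environ["OPENBLAS_NUM_THREADS"] = "1"   # OpenBLAS
os.environ["MKL_NUM_THREADS"] = "1"        # Intel MKL

import numpy as np
np.show_config()  # shows which BLAS/LAPACK NumPy was linked against
```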
Assumptions:
- The same software, with the same inputs, on the same hardware. The output of multiple runs should be equal.
- If that works, it's highly desirable to test that the output after a code refactoring is still equal. (Yes, some changes in the order of operations can make some of the output unequal; see the comparison sketch after this list.)
 
- All random numbers in the program are pseudo-random numbers used in a consistent way from the same seeds across all runs (see the seeding sketch after this list).
- No uninitialized values. Python is generally safe in that way, but `numpy.empty()` returns a new array without initializing its entries. And it's not clear that it's much faster in practice. So beware! @PaulPanzer's test shows that `numpy.empty()` does return an uninitialized array, and it can easily and quickly recycle a recent array:

  ```python
  import numpy as np
  np.arange(100); np.empty(100, int); np.empty(100, int)
  np.arange(100, 200.0); np.empty(100, float); np.empty(100, float)
  ```
- It's tricky to get useful timing measurements for these routines! In a `timeit` loop, `numpy.empty()` can just keep reallocating the same one or two memory nodes, so the time is independent of the array size. To prevent recycling:

  ```python
  from timeit import timeit
  timeit('l.append(numpy.empty(100000))', 'import numpy; l = []')
  timeit('l.append(numpy.zeros(100000))', 'import numpy; l = []')
  ```

  but reducing that array size to `numpy.zeros(10000)` takes 15x as long; reducing it to `numpy.zeros(1000)` takes 1.3x as long (on my MBP). Puzzling.
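For the refactoring check above, a lightweight sketch (here `baseline.npy` and `compute()` are hypothetical stand-ins for the saved pre-refactoring output and the program's entry point) using `numpy.array_equal` for a bit-exact comparison:

```python
# Compare the refactored program's output against a saved baseline,
# requiring bit-exact equality rather than a tolerance.
import numpy as np

def check_against_baseline(result, baseline_path="baseline.npy"):
    baseline = np.load(baseline_path)
    if np.array_equal(result, baseline):           # exact, element-by-element
        print("bit-exact match")
    else:
        diff = np.max(np.abs(result - baseline))   # size of the discrepancy
        print("differs; max abs difference =", diff)

# check_against_baseline(compute())  # compute() = your program's entry point
```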
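For the seeding assumption above, a minimal sketch (assuming the program draws only from the stdlib `random` module and NumPy's global generator; the seed value is arbitrary):

```python
# Seed every PRNG the program uses once, at startup, so every run
# draws the same sequence of pseudo-random numbers.
import random
import numpy as np

SEED = 12345           # arbitrary illustrative seed
random.seed(SEED)      # Python's stdlib Mersenne Twister
np.random.seed(SEED)   # NumPy's legacy global RandomState
```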
 
See also: hash values are salted in Python 3 (while each dict preserves insertion order), so hash randomization could vary the order of operations from run to run. [I'm wrangling with this problem in Python 2.7.15.]
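A quick way to see whether hash salting is in effect for the current run (the check below works on CPython 2.7.3+ and 3.x; in Python 3 salting is on by default, in 2.7 only with the -R flag):

```python
# 0 means string hashes are not salted this run, so set/dict-driven
# iteration order cannot vary between runs for that reason.
import sys
print(sys.flags.hash_randomization)
# For Python 3, salting can be pinned off by setting PYTHONHASHSEED=0
# in the environment before the interpreter starts.
```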
