Confused about runtime of different methods for distance of (2-d) points

Question

I have recently been working on a tower-defense game in Python (3.7). In the game's code I have to check the distance between different 2-d points a lot, so I wanted to find out which methods are fastest for this task.

I am not set on a particular norm. The 2-norm seems like the natural choice, but I'm not opposed to switching to the infinity-norm if necessary for execution time (https://en.wikipedia.org/wiki/Lp_space).
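For reference, here is a minimal sketch of the two candidate norms applied to a 2-d offset in plain Python. The function names are illustrative only and are not part of the game code; the radius of 3 simply mirrors the value used in the benchmark.

```python
def squared_two_norm(dx, dy):
    # Squared Euclidean length; compare against radius**2 to avoid a square root.
    return dx * dx + dy * dy


def inf_norm(dx, dy):
    # Infinity-norm (Chebyshev) length: the larger absolute coordinate difference.
    return max(abs(dx), abs(dy))


tower = (3, 5)
enemy = (5, 6)
dx, dy = tower[0] - enemy[0], tower[1] - enemy[1]

in_range_two = squared_two_norm(dx, dy) <= 3 ** 2
in_range_inf = inf_norm(dx, dy) <= 3
```

Since the infinity-norm of a vector never exceeds its 2-norm, every point accepted by the 2-norm test is also accepted by the infinity-norm test with the same radius; the infinity-norm range is a square that contains the circle.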

I was under the impression that list comprehensions are faster than for loops, and that built-in functions (e.g. in numpy) are faster than most code I could write. I tested some alternatives for computing the distance and was surprised that I could not confirm this impression. To my surprise, numpy's linalg.norm function performs horribly. This was the code I used to test runtimes:

import timeit
import numpy as np
import test_extension

number_of_repeats = 10000

tower_pos = (3,5)
enemy_pos = ((1,2),(2,3),(3,4),(4,5),(5,6),(6,7),(7,8),(8,9))
distance = 3
options = ('np.linalg 2 norm', 'np.linalg inf norm', 'inf norm manuel', 'manuel squared 2 norm', 'hard coded squared 2 norm', 'c-extension hard coded squared 2 norm', 'all in extension')

def list_comprehension(option):
    if option == 0:
        return [pos for pos in enemy_pos if np.linalg.norm(np.subtract(tower_pos, pos)) <= distance]
    elif option == 1:
        return [pos for pos in enemy_pos if np.linalg.norm(np.subtract(tower_pos, pos), ord = np.inf) <= distance]
    elif option == 2:
        return [pos for pos in enemy_pos if max(abs(np.subtract(tower_pos, pos))) <= distance]
    elif option == 3:
        return [pos for pos in enemy_pos if sum(np.square(np.subtract(tower_pos, pos))) <= distance**2]
    elif option == 4:
        return [pos for pos in enemy_pos for distance_vector in [np.subtract(tower_pos, pos)] if distance_vector[0]*distance_vector[0]+distance_vector[1]*distance_vector[1] <= distance**2 ]
    elif option == 5:
        return [pos for pos in enemy_pos if test_extension.distance(np.subtract(tower_pos, pos)) <= distance**2]
    elif option == 6:
        return test_extension.list_comprehension(tower_pos, enemy_pos, distance)

def for_loop(option):
    l = []
    if option == 0:
        for pos in enemy_pos:
            distance_vector = np.subtract(tower_pos, pos)
            norm = np.linalg.norm(distance_vector)
            if norm <= distance:
                l.append(pos)
    elif option == 1:
        for pos in enemy_pos:
            distance_vector = np.subtract(tower_pos, pos)
            norm = np.linalg.norm(distance_vector, ord = np.inf)
            if norm <= distance:
                l.append(pos)
    elif option == 2:
        for pos in enemy_pos:
            distance_vector = np.subtract(tower_pos, pos)
            norm = max(abs(distance_vector))
            if norm <= distance:
                l.append(pos)
    elif option == 3:
        for pos in enemy_pos:
            distance_vector = np.subtract(tower_pos, pos)
            norm = sum(distance_vector * distance_vector)
            if norm <= distance**2:
                l.append(pos)
    elif option == 4:
        for pos in enemy_pos:
            distance_vector = np.subtract(tower_pos, pos)
            norm = distance_vector[0]*distance_vector[0]+distance_vector[1]*distance_vector[1]
            if norm <= distance**2:
                l.append(pos)
    elif option == 5:
        d = test_extension.distance
        for pos in enemy_pos:
            distance_vector = np.subtract(tower_pos, pos)
            norm = d(distance_vector)
            if norm <= distance**2:
                l.append(pos)
    elif option == 6:
        return test_extension.for_loop(tower_pos, enemy_pos, distance)
    return l 

print(f"{'_'.ljust(40)}list_comprehension   for_loop")
for i,option in enumerate(options):
    times = [timeit.timeit(f'{method}({i})', number = number_of_repeats, globals = {f'{method}': globals()[f'{method}']}) for method in ['list_comprehension', 'for_loop']]
    s = f'{option}:'.ljust(40)
    print(f"{s}{round(times[0],5)}{' '*15}  {round(times[1],5)}")

which gave the following output:

                                        list_comprehension   for_loop
np.linalg 2 norm:                       3.58053                 3.56676
np.linalg inf norm:                     3.17169                 3.53372
inf norm manuel:                        1.15261                 1.1951
manuel squared 2 norm:                  1.30239                 1.28485
hard coded squared 2 norm:              1.11324                 1.08586
c-extension hard coded squared 2 norm:  1.01506                 1.05114
all in extension:                       0.81358                 0.81262

test_extension contains the code below. I then built a C extension from test_extension using the Cython package (https://cython.readthedocs.io/en/latest/src/tutorial/cython_tutorial.html):

import numpy as np

def distance(vector: np.array):
    return vector[0]*vector[0]+vector[1]*vector[1]

def for_loop(single_pos, pos_list, distance):
    l = []
    for pos in pos_list:
        distance_vector = np.subtract(single_pos, pos)
        norm = distance_vector[0]*distance_vector[0]+distance_vector[1]*distance_vector[1]
        if norm <= distance**2:
            l.append(pos)
    return l

def list_comprehension(single_pos, pos_list, distance):
    return [pos for pos in pos_list for distance_vector in [np.subtract(single_pos, pos)] if distance_vector[0]*distance_vector[0]+distance_vector[1]*distance_vector[1] <= distance**2 ]

At the moment I mainly have the three questions below, but feel free to share any other insights:

Why are the built-in np.linalg.norm functions so slow? Am I misusing them?
Why is hardcoding the squared 2 norm for a vector v better than using sum(np.square(v))?
Have I missed even faster methods for computing the distance (as 2-norm)?

Reti43 · Accepted Answer · 2021-11-11T00:42:01.087

For game development, you probably need to implement spatial partitioning if you frequently need to query which objects are near one another.
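As a rough illustration of the idea, here is a minimal sketch of one simple form of spatial partitioning, a uniform grid. The names `build_grid`, `query` and `CELL_SIZE` are hypothetical, and the cell size is an assumption you would tune to your typical ranges; this is not the only or necessarily the best scheme for a given game.

```python
from collections import defaultdict

CELL_SIZE = 4  # hypothetical value; pick something on the order of a typical range


def build_grid(positions, cell_size=CELL_SIZE):
    """Bucket each (x, y) position by the grid cell that contains it."""
    grid = defaultdict(list)
    for x, y in positions:
        grid[(int(x // cell_size), int(y // cell_size))].append((x, y))
    return grid


def query(grid, center, radius, cell_size=CELL_SIZE):
    """Return all stored positions within `radius` (2-norm) of `center`."""
    cx, cy = center
    r2 = radius * radius
    # Only cells overlapping the bounding square of the circle are visited.
    min_gx, max_gx = int((cx - radius) // cell_size), int((cx + radius) // cell_size)
    min_gy, max_gy = int((cy - radius) // cell_size), int((cy + radius) // cell_size)
    hits = []
    for gx in range(min_gx, max_gx + 1):
        for gy in range(min_gy, max_gy + 1):
            # .get avoids creating empty buckets in the defaultdict
            for x, y in grid.get((gx, gy), ()):
                if (x - cx) ** 2 + (y - cy) ** 2 <= r2:
                    hits.append((x, y))
    return hits


enemy_pos = ((1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9))
tower_pos = (3, 5)
grid = build_grid(enemy_pos)
in_range = query(grid, tower_pos, 3)
```

Because the radius is passed per query, a changing range only changes how many cells are visited. The grid has to be rebuilt or updated when objects move, so whether it pays off depends on how many objects you have and how often you query.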

The rest of this answer will deal with the numpy behaviour observed. First of all, list comprehensions are marginally faster than loops (assuming you're building an iterable you want to keep).

Now, let's set up a function to collect time statistics. We'll also create a few arrays that we'll use frequently and we need to exclude their creation time.

def timer(func):
    times = timeit.repeat(func, number=number_of_repeats, repeat=10)
    return f'{np.mean(times):.5f} ({np.std(times):.5f})'

enemy_numpy = np.array(enemy_pos)
tower_numpy = np.array(tower_pos)

enemy_pos_large = enemy_pos * 20
enemy_numpy_large = np.array(enemy_pos_large)

Python-level loops are slow compared to numpy operations. However, most numpy operations create a new array, and that allocation is comparatively expensive. On top of that, each function call has its own overhead. For tiny inputs, all of this upfront cost can dominate, so hardcoding a couple of index lookups and arithmetic operations wins even at the slow Python level.

np.subtract(enemy_pos[i], tower_pos) first converts the two tuples into 2-element arrays and then creates a third 2-element array for the result. The allocation time dominates over the small number of operations involved.

>>> timer(lambda: np.subtract(enemy_pos[0], tower_pos))
'0.03206 (0.00031)'
>>> # initial array allocation
>>> timer(lambda: (np.array(enemy_pos[0]), np.array(tower_pos)))
'0.02101 (0.00014)'
>>> # straight up numpy subtraction of one enemy position
>>> timer(lambda: enemy_numpy[0] - tower_numpy)
'0.00636 (0.00019)'
>>> # pure python with creating tuples
>>> timer(lambda: (enemy_pos[0][0] - tower_pos[0], enemy_pos[0][1] - tower_pos[1]))
'0.00145 (0.00006)'

For bigger inputs the conversion of tuples to arrays still dominates, but once the arrays are preallocated, the cost of allocating the output is amortised over many elements and the speed of numpy operations shines.

>>> # subtract all positions at once with `np.subtract`
>>> timer(lambda: np.subtract(enemy_pos, tower_pos))
'0.09057 (0.00029)'
>>> # same for the bigger list; 20x bigger, but only ~12x slower
>>> timer(lambda: np.subtract(enemy_pos_large, tower_pos))
'1.07190 (0.00133)'
>>> # pure python beats numpy by an order of magnitude for the small list
>>> timer(lambda: [(ex - tower_pos[0], ey - tower_pos[1]) for ex, ey in enemy_pos])
'0.00918 (0.00007)'
>>> # pure python for the big list still wins, 20x bigger, 16x slower
>>> timer(lambda: [(ex - tower_pos[0], ey - tower_pos[1]) for ex, ey in enemy_pos_large])
'0.14962 (0.00055)'
>>> # preallocated small numpy arrays are barely edged out by pure python
>>> timer(lambda: enemy_numpy - tower_numpy)
'0.01140 (0.00014)'
>>> # preallocated big numpy arrays are outperforming pure python; 20x bigger, only ~1.6x slower
>>> timer(lambda: enemy_numpy_large - tower_numpy)
'0.01817 (0.00028)'

vector[0]*vector[0] + vector[1]*vector[1] squares two integers and then adds them together. It avoids creating new arrays for the squaring and the sum and because of that the python-level slow operations edge out numpy. But things change for a bigger array.

>>> v2 = np.arange(2)
>>> timer(lambda: v2[0]**2 + v2[1]**2)
'0.00741 (0.00020)'
>>> timer(lambda: np.add.reduce(v2**2))
'0.01756 (0.00027)'
>>> v6 = np.arange(6)
>>> # too many slow python operations
>>> timer(lambda: v6[0]**2 + v6[1]**2 + v6[2]**2 + v6[3]**2 + v6[4]**2 + v6[5]**2)
'0.02321 (0.00028)'
>>> # better off creating the intermediate arrays and letting numpy do its thing
>>> timer(lambda: np.add.reduce(v6**2))
'0.01744 (0.00016)'

Bottom line: converting a Python iterable to a numpy array is costly. Numpy likes one method call acting upon the whole data at once. If you can initialise your array only once and then keep working with it, then for anything but very small sizes the intermediate output arrays won't matter and you'll get the advantage of fast numpy operations.
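A minimal sketch of that pattern, assuming the enemy positions are kept in one persistent float array instead of a tuple of tuples (the names `enemy_xy`, `tower_xy` and `enemies_in_range` are illustrative only):

```python
import numpy as np

# Created once, e.g. when the wave spawns, then reused every frame.
enemy_xy = np.array(
    [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9)],
    dtype=float,
)
tower_xy = np.array((3, 5), dtype=float)


def enemies_in_range(enemy_xy, tower_xy, radius):
    """Return the rows of `enemy_xy` within `radius` (2-norm) of `tower_xy`."""
    diff = enemy_xy - tower_xy             # broadcasts (n, 2) - (2,)
    dist2 = (diff * diff).sum(axis=1)      # squared distances, no square root
    return enemy_xy[dist2 <= radius * radius]


in_range = enemies_in_range(enemy_xy, tower_xy, 3)
```

Movement can then be applied in place (for example `enemy_xy += step` with a suitable `step` array), so the tuple-to-array conversion never happens inside the per-frame loop.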

Now, let's write numpy versions of your approaches which use the array versions of your lists and see how those perform compared to python loops. Let's also compare that to pure python functions where no numpy functions are ever called. Note that some options are skipped because their code would be the same as a previous approach.

def numpified(option):
    if option == 0:
        # we can improve this timing if the arrays are float by default,
        # as `np.linalg.norm` will convert them otherwise
        return enemy_numpy[np.linalg.norm(tower_numpy - enemy_numpy, axis=1) <= distance]
    elif option == 1:
        return enemy_numpy[np.linalg.norm(tower_numpy - enemy_numpy, axis=1, ord=np.inf) <= distance]
    elif option == 2:
        return enemy_numpy[np.max(np.abs(tower_numpy - enemy_numpy), axis=1) <= distance]
    elif option == 3:
        return enemy_numpy[np.sum((tower_numpy - enemy_numpy)**2, axis=1) <= distance**2]
    elif option == 4:
        # same as option == 3
        return []
    elif option == 5:
        # `distance2d` squares and sums the components itself
        return enemy_numpy[test_extension.distance2d(tower_numpy - enemy_numpy) <= distance**2]
    elif option == 6:
        # same as option == 5
        return []

def pure(option):
    l = []
    tx, ty = tower_pos
    if option == 0:
        for ex, ey in enemy_pos:
            x = tx - ex
            y = ty - ey
            # since we're dealing with real numbers, |x|**2 is just x**2.
            # We can also skip taking the square root by comparing against `distance**2`
            if x*x + y*y <= distance**2:
                l.append((ex, ey))
    elif option == 1:
        for ex, ey in enemy_pos:
            if max((abs(tx - ex), abs(ty - ey))) <= distance:
                l.append((ex, ey))
    else:
        # the rest are the same as the above
        return []
    return l

methods = [list_comprehension, for_loop, numpified, pure]
print('{:40s}'.format('_') + ''.join(f'{m.__name__:20s}' for m in methods))
for i, option in enumerate(options):
    times = ''.join(f'{timer(lambda: m(i)):20s}' for m in methods)
    print(f'{option:40s}{times}')

Also add the following function to the test_extension module:

def distance2d(vector: np.array):
    return vector[:,0]*vector[:,0]+vector[:,1]*vector[:,1]

Timings

_                                       list_comprehension  for_loop            numpified           pure                
np.linalg 2 norm                        0.61583 (0.00289)   0.61282 (0.00278)   0.12935 (0.00600)   0.02203 (0.00016)   
np.linalg inf norm                      0.68147 (0.00486)   0.68253 (0.00515)   0.09752 (0.00338)   0.02001 (0.00009)   
inf norm manuel                         0.38958 (0.00235)   0.38934 (0.00275)   0.07727 (0.00096)   0.00151 (0.00002)   
manuel squared 2 norm                   0.43648 (0.00114)   0.42091 (0.00053)   0.08799 (0.00165)   0.00152 (0.00001)   
hard coded squared 2 norm               0.34672 (0.00061)   0.34031 (0.00067)   0.00162 (0.00002)   0.00152 (0.00001)   
c-extension hard coded squared 2 norm   0.34750 (0.00097)   0.34609 (0.00089)   0.09728 (0.00251)   0.00152 (0.00002)   
all in extension                        0.34823 (0.00082)   0.34101 (0.00087)   0.00184 (0.00001)   0.00154 (0.00001)

We can see that for this small input pure Python is actually doing better. Now let's try the bigger arrays, comparing only numpified against pure. Set the following before running the timing loop again.

enemy_pos = enemy_pos_large
enemy_numpy = enemy_numpy_large
methods = [numpified, pure]

Timings

_                                       numpified           pure                
np.linalg 2 norm                        0.13939 (0.00460)   0.41680 (0.00130)   
np.linalg inf norm                      0.12642 (0.00134)   0.38518 (0.00072)   
inf norm manuel                         0.10712 (0.00107)   0.00155 (0.00004)   
manuel squared 2 norm                   0.12589 (0.00057)   0.00156 (0.00004)   
hard coded squared 2 norm               0.00165 (0.00007)   0.00153 (0.00002)   
c-extension hard coded squared 2 norm   0.12898 (0.00288)   0.00154 (0.00002)   
all in extension                        0.00199 (0.00005)   0.00166 (0.00004)

You can probably make more performance improvements by using numba, but before you look into that, consider spatial partitioning.

First of all, thank you for your detailed answer. I was thinking about options other than checking the distance between objects (e.g. towers and enemies). Spatial partitioning came to mind (without knowing the actual name for it). I was planning on addressing this in a different question. However, I am not convinced that spatial partitioning helps in my case, because in my game there are several effects which can change the range of any object in the game. — 31415926535, Nov 13 '21 at 08:33
Just to be sure and maybe for clarification for other people: The lines 5/7 under numpified and 3-7 under pure (with the fastest times) do not involve any computation of distances. They are just dummy lines with dummy returns so the code will run? — 31415926535, Nov 13 '21 at 08:57
@31415926535 That is correct. I skipped them just to save time. — Reti43, Nov 13 '21 at 09:33
