Looking at the resource monitor during the execution of my script, I noticed that all the cores of my PC were busy, even though I had not implemented any form of multiprocessing. Trying to pinpoint the cause, I discovered that the code is parallelized when using numpy's matmul (or, as in the example below, the binary operator @).
import numpy as np

A = np.random.rand(10, 500)
B = np.random.rand(500, 50000)

# Keep multiplying so the load is easy to observe in the resource monitor
while True:
    _ = A @ B
Looking at this question, it seems the reason is that numpy invokes BLAS/LAPACK routines, which are indeed parallelized.
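For reference, this is roughly how I checked which BLAS/LAPACK implementation my numpy build is linked against (the exact output depends on the installation):

import numpy as np

# Prints the build configuration, including the BLAS/LAPACK
# libraries (e.g. OpenBLAS or MKL) numpy was built with
np.show_config()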
While it is nice that my code runs faster and uses all available resources, this causes me trouble when I submit it to a shared cluster managed by the PBS queue manager. Together with the cluster's IT manager, we noticed that even when I request N CPUs on a cluster node, numpy still spawns a number of threads equal to the total number of CPUs on that node.
This overloads the node, since I end up using more CPUs than those assigned to me.
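As a rough diagnostic (not part of the job itself), one can compare the total CPU count of the node with the set of CPUs the process is actually allowed to run on; the thread count I observed matched the former:

import os

# All CPUs physically present on the node
print("node CPUs:   ", os.cpu_count())

# CPUs this process is allowed to run on (Linux only)
print("allowed CPUs:", len(os.sched_getaffinity(0)))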
Is there a way to "control" this behaviour and tell numpy how many CPUs it can use?