Speeding up a for loop

Question

I have a execution speed problem with my code. The following code tries to calculate the nearest neighbor rows of the each row of a matrix. The metric for the distance calculation is Euclidian distance based on a near neighbor exclusion constraint. At the end, I need the row numbers of 8 nearest neighbors for each row of the matrix.

The input matrix (EmbedMat) is a segmented version of the time series data. Like:

Here is the following working example. Buffer is the custom function to create EmbedMat.

import numpy as np
def buffer(data,num_perseg,overlap):
    numberOfSegments = int(math.ceil((len(data)-overlap)/(num_perseg-overlap)))
    tempBuf = [data[i:i+num_perseg] for i in range(0,len(data),(num_perseg-int(overlap)))]
    tempBuf[numberOfSegments-1] = np.pad(tempBuf[numberOfSegments-1],(0,num_perseg-tempBuf[numberOfSegments-1].shape[0]),'constant')
    tempBuf2 = np.vstack(tempBuf[0:numberOfSegments])
    return tempBuf2

EmbedMat=buffer(np.arange(0,50),5,4)

def distance_calculator(EmbedMat,E=5,nn=8):
# Calculates the Euclidian distance of the rows of embedded matrix
# subjecting it to a near-in-time neighbour exclusion constraint
# abs(jp-i)>E/2
    bins=np.arange(0,EmbedMat.shape[0])
    
    # calculate all possible combinations of rows
    bin_comb=[(x,y) for x in bins for y in bins]
    
    count=0
    dist=np.zeros((EmbedMat.shape[0]*EmbedMat.shape[0],))
    for i in bin_comb:
        if abs(i[0]-i[1])>E/2:
            # without numpy.linalg.norm library, execution time is less.
            dist[count]=math.sqrt(((EmbedMat[i[0],:]-EmbedMat[i[1],:])**2).sum())
        else:
            dist[count]=np.NAN
        count=count+1
    
    dist_buf=buffer(dist,EmbedMat.shape[0],0)
    dist_buf=dist_buf[:,0:EmbedMat.shape[0]-1]
    # distance from x/row to y/col
    # i.e [4,5] represents the distance(row4,row5) of EmbedMat
        
    sortIndex=np.argsort(dist_buf,axis=1)
    
    sort_red=sortIndex[:,0:nn]

    return sort_red

sort_red=distance_calculator(EmbedMat)

Works good with 50 samples but the execution time gets worse (up to 5-7 seconds) if the sample length is 1000 and more.

I am relatively new to Python and NOT coming from a developer background, so that you may not see good coding practices in the codes above. Sorry for the mess you see.

Any help will be appreciated.

You can [profile](https://stackoverflow.com/q/582336/11384184) a Python script to know what exactly is slow. Also, what performance do you expect for N=1000 ? and above ? I quickly measured (with differences between calls to `time.time()`), and it is your for loop that slows down your script. I'm not a NumPy expert so I can't help you on that, but focus on it (find another algorithm, optimize what you wrote, ...). — Lenormju, Jun 18 '21 at 20:00
You want to use [Balltree](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.BallTree.html) from sklearn, it is very fast. Let me know if you need a working example — Willem Hendriks, Jun 18 '21 at 20:07
Quick 2× speedup: `bin_comb=[(x,y) for x in bins for y in bins if x>y]`; since distance is symmetrical, only calculate it once for each pair, and also skip self-distance. — Jiří Baum, Jun 19 '21 at 01:51

score 0 · Accepted Answer · answered Jun 22 '21 at 05:25

Thanks a lot for your suggestions. Based on your comments, I came up with this solution, which decreased the execution time to half of the original one, as sabik suggested:

EmbedMat=buffer(np.arange(0,50),5,4)

def distance_calculator(EmbedMat,E=5,nn=8):
# Calculates the Euclidian distance of the rows of embedded matrix
# subjecting it to a near-in-time neighbour exclusion constraint
# abs(jp-i)>E/2
    bins=np.arange(0,EmbedMat.shape[0])
    
    # calculate all possible combinations of rows
    bin_comb=[(x,y) for x in bins for y in bins]
    
    count=0
    dist=np.zeros((EmbedMat.shape[0]*EmbedMat.shape[0],))
    for i in bin_comb:
        if abs(i[0]-i[1])>E/2 and i[1]>i[0]:
            # without numpy.linalg.norm library, execution time is less.
            dist[count]=math.sqrt(((EmbedMat[i[0],:]-EmbedMat[i[1],:])**2).sum())
        else:
            dist[count]=np.NAN
        count=count+1
    
    dist_buf=buffer(dist,EmbedMat.shape[0],0)
    dist_buf=dist_buf+dist_buf.T
    dist_buf=dist_buf[:,0:EmbedMat.shape[0]-1]
    # distance from x/row to y/col
    # i.e [4,5] represents the distance(row4,row5) of EmbedMat
        
    sortIndex=np.argsort(dist_buf,axis=1)
    
    sort_red=sortIndex[:,0:nn]

    return sort_red

sort_red=distance_calculator(EmbedMat)

In short, I have introduced another constraint for the for loop which handles y>x (i[1]>i[0]). Then, to have the same result, I have summed it up with its transpose. Now works good. Thank you!

Speeding up a for loop

1 Answers1