Imagine we have an array of unique integers. Given an integer (N) of that list, I want to be able to get its index (I) in the array as fast as possible.
My idea was to generate an object that given N returns I. I though of using either a structured array with data type (N,I) and sorted by N or simply a dictionary with keys N.
The search speeds of both methods seems independent of the size of the object which leads me to believe that they are controlled by the overheads. However, I was a bit surprised to find that searching the dictionary was almost 10x faster than searching the structured array. So my questions are:
- Why is the dictionary so much faster than my array implementation?
- Is there an alternative method that is even faster than these two?
MWE:
from __future__ import division
import numpy as np
import timeit
#Time a function
def Timeme(funct,var,NN=10,NNN=10):
for i in xrange(NN):
start =timeit.default_timer()
for t in xrange(NNN):
funct(*var)
end =timeit.default_timer()
print str(i)+': '+str((end - start)/NNN*1000)
#Function to build a dictionary
def mydict(Flist):
Mydict=dict()
for n,i in Flist:
Mydict[n]=i
return Mydict
#Functions to access the data
def myfd(Mydict,vtest):
return Mydict[vtest]
def myfs(Flist,vtest):
n=Flist['N'].searchsorted(vtest)
return Flist['I'][n] #Flist[n]['I'] is slower
#N=100000
N=100
# "Allocate empty structured array"
Flist=np.empty(N,dtype=[('N','i4'),('I','i4')])
# "Fill N with randoms and I with sequence"
Flist['N'] = np.random.randint(N*1000,size=N)
Flist['I'] = np.arange(N)
# "Create test value"
ntest=np.random.randint(N)
vtest=Flist['N'][ntest]
# "Sort array on N"
Flist.sort(order='N')
# "Make dictionary"
Mydict=dict(Flist)
# "Get values"
nrd=myfd(Mydict,vtest)
nrs=myfs(Flist,vtest)
print "Tests OK: " + str(ntest == nrd and ntest == nrs)
print "\nSearch with Dictionary:"
Timeme(myfd,[Mydict,vtest],NN=5,NNN=100)
print "\nSearch directly in Array:"
Timeme(myfs,[Flist,vtest],NN=5,NNN=100)
Result:
Tests OK: True
Search with Dictionary:
0: 0.000404204885682
1: 0.000409016848607
2: 0.000418640774457
3: 0.000404204885682
4: 0.000394580959833
Search directly in Array:
0: 0.00455211692685
1: 0.00465798011119
2: 0.00458580066732
3: 0.00464354422242
4: 0.00476384329554