My goal is to sort a DataFrame by one column and return a JSON object as efficiently as possible.
To reproduce, define the following DataFrame:
import pandas as pd
import numpy as np
test = pd.DataFrame(data={
    'a': np.random.randint(0, 100, size=10000),
    'b': np.arange(10000) + np.random.randint(0, 100, size=10000),
})
       a      b
0     74     89
1     55     52
2     53     39
3     26     21
4     69     34
I need to sort by column a and then encode the result as a JSON object. My basic approach is:
test.sort_values('a', ascending=True, inplace=True)  # n log n
data = []  # 1 (start empty; [{}] would leave a stray empty dict at the front)
for d in test.itertuples():  # n times
    to_append = {'id': d.Index, 'data': {'a': d.a, 'b': d.b}}  # 3
    data.append(to_append)  # 1
So is the cost n log n + 4n? Is there a more efficient way to do it?
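For comparison, here is a minimal sketch of one alternative I could try, not a benchmarked answer. It assumes the same nested {'id', 'data'} layout as above, pulls each column out once as a plain Python list, and builds the records with zip instead of itertuples. The names rng, sorted_df and payload, and the fixed seed, are only for illustration. The sort is still n log n and the build is still linear, so any gain would be in the constant factor only.

```python
import json

import numpy as np
import pandas as pd

# Illustrative data with the same shape as the question (seed is an assumption)
rng = np.random.default_rng(0)
n = 10_000
test = pd.DataFrame({
    'a': rng.integers(0, 100, size=n),
    'b': np.arange(n) + rng.integers(0, 100, size=n),
})

# Sort once: O(n log n)
sorted_df = test.sort_values('a')

# .tolist() converts NumPy integers to plain Python ints,
# which json.dumps can serialize directly
ids = sorted_df.index.tolist()
a_vals = sorted_df['a'].tolist()
b_vals = sorted_df['b'].tolist()

# Single linear pass over the three lists: O(n)
data = [
    {'id': i, 'data': {'a': a, 'b': b}}
    for i, a, b in zip(ids, a_vals, b_vals)
]

payload = json.dumps(data)
```

Timing both versions on the real data (for example with timeit) would show whether the difference matters in practice.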
 
    