I'm trying to set a DataFrame column like this:
df['Tokens'] = tokens
where tokens is a 2-D np.array.
I expected a column in which each element is a 1-D np.array, but found that each element holds only the first element of the corresponding 1-D array. Is there a way to store arrays in a DataFrame's elements?
1 Answer
Is that what you want?
In [26]: df = pd.DataFrame(np.random.rand(5,2), columns=list('ab'))
In [27]: df
Out[27]:
          a         b
0  0.513723  0.886019
1  0.197956  0.172094
2  0.131495  0.476552
3  0.678821  0.106523
4  0.440118  0.802589
In [28]: arr = df.values
In [29]: arr
Out[29]:
array([[ 0.51372311,  0.88601887],
       [ 0.19795635,  0.17209383],
       [ 0.13149478,  0.47655197],
       [ 0.67882124,  0.10652332],
       [ 0.44011802,  0.80258924]])
In [30]: df['c'] = arr.tolist()
In [31]: df
Out[31]:
          a         b                                           c
0  0.513723  0.886019    [0.5137231110962795, 0.8860188692834928]
1  0.197956  0.172094  [0.19795634688449892, 0.17209383434042336]
2  0.131495  0.476552  [0.13149477867656167, 0.47655196508193576]
3  0.678821  0.106523   [0.6788212365523125, 0.10652331756477551]
4  0.440118  0.802589   [0.44011802077658635, 0.8025892383754725]
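If each element needs to be a 1-D NumPy array rather than a Python list, one option is to assign a list of row arrays instead of a nested list. This is a minimal sketch, assuming a reasonably recent pandas; the small `arr` below is made-up sample data, not the asker's `tokens`:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the 2-D `tokens` array.
arr = np.arange(10, dtype=float).reshape(5, 2)
df = pd.DataFrame(arr, columns=list("ab"))

# list(arr) is a list of five 1-D arrays (one per row), so the new
# column gets dtype=object and each cell holds an np.ndarray.
df["c"] = list(arr)

first = df["c"].iloc[0]
print(type(first), first.shape)

# The 2-D array can be rebuilt from the column when needed.
restored = np.stack(df["c"].tolist())
```

Either way the column has `object` dtype, so vectorized numeric operations on it are not available without converting back to a 2-D array first.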
Timing for a 5M-row DataFrame:
In [36]: big = pd.concat([df] * 10**6, ignore_index=True)
In [38]: big.shape
Out[38]: (5000000, 2)
In [39]: arr = big.values
In [40]: %timeit arr.tolist()
1 loop, best of 3: 2.27 s per loop
In [41]: %timeit list(arr)
1 loop, best of 3: 3.62 s per loop
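Outside IPython, a similar comparison can be run with the standard `timeit` module. This is only a sketch: the array size and repeat count are arbitrary choices made to keep it quick, and the timings will differ by machine and library version. Note that the two calls do not produce the same thing: `arr.tolist()` gives nested Python lists, while `list(arr)` gives a list of 1-D arrays.

```python
import timeit

import numpy as np

# Smaller than the 5M rows above so the sketch runs quickly.
arr = np.random.rand(100_000, 2)

# Total time over 3 calls for each conversion.
t_tolist = timeit.timeit(arr.tolist, number=3)
t_list = timeit.timeit(lambda: list(arr), number=3)

print(f"arr.tolist(): {t_tolist:.3f} s, list(arr): {t_list:.3f} s")

# The element types differ between the two approaches.
row_from_tolist = arr.tolist()[0]  # a Python list of floats
row_from_list = list(arr)[0]       # a 1-D np.ndarray
```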
MaxU - stand with Ukraine
- Perhaps `df.values.tolist()` would be better: http://stackoverflow.com/a/40593934/3765319 – Kartik Nov 17 '16 at 19:00
- @Kartik, it's a good point, thank you! I've corrected my answer. – MaxU - stand with Ukraine Nov 17 '16 at 19:06
- Out of curiosity, is `df.as_matrix()[:, :2]` in a lambda function worthwhile, or would the list approach work best? – anshanno Nov 17 '16 at 19:07
- @anshanno, it's an interesting idea! Could you please add it as your own answer? – MaxU - stand with Ukraine Nov 17 '16 at 19:13