Split a dataframe into new dataframes of subsets/groups, i.e. create new dataframes of data subsets/groups from another dataframe

Question

I have a pandas dataframe that looks like the following and identifies groups of data via an id column:

import numpy as np
import pandas as pd


df = pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))
df['id'] = ['W', 'W', 'W', 'Z', 'Z', 'Y', 'Y', 'Y', 'Z', 'Z']

print(df)

          A         B         C         D id
0  0.347501 -1.152416  1.441144 -0.144545  w
1  0.775828 -1.176764  0.203049 -0.305332  w
2  1.036246 -0.467927  0.088138 -0.438207  w
3 -0.737092 -0.231706  0.268403  0.464026  x
4 -1.857346 -1.420284 -0.515517 -0.231774  x
5 -0.970731  0.217890  0.193814 -0.078838  y
6 -0.318314 -0.244348  0.162103  1.204386  y
7  0.340199  1.074977  1.201068 -0.431473  y
8  0.202050  0.790434  0.643458 -0.068620  z
9 -0.882865  0.687325 -0.008771 -0.066912  z

Now I want to create new dataframes (named df_w, df_x, df_y, df_z), each holding only its own group's rows from the original dataframe, ideally collected in some iterable, e.g. a list:

df_w

          A         B         C         D id
0  0.347501 -1.152416  1.441144 -0.144545  w
1  0.775828 -1.176764  0.203049 -0.305332  w
2  1.036246 -0.467927  0.088138 -0.438207  w

df_x

          A         B         C         D id
0 -0.737092 -0.231706  0.268403  0.464026  x
1 -1.857346 -1.420284 -0.515517 -0.231774  x

df_y

          A         B         C         D id
0 -0.970731  0.217890  0.193814 -0.078838  y
1 -0.318314 -0.244348  0.162103  1.204386  y
2  0.340199  1.074977  1.201068 -0.431473  y

df_z

          A         B         C         D id
0  0.202050  0.790434  0.643458 -0.068620  z
1 -0.882865  0.687325 -0.008771 -0.066912  z

Is there any smart (vectorized pandas) way to achieve this using groupby, apply and/or applymap and a function?

I was thinking about iterating over the dataframe, but that doesn't seem very elegant.

Thanks in advance for any hints!

MaxU - stand with Ukraine · Accepted Answer · 2017-08-08T12:00:46.963

We can create a dict of DataFrames:

In [166]: dfs = {k:v for k,v in df.groupby('id')}

In [168]: dfs.keys()
Out[168]: dict_keys(['W', 'Y', 'Z'])

In [169]: dfs['W']
Out[169]:
          A         B         C         D id
0 -0.373021 -0.555218  0.022980 -0.512323  W
1 -1.599466  0.637292  0.045059 -0.334030  W
2  0.100659  0.557068  0.142226 -0.186214  W

In [170]: dfs['Y']
Out[170]:
          A         B         C         D id
5  0.540107 -0.739077  0.992408  2.010203  Y
6 -0.201376 -0.913222 -0.173284  1.837442  Y
7 -1.367659  0.915360  0.072720 -0.886071  Y

In [171]: dfs['Z']
Out[171]:
          A         B         C         D id
3 -0.329087  0.842431  0.839319 -0.597823  Z
4 -0.594375 -0.950486  1.125584  0.116599  Z
8  0.366667 -0.978279 -1.449893  0.192451  Z
9 -0.007439 -0.084612  0.010192 -0.417602  Z

UPDATE: with the index reset:

In [177]: {k:v.reset_index(drop=True) for k,v in df.groupby('id')}
Out[177]:
{'W':           A         B         C         D id
 0 -0.373021 -0.555218  0.022980 -0.512323  W
 1 -1.599466  0.637292  0.045059 -0.334030  W
 2  0.100659  0.557068  0.142226 -0.186214  W,
 'Y':           A         B         C         D id
 0  0.540107 -0.739077  0.992408  2.010203  Y
 1 -0.201376 -0.913222 -0.173284  1.837442  Y
 2 -1.367659  0.915360  0.072720 -0.886071  Y,
 'Z':           A         B         C         D id
 0 -0.329087  0.842431  0.839319 -0.597823  Z
 1 -0.594375 -0.950486  1.125584  0.116599  Z
 2  0.366667 -0.978279 -1.449893  0.192451  Z
 3 -0.007439 -0.084612  0.010192 -0.417602  Z}
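
As a self-contained sketch of the approach above (the fixed seed and the names `key`, `group` and `frames` are my own additions, not part of the original answer), the dict can also be turned into a plain list if an ordered iterable is preferred:

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: fixed seed only so the example is repeatable
df = pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))
df['id'] = ['W', 'W', 'W', 'Z', 'Z', 'Y', 'Y', 'Y', 'Z', 'Z']

# One DataFrame per group, each with a fresh 0-based index.
dfs = {key: group.reset_index(drop=True) for key, group in df.groupby('id')}

# groupby sorts the keys by default, so the list follows the order W, Y, Z.
frames = list(dfs.values())

print(list(dfs))
print(dfs['Z'])
```

Keeping the dict is usually handier than the list, since each sub-frame stays addressable by its group label.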

Resetting the index here via `v.reset_index(inplace=True)` would generate the desired result. Thanks a lot! Incredible how fast you are.. — Cord Kaldemeyer, Aug 08 '17 at 11:59
`{k:v.reset_index(drop=True) for k,v in df.groupby('id')}` for zero-starting indexes. — Zero, Aug 08 '17 at 11:59

jezrael · Answer 2 · 2017-08-08T12:02:49.117

I think the best way is to create a dict by converting the groupby object to tuples and then to a dict:

# make the index start from 0 within each group
df.index = df.groupby('id').cumcount()

dfs = dict(tuple(df.groupby('id')))
print (dfs)
{'W':           A         B         C         D id
0  1.331587  0.715279 -1.545400 -0.008384  W
1  0.621336 -0.720086  0.265512  0.108549  W
2  0.004291 -0.174600  0.433026  1.203037  W, 'Y':           A         B         C         D id
0 -1.977728 -1.743372  0.266070  2.384967  Y
1  1.123691  1.672622  0.099149  1.397996  Y
2 -0.271248  0.613204 -0.267317 -0.549309  Y, 'Z':           A         B         C         D id
0 -0.965066  1.028274  0.228630  0.445138  Z
1 -1.136602  0.135137  1.484537 -1.079805  Z
2  0.132708 -0.476142  1.308473  0.195013  Z
3  0.400210 -0.337632  1.256472 -0.731970  Z}

print (dfs['Y'])
          A         B         C         D id
0 -1.977728 -1.743372  0.266070  2.384967  Y
1  1.123691  1.672622  0.099149  1.397996  Y
2 -0.271248  0.613204 -0.267317 -0.549309  Y
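
To make the `cumcount` step more concrete, here is a minimal sketch (the names `counter` and `indexed` are mine). It shows the per-group counter that becomes the new index, and it uses `set_index` on a copy instead of assigning to `df.index`, on the assumption that you want to leave the original dataframe untouched:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))
df['id'] = ['W', 'W', 'W', 'Z', 'Z', 'Y', 'Y', 'Y', 'Z', 'Z']

# cumcount numbers the rows inside each group: 0, 1, 2, ... per id value.
counter = df.groupby('id').cumcount()
print(counter.tolist())

# set_index returns a new DataFrame, so df keeps its original 0..9 index.
indexed = df.set_index(counter)

# Each (key, sub-frame) pair from the groupby becomes one dict entry.
dfs = dict(tuple(indexed.groupby('id')))
print(dfs['Z'])
```

The rows labelled Z are not contiguous in the original frame, but the counter still runs 0 to 3 across them, which is why the sub-frame ends up with a clean index.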

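For the curious, it is also possible to create custom DataFrame names via globals(), but a dict is the better choice:

# use separate loop names so the original df is not overwritten
for key, group in df.groupby('id'):
    globals()['df_' + key] = group.reset_index(drop=True)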

print (df_Y)
          A         B         C         D id
0 -1.977728 -1.743372  0.266070  2.384967  Y
1  1.123691  1.672622  0.099149  1.397996  Y
2 -0.271248  0.613204 -0.267317 -0.549309  Y
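
If the `df_`-style names matter, one possible middle ground (a sketch only; the `named` dict and its prefixed keys are my own illustration) is to put the prefix into the dict keys rather than into the module namespace:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))
df['id'] = ['W', 'W', 'W', 'Z', 'Z', 'Y', 'Y', 'Y', 'Z', 'Z']

# Keys carry the 'df_' prefix, so lookups read like variable names
# while nothing is written into globals().
named = {f'df_{key}': group.reset_index(drop=True)
         for key, group in df.groupby('id')}

# All sub-frames remain easy to loop over, which globals() makes awkward.
for name, frame in named.items():
    print(name, len(frame))
```

This keeps every sub-frame in one container, so adding or removing an id value in the data does not leave stray variables behind.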

Wow, this was incredibly fast. I like the solution a lot! Is there any way to also reset the index "on the fly" to let the subframes start with "0"? — Cord Kaldemeyer, Aug 08 '17 at 11:57
