I have Avro data with the following keys: 'id', 'label', and 'features'. id and label are strings, while features is a buffer of floats.
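from functools import partial
import numpy as np
import dask.bag as db
avros = db.read_avro('data.avro')
df = avros.to_dataframe()
# Decode each raw buffer into a NumPy array of float64 values
convert = partial(np.frombuffer, dtype='float64')
X = df.assign(features=lambda x: x.features.apply(convert, meta='float64'))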
I eventually end up with this MCVE:
  label id         features
0  good  a  [1.0, 0.0, 0.0]
1   bad  b  [1.0, 0.0, 0.0]
2  good  c  [0.0, 0.0, 0.0]
3   bad  d  [1.0, 0.0, 1.0]
4  good  e  [0.0, 0.0, 0.0]
My desired output would be:
  label id   f1   f2   f3
0  good  a  1.0  0.0  0.0
1   bad  b  1.0  0.0  0.0
2  good  c  0.0  0.0  0.0
3   bad  d  1.0  0.0  1.0
4  good  e  0.0  0.0  0.0
I tried some pandas-style approaches. For example, df[['f1','f2','f3']] = df.features.apply(pd.Series) does not work in Dask the way it does in pandas.
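For reference, here is a minimal pandas-only sketch of what that idiom does on the sample data above. The frame name pdf and the literal values are just a reconstruction of the MCVE, and the column names f1..f3 are taken from the desired output.

```python
import pandas as pd

# Sample frame mirroring the MCVE (plain lists stand in for the decoded arrays)
pdf = pd.DataFrame({
    "label": ["good", "bad", "good", "bad", "good"],
    "id": list("abcde"),
    "features": [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0],
                 [1.0, 0.0, 1.0], [0.0, 0.0, 0.0]],
})

# Expand each list into one column per element, then drop the original column
pdf[["f1", "f2", "f3"]] = pdf["features"].apply(pd.Series)
wide = pdf.drop(columns="features")
print(wide)
```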
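I can traverse with a loop like
for i in range(len(features)):
    # i=i binds the current index; a plain closure would see only the last i once Dask evaluates lazily
    df[f'f{i + 1}'] = df.features.map(lambda x, i=i: x[i])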
but in the real use case I have thousands of features, and this traverses the dataset thousands of times.
What would be the best way to achieve the desired outcome?
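One direction that might avoid the repeated passes, sketched here under assumptions rather than as a confirmed answer: expand the features column once per partition with a plain pandas function. The helper name expand_features and its n_features argument are hypothetical, and the sketch assumes every row holds an array of the same known length.

```python
import numpy as np
import pandas as pd

def expand_features(part: pd.DataFrame, n_features: int) -> pd.DataFrame:
    """Hypothetical helper: replace the 'features' column with f1..fN columns."""
    names = [f"f{i + 1}" for i in range(n_features)]
    # Stack the per-row arrays into one (n_rows, n_features) matrix in a single pass;
    # the reshape also keeps an empty partition well-formed
    values = np.array(part["features"].tolist(), dtype="float64")
    values = values.reshape(len(part), n_features)
    wide = pd.DataFrame(values, index=part.index, columns=names)
    return pd.concat([part.drop(columns="features"), wide], axis=1)

# Small pandas frame standing in for a single partition
part = pd.DataFrame({
    "label": ["good", "bad"],
    "id": ["a", "d"],
    "features": [np.array([1.0, 0.0, 0.0]), np.array([1.0, 0.0, 1.0])],
})
result = expand_features(part, n_features=3)
empty_result = expand_features(part.iloc[0:0], n_features=3)
print(result)
```

On the Dask side this could presumably be applied with df.map_partitions(expand_features, n_features, meta=...), where meta describes the output columns, so that each partition is visited once instead of once per feature. I have not verified how this behaves at scale.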