I have a pandas DataFrame df that looks like:
df =
sample col1 data_value time_stamp
A 1 15 0.5
A 1 45 0.5
A 1 32 0.5
A 2 3 1
A 2 57 1
A 2 89 1
B 1 10 0.5
B 1 20 0.5
B 1 30 0.5
B 2 12 1
B 2 24 1
B 2 36 1
For a given sample and its respective column, I am trying to condense all data values into a numpy array in a new column merged_data to look like:
sample col1 merged_data time_stamp
A 1 [15, 45, 32] 0.5
A 2 [3, 57, 89] 1
B 1 [10, 20, 30] 0.5
B 2 [12, 24, 36] 1
I've tried using df['merged_data] = df.to_numpy() operations and df['merged_data'] = np.array(df.iloc[0:2, :].to_numpy(), but they don't work. All elements in the merged_data column need to be numpy arrays or lists (can easily convert between the two).
Lastly, I need to retain the time_stamp column for each combination of sample and col. How can I include this with the groupby?
Any help or thoughts would be greatly appreciated!