df is a minimal code example dataframe which has partially redundant rows.
                 Start                  End   MeterS   MeterE  Value    Type  Amount Unit
0  2021-06-15T09:59:50  2021-06-15T13:01:04  2188950  2196310     24  type_A    1.00   pc
1  2021-06-15T09:59:50  2021-06-15T13:01:04  2188950  2196310     24  type_B    3.02    h
2  2021-06-15T09:59:50  2021-06-15T13:01:04  2188950  2196310     24  type_C    7.36    k
3  2021-06-14T09:57:35  2021-06-14T14:47:35  2181600  2188950     87  type_A    1.00   pc
4  2021-06-14T09:57:35  2021-06-14T14:47:35  2181600  2188950     87  type_B    4.83    h
5  2021-06-14T09:57:35  2021-06-14T14:47:35  2181600  2188950     87  type_C    7.35    k
I want to create a new dataframe based on df, where:
- Datetime is unique
- columns Type,AmountandUnitwill be removed, respectively changed to:Amount_Type_A,Amount_Type_BandAmount_Type_Cwith the associated data ( example:Amount_Type_Bshould have3.02in row 0, and4.83in row 1 )
Code df:
import pandas as pd
columns = ['Start', 'End', 'MeterS', 'MeterE', 'Value', 'Type', 'Amount', 'Unit']
data = {
            'Start': ['2021-06-15T09:59:50', '2021-06-15T09:59:50', '2021-06-15T09:59:50', '2021-06-14T09:57:35', '2021-06-14T09:57:35', '2021-06-14T09:57:35'],
            'End': ['2021-06-15T13:01:04', '2021-06-15T13:01:04', '2021-06-15T13:01:04', '2021-06-14T14:47:35', '2021-06-14T14:47:35', '2021-06-14T14:47:35'],
            'MeterS': [2188950, 2188950, 2188950, 2181600, 2181600, 2181600],
            'MeterE': [2196310, 2196310, 2196310, 2188950, 2188950, 2188950],
            'Value': [24, 24, 24, 87, 87, 87],
            'Type': ['type_A', 'type_B', 'type_C', 'type_A', 'type_B', 'type_C'],
            'Amount': [1, 3.02, 7.36, 1, 4.83, 7.35],
            'Unit': ['pc', 'h', 'k', 'pc', 'h', 'k']
        }
df = pd.DataFrame(data, columns=columns)
I tried this:
columns2 = ['Start', 'End', 'MeterS', 'MeterE', 'Value', 'Amount_Type_A', 'Amount_Type_B', 'Amount_Type_C']
data2 = {
            'Start': [df.Start.drop_duplicates(keep = 'first')],
            'End': [df.End.drop_duplicates(keep = 'first')],
            'MeterS': [df.MeterS.drop_duplicates(keep = 'first')],
            'MeterE': [df.MeterS.drop_duplicates(keep = 'first')],
            'Value': [df.Value.drop_duplicates(keep = 'first')],
            'Amount_Type_A': df.loc[df.Type == 'type_A'].Amount,
            'Amount_Type_B': df.loc[df.Type == 'type_B'].Amount,
            'Amount_Type_C': df.loc[df.Type == 'type_C'].Amount
        }
df2 = pd.DataFrame(data2, columns=columns2)
But i get this error, and I don't understand why:
Traceback (most recent call last):
  File "/Users/stan/Documents/Python/Testing/abstraction_df_from_new_df_grouping.py", line 39, in <module>
    df2 = pd.DataFrame(data2, columns=columns2)
  File "/Users/stan/opt/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py", line 468, in __init__
    mgr = init_dict(data, index, columns, dtype=dtype)
  File "/Users/stan/opt/anaconda3/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 254, in init_dict
    index = extract_index(arrays[~missing])
  File "/Users/stan/opt/anaconda3/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 411, in extract_index
    raise ValueError(msg)
ValueError: array length 1 does not match index length 6
How can I fix this?
