I have some input CSV from which I am building a second dataframe based on certain conditions. Part of the CSV looks like this:
Input
│       Date       │ Instrument │  Value  │
├──────────────────┼────────────┼─────────┤
│ 2018-09-01 00:05 │ Foo        │ -0.3552 │
│ 2018-09-01 00:05 │ Bar        │ -0.4423 │
│ 2018-09-01 00:05 │ Foo        │ -0.6511 │
│ 2018-09-01 00:05 │ Bar        │ -0.8773 │
│ 2018-09-01 00:15 │ Foo        │  0.0045 │
│ 2018-09-01 00:15 │ Bar        │  0.0904 │
│ 2018-09-01 00:20 │ Foo        │  0.6691 │
│ 2018-09-01 00:20 │ Bar        │  0.7401 │
In this step, I am trying to create a dataframe indexed by time that should look like this:
Output
│                  │ Column_Foo │ Column_Bar │
├──────────────────┼────────────┼────────────┤
│ 2018-09-01 00:05 │    -1.0063 │    -1.3196 │
│ 2018-09-01 00:15 │     0.0045 │     0.0904 │
│ 2018-09-01 00:20 │     0.6691 │     0.7401 │
Essentially, I want to sum together any values that occur on the same datetime in the CSV input.
I can use newDf.loc[index, column] = value to add value to a new cell in location index, column, but I see no way of appending value to the number in the cell if it already contains a number.
newDf.loc[index, column] += value fails with a KeyError if the index or column doesn't exist yet (even if you put it inside a pd.isnull()!). And if they both do exist but the cell is empty, newDf.loc[index, column] returns NaN which together with += just yields NaN.
My attempt
My best attempt so far, which works but I really hope there's a cleaner way to do this.
def awfulWayOfDoingThings(row, out):
    if row.Date in out.index and "Column_" + row.Instrument in out.columns:
        if np.isnan(out.at[row.Date, "Column_" + row.Instrument]):
            out.at[row.Date, "Column_" + row.Instrument] = 0.0
        out.at[row.Date, "Column_" + row.Instrument] += row.Value
    else:
        out.at[row.Date, "Column_" + row.Instrument] = row.Value
newDf = pd.DataFrame()
inputDf.apply(awfulWayOfDoingThings, axis=1, args=[newDf])
I essentially have to check:
- Does the index exist already?
- Does the column exist already?
- Is the cell NaN?
I really think I must be getting something wrong here and that there must be a significantly simpler way to do this.
What I would like to do
def lovelyWayOfDoingThings(row, out):
    out.at[row.Date, "Column_" + row.Instrument] += row.Value
newDf = pd.DataFrame()
inputDf.apply(lovelyWayOfDoingThings, axis=1, args=[newDf])
