Scenario: I have a dataframe with multiple columns. Four of them (A_Id, FF_Id, Date and Output) are relevant.
Objective: I am trying to extract these columns and create a dataframe with the format Index=A_Id, Columns=F_Id, and the values come from the column Output. Whenever there are two or more values of a given combination of A and F, use the column Date to select the latest available.
Data Sample: Each given A_Id entry has a value in combination with each F_Id entry. a given combination can have 2 or more results with different dates . There are 10 unique F_Id, 7000 A_Id. Here is a subset
FId     Date        Output  AId
628     2020/12/31  Yes     1
629     2020/12/31  No      1
080     2020/12/31  No      1
081     2020/12/31  No      1
628     2020/12/31  Yes     2
629     2020/12/31  No      2
080     2020/12/31  No      2
081     2020/12/31  No      2
628     2021/12/31  Yes     3
629     2021/12/31  Yes     3
080     2021/12/31  No      3
081     2021/12/31  No      3
628     2020/12/31  Yes     14
629     2020/12/31  No      14
080     2020/12/31  No      14
081     2020/12/31  No      14
628     2021/12/31  Yes     14
629     2021/12/31  No      14
080     2021/12/31  No      14
081     2021/12/31  No      14
628     2020/12/31  Yes     15
629     2020/12/31  No      15
080     2020/12/31  No      15
081     2020/12/31  Yes     15
Desired ouptut: A Matrix using A_ID as the index and F_Id as columns. Either in one dataframe with 2d, where the first dimension has Output and the Second has the Dates:
    4628    4629    5080    5081    4628        4629        5080        5081
1   Yes     No      No      No      2020/12/31  2020/12/31  2020/12/31  2020/12/31
2   Yes     No      No      No      2020/12/31  2020/12/31  2020/12/31  2020/12/31
3   Yes     Yes     No      No      2021/12/31  2021/12/31  2021/12/31  2021/12/31
14  Yes     No      No      No      2021/12/31  2021/12/31  2021/12/31  2021/12/31
15  Yes     No      No      Yes     2020/12/31  2020/12/31  2020/12/31  2020/12/31
or two different dataframes, of the same size, one with Output and the other with Dates:
    4628    4629    5080    5081
1   Yes     No      No      No
2   Yes     No      No      No
3   Yes     Yes     No      No
14  Yes     No      No      No
15  Yes     No      No      Yes
    4628        4629        5080        5081
1   2020/12/31  2020/12/31  2020/12/31  2020/12/31
2   2020/12/31  2020/12/31  2020/12/31  2020/12/31
3   2021/12/31  2021/12/31  2021/12/31  2021/12/31
14  2021/12/31  2021/12/31  2021/12/31  2021/12/31
15  2020/12/31  2020/12/31  2020/12/31  2020/12/31
Obs: There are cases like A_Id 14, where the Outputs are available for two dates. In this case, I am trying to select the Output value for the latest available date, and that date for reference.
What I tried 1: First I tried passing the Output and Date columns into the pivot_table function.:
# source df is read from the raw data on an excel spreadsheet
list_aid =[source['A_Id']]
list_fid = [source['F_Id']]
list_answer = [source['Output']]
testdf33 = source[['AgentId','FactorId','Answer','Date']].pivot_table(values=['Answer','Date'],
                                                               index='AgentId', 
                                                               columns='FactorId',
                                                               aggfunc={'Answer': lambda x: ' '.join(x), 'Date': lambda x: ' '.join(x)})
Issue 1: When running this, there is the error, because I am passing a Datestime whereas the function was expecting a string.
TypeError: sequence item 0: expected str instance, Timestamp found
What I tried 2: I forcibly changed the data in the Datecolumn into str format:
testdf11 = source[['AgentId','FactorId','Answer','Date']]
testdf11['Date'] = testdf11['Date'].astype(str)
testdf33 = testdf11.pivot_table(values=['Answer','Date'],
                                index='AgentId',columns='FactorId',
                                aggfunc={'Answer': lambda x: ' '.join(x),
                                         'Date': lambda x: ' '.join(x)})
Obs2: This gives me all the data aggregated, but I only need the latest output/dates, not an aggregation of whatever is available.
Question: How can I output only the Output/Date combination for the latest available date?
 
     
    