I want to compare two excel files that have three columns. One file is from previous day and other file is from today (both have identical column names). How do I compare both to show what has been added in and removed from previous day? Note: I just want to focus on the first column (Stack Name) E.g. if stack name xyz is in 08102022.csv and not in 08112022.csv then I want a table that outputs that and shows me it is a deleted stack. Vice versa if it wasn't there in yesterday's csv and is there today then I want it to output and show me it is a new stack.
newstack_additions = pd.concat([dfprevious,dftoday]).drop_duplicates(subset = ['Stack Name'], keep=False)
print(newstack_additions)
newstack_additions["Stack Change Type"] = "New Stack"
newstack_additions['Last Modified'] = pd.to_datetime(newstack_additions['Last Modified'], format= '%m/%d/%Y-%H:%M:%S')
The above works and shows me only the row that has been added in but logically this is only just returning what doesn't have a duplicate after both dataframes are concatenated. It isn't considering anything as 'new' or 'deleted'.
This is 08102022.csv
                                            Stack Name  ...        Last Modified
0       prod/account/cloudformation/AWSAccountBase.yaml  ...  03/15/2022-02:16:48
1     prod/account/cloudformation/AWSAccountBaseAddi...  ...  03/26/2022-02:57:56
2     prod/account/cloudformation/AWSAccountCloudCus...  ...  03/04/2022-02:14:01
3     prod/account/cloudformation/AWSAccountCloudCus...  ...  09/03/2020-02:11:29
4       prod/account/cloudformation/AWSAccountInfo.yaml  ...  09/03/2020-02:11:29
                                                ...  ...                  ...
3139  prod/v003/util/unix-engineering/SSMAutomationR...  ...  05/16/2020-00:51:32
3140  prod/v003/util/unix-engineering/SSMPetsBSCReme...  ...  05/16/2020-00:51:32
3141  prod/v003/util/unix-engineering/SSMSudoStateMa...  ...  05/16/2020-00:51:32
3142  prod/v003/util/unix-engineering/SudoLambdaDepl...  ...  05/16/2020-00:51:32
3143  prod/v003/util/unix-engineering/linux_bsc_reme...  ...  05/16/2020-00:51:32
[3144 rows x 3 columns]
This is 08112022.csv
           Stack Name  ...        Last Modified
0       prod/account/cloudformation/AWSAccountBase.yaml  ...  03/15/2022-02:16:48
1     prod/account/cloudformation/AWSAccountBaseAddi...  ...  03/26/2022-02:57:56
2     prod/account/cloudformation/AWSAccountCloudCus...  ...  03/04/2022-02:14:01
3     prod/account/cloudformation/AWSAccountCloudCus...  ...  09/03/2020-02:11:29
4       prod/account/cloudformation/AWSAccountInfo.yaml  ...  09/03/2020-02:11:29
                                                ...  ...                  ...
3140  prod/v003/util/unix-engineering/SSMAutomationR...  ...  05/16/2020-00:51:32
3141  prod/v003/util/unix-engineering/SSMPetsBSCReme...  ...  05/16/2020-00:51:32
3142  prod/v003/util/unix-engineering/SSMSudoStateMa...  ...  05/16/2020-00:51:32
3143  prod/v003/util/unix-engineering/SudoLambdaDepl...  ...  05/16/2020-00:51:32
3144  prod/v003/util/unix-engineering/linux_bsc_reme...  ...  05/16/2020-00:51:32
[3145 rows x 3 columns]
and I want (shows that below was not in 08102022.csv and has been added in 08112022.csv):
                                             Stack Name  ...  Stack Change Type
1700  prod/landing-zone/splunk/SplunkDataIngestion.yaml  ...          New Stack
[1 rows x 4 columns]
Similarly, I would like to show what was in 08102022.csv and is no longer in 08112022.csv.
 
     
    