I have a DataFrame that has been put together using different source files, and in some instances there are duplicated entries in the files, yet the amounts differ.
The below table is a representation of the data I am dealing with
| index | name | recipient | amount | file | year | 
|---|---|---|---|---|---|
| 0 | company a | recipient a | 1003 | File A | 2020 | 
| 1 | company a | recipient a | 1000 | File B | 2020 | 
| 2 | company b | recipient b | 500 | File A | 2018 | 
| 3 | company a | recipient a | 760 | File A | 2017 | 
In the above table, rows at index 0 & 1 are duplicates, and I want to drop the row at index 1
I haven't been able to get very far with writing any code, though I have an idea sketched out. It may not be very efficient, as the idea centers on looping through the rows.
At each row, if the following conditions are met:
- Row 0 (recipient) = Row 1 (recipient)
- Row 0 (year) = Row 1 (year)
- Row 0 (file) != Row 1 (file)
- (Row 0 (Amount) - Row 1 (Amount)) / Row 0 (amount) < 0.05
Then drop Row 1
The data will be sorted so that the check will always be done for the row below.
What is an effective way to apply the above logic?
**Update
I've tried to replicate the answer from Pandas compare next row based on conditions
recipient = df['recipient']
amount = df['amount']
file = df['file']
year = df['year']
for i in range(len(recipient)-1):
    print(i)
    if(recipient[i] == recipient[i+1] and year[i] == year[i+1] and file[i] != file[i+1] and ((amount[i] - amount[i+1])/amount[i]) < 0.05):
        del recipient[i]
        del amount[i]
        del file[i]
        del year[i]
        
new_df = pd.DataFrame({'recipient': recipient, 'amount': amount, 'file': file, 'year': year})
Except I am getting a KeyError: 10 error message
