I have a dataframe that looks like this
from pandas import Timestamp
df = pd.DataFrame({'inventory_created_date': [Timestamp('2016-08-17 00:00:00'),
                                             Timestamp('2016-08-17 00:00:00'),
                                             Timestamp('2016-08-17 00:00:00'),
                                             Timestamp('2016-08-17 00:00:00'),
                                             Timestamp('2016-08-17 00:00:00'),
                                             Timestamp('2016-08-17 00:00:00'),
                                             Timestamp('2016-08-17 00:00:00'),
                                             Timestamp('2016-08-17 00:00:00'),
                                             Timestamp('2016-08-17 00:00:00'),
                                             Timestamp('2016-08-17 00:00:00')],
                  'rma_processed_date': [Timestamp('2017-09-25 00:00:00'),
                                         Timestamp('2018-01-08 00:00:00'),
                                         Timestamp('2018-04-21 00:00:00'),
                                         Timestamp('2018-08-10 00:00:00'),
                                         Timestamp('2018-10-17 00:00:00'),
                                         Timestamp('2018-11-08 00:00:00'),
                                         Timestamp('2019-07-18 00:00:00'),
                                         Timestamp('2020-01-30 00:00:00'),
                                         Timestamp('2020-04-20 00:00:00'),
                                         Timestamp('2020-06-09 00:00:00')], 
                  'uniqueid':['9907937959',
                             '9907937959',
                             '9907937959',
                             '9907937959',
                             '9907937959',
                             '9907937959',
                             '9907937959',
                             '9907937959',
                             '9907937959',
                             '9907937959'],
                  'rma_created_date':[Timestamp('2017-07-31 00:00:00'),
                                     Timestamp('2017-12-12 00:00:00'),
                                     Timestamp('2018-04-03 00:00:00'),
                                     Timestamp('2018-07-23 00:00:00'),
                                     Timestamp('2018-09-28 00:00:00'),
                                     Timestamp('2018-10-24 00:00:00'),
                                     Timestamp('2019-06-21 00:00:00'),
                                     Timestamp('2019-12-03 00:00:00'),
                                     Timestamp('2020-04-03 00:00:00'),
                                     Timestamp('2020-05-18 00:00:00')],
                  'time_in_weeks':[50, 69, 85, 101, 110, 114, 148, 172, 189, 196],
                  'failure_status':[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]})
I need to adjust the time_in_weeks numbers for every row after the first. What I need to do is for each row after the first I need to take the rma_created_date and the date rma_processed_date above that row and find the number of weeks between them.
For example, in the second row we have rma_created_date of 2017-12-12 and we have 'rma_processed_date' of 2017-09-25 in the first row. Thus the number of weeks in between these two dates is 11. There fore the 69 in the second row should become an 11.
Lets for another example. On the third row we have rma_created_date of 2018-04-03 and an rma_processed_date in the second row of 2018-01-08. Thus the number of weeks in between these two dates is 12. Therefore the 85 in the third row should become an 12.
This is what I have done so far
def clean_df(df):
    '''
    This function will fix the time_in_weeks column to calculate the correct number of weeks
    when there is multiple failured for an item.
    '''
    
    # Sort by rma_created_date
    df = df.sort_values(by=['rma_created_date'])
    
    # Convert date columns into datetime
    df['inventory_created_date'] = pd.to_datetime(df['inventory_created_date'], errors='coerce')
    df['rma_processed_date'] = pd.to_datetime(df['rma_processed_date'], errors='coerce')
    df['rma_created_date'] = pd.to_datetime(df['rma_created_date'], errors='coerce')
    
    # If we have rma_processed_dates that are of 1/1/1900 then just drop that row
    df = df[~(df['rma_processed_date'] == '1900-01-01')]
    
    # Correct the time_in_weeks column
    df['time_in_weeks']=np.where(df.uniqueid.duplicated(keep='first'),df.rma_processed_date.dt.isocalendar().week.sub(df.rma_processed_date.dt.isocalendar().week.shift(1)),df.time_in_weeks)
    return df
df = clean_df(df)
When I apply this function to the example, this is what I get
df = pd.DataFrame({'inventory_created_date': [Timestamp('2016-08-17 00:00:00'),
                                             Timestamp('2016-08-17 00:00:00'),
                                             Timestamp('2016-08-17 00:00:00'),
                                             Timestamp('2016-08-17 00:00:00'),
                                             Timestamp('2016-08-17 00:00:00'),
                                             Timestamp('2016-08-17 00:00:00'),
                                             Timestamp('2016-08-17 00:00:00'),
                                             Timestamp('2016-08-17 00:00:00'),
                                             Timestamp('2016-08-17 00:00:00'),
                                             Timestamp('2016-08-17 00:00:00')],
                  'rma_processed_date': [Timestamp('2017-09-25 00:00:00'),
                                         Timestamp('2018-01-08 00:00:00'),
                                         Timestamp('2018-04-21 00:00:00'),
                                         Timestamp('2018-08-10 00:00:00'),
                                         Timestamp('2018-10-17 00:00:00'),
                                         Timestamp('2018-11-08 00:00:00'),
                                         Timestamp('2019-07-18 00:00:00'),
                                         Timestamp('2020-01-30 00:00:00'),
                                         Timestamp('2020-04-20 00:00:00'),
                                         Timestamp('2020-06-09 00:00:00')], 
                  'uniqueid':['9907937959',
                             '9907937959',
                             '9907937959',
                             '9907937959',
                             '9907937959',
                             '9907937959',
                             '9907937959',
                             '9907937959',
                             '9907937959',
                             '9907937959'],
                  'rma_created_date':[Timestamp('2017-07-31 00:00:00'),
                                     Timestamp('2017-12-12 00:00:00'),
                                     Timestamp('2018-04-03 00:00:00'),
                                     Timestamp('2018-07-23 00:00:00'),
                                     Timestamp('2018-09-28 00:00:00'),
                                     Timestamp('2018-10-24 00:00:00'),
                                     Timestamp('2019-06-21 00:00:00'),
                                     Timestamp('2019-12-03 00:00:00'),
                                     Timestamp('2020-04-03 00:00:00'),
                                     Timestamp('2020-05-18 00:00:00')],
                  'time_in_weeks':[50, 4294967259, 14, 16, 10, 3, 4294967280, 4294967272, 12, 7],
                  'failure_status':[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]})
Obviously the calculation is incorrect, which leads me to believe there must be something wrong with this
df['time_in_weeks']=np.where(df.uniqueid.duplicated(keep='first'),df.rma_processed_date.dt.isocalendar().week.sub(df.rma_processed_date.dt.isocalendar().week.shift(1)),df.time_in_weeks)
If anyone has any suggestions I would greatly appreciate it.
The time_in_weeks column is expected to be [50, 11, 12, 13, 7, 1, 32, 20, 9, 4]
 
    