As a Modeler trying to find out what is the best way to handle deletes in SCD Type 2 tables.
As per principle an SCD Type 2 table is to track changes using ETL dates like START_DT and END_DT.
- START_DT will be the date the record is effective from.
 - END_DT will be the date it changed to another form or Null/High Date to denote current version of record.
 - At all point of time, for a Key Combination there will be a Current Version record with END_DT either Null or High Date.
 
Now if a record is deleted from Source, what is the best option from below,
- Have additional column like SRC_DELETE_IND which is set 'N' as default and 'Y' if record is deleted from source.
 
Ex: Record came on 1st Oct
PK_ID START_DT    END_DT     VALUE SRC_DELETE_IND
1     2021-10-01  Null       ABC    N
Record had an update on 2nd Oct
PK_ID START_DT    END_DT     VALUE SRC_DELETE_IND
1     2021-10-01  2021-10-02  ABC    N
1     2021-10-02  Null        XYZ    N
Record got deleted on 3rd Oct
PK_ID START_DT    END_DT     VALUE SRC_DELETE_IND
1     2021-10-01  2021-10-02  ABC    N
1     2021-10-02  Null        XYZ    Y
- Same as 1 but insert new duplicate row when Delete Came.
 
Record got deleted on 3rd Oct
PK_ID START_DT    END_DT     VALUE SRC_DELETE_IND
1     2021-10-01  2021-10-02  ABC    N
1     2021-10-02  2021-10-03  XYZ    N
1     2021-10-03  Null        XYZ    Y
- Instead of SRC_DELETE_IND expire/end date the record
 
Record got deleted on 3rd Oct
PK_ID START_DT    END_DT     VALUE 
1     2021-10-01  2021-10-02  ABC   
1     2021-10-02  2021-10-03  XYZ   
But here we now dont have a Open record left.
Complexity is added if the record reappears in the Source stating as a Incorrect delete. Lets say on 10th For Option 1, Data will look like,
PK_ID START_DT    END_DT     VALUE SRC_DELETE_IND
1     2021-10-01  2021-10-02  ABC    N
1     2021-10-02  Null        XYZ    N --Reversed
FOr Option 2
PK_ID START_DT    END_DT     VALUE SRC_DELETE_IND
1     2021-10-01  2021-10-02  ABC    N
1     2021-10-02  2021-10-03  XYZ    N
1     2021-10-03  Null        XYZ    N  -- Reversed but now row is duplicate
For Option 3
PK_ID START_DT    END_DT     VALUE 
1     2021-10-01  2021-10-02  ABC   
1     2021-10-02  2021-10-03  XYZ   
1     2021-10-10  Null        XYZ  --considered as New since no open record existed. Creates ETL gap
Which option makes more sense and is as per DWH best practices.