I have a large database table (~1 million records) from which I need to purge duplicate records. The table structure is as follows:
|----|-------------|-----|-----|--------------------|
| id | relation_id | foo | bar | timestamp          |
|----|-------------|-----|-----|--------------------|
| 1  | 1           |14.20|0.22 |2019-10-21 14:00:01 |
| 2  | 1           |14.20|0.22 |2019-10-21 14:00:01 |
| 3  | 1           |14.20|0.22 |2019-10-21 14:00:01 |
| 4  | 2           |10.36|0.75 |2019-10-21 14:00:01 |
| 5  | 2           |10.36|0.75 |2019-10-21 14:00:01 |
| 6  | 2           |10.36|0.75 |2019-10-21 14:00:01 |
|----|-------------|-----|-----|--------------------|
As the example above shows, there are a lot of records with the exact same combination of values for relation_id, foo, bar and timestamp. I need to create a script that will identify the unique values and then delete the duplicate records, so I would end up with something like:
|----|-------------|-----|-----|--------------------|
| id | relation_id | foo | bar | timestamp          |
|----|-------------|-----|-----|--------------------|
| 1  | 1           |14.20|0.22 |2019-10-21 14:00:01 |
| 4  | 2           |10.36|0.75 |2019-10-21 14:00:01 |
|----|-------------|-----|-----|--------------------|
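In other words, the row with the lowest id in each group sharing relation_id, foo, bar and timestamp should survive. My understanding is that those survivors could be identified with a grouped query along these lines (untested):
// Assumption: the lowest id in each duplicate group is the row to keep
$keepIds = Model::selectRaw('MIN(id) AS id')
    ->groupBy('relation_id', 'foo', 'bar', 'timestamp')
    ->pluck('id');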
I have tested looping through the relation_id values (there are only 20 unique ones) and running something like this for each to build a collection of the unique records:
$unique = collect([]);

Model::where('relation_id', $relation_id)->chunk(100, function ($items) use (&$unique) {
    // chunk() itself only returns a boolean, so the results are gathered here;
    // $unique is captured by reference so the reassignment is visible outside
    $unique = $unique->concat($items->unique()->values());
});
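One thing I'm unsure about is whether unique() with no arguments even spots these duplicates, since every model carries its own distinct id. I assume it needs a key built from just the duplicate-defining columns, something like this (untested):
// Assumption: duplicates are defined by these four columns only, so the
// comparison key deliberately ignores id
$items->unique(function ($item) {
    return $item->relation_id.'|'.$item->foo.'|'.$item->bar.'|'.$item->timestamp;
});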
From that, I had planned to loop through all of the Model records and delete any item that was not in the $unique collection. Something like this:
Model::chunk(100, function ($items) use ($unique) {
    foreach ($items as $item) {
        // delete any row that did not make it into the unique collection
        if (! $unique->contains('id', $item->id)) {
            $item->delete();
        }
    }
});
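An alternative I considered was keeping only the ids of the unique rows and issuing a single bulk delete rather than deleting model by model, though I don't know whether binding that many ids is workable either (untested):
// Assumption: $unique holds the models that should survive
$keepIds = $unique->pluck('id');

Model::whereNotIn('id', $keepIds)->delete();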
My problem is that the table is so large I cannot easily test whether this logic works: running the first part of the script above (populating $unique) for a single $relation_id sat in tinker for 30 minutes without producing a result.
I'm relatively confident this isn't the best approach to deleting duplicate records, as it requires many separate queries which I assume could be optimised; that matters when dealing with a table this large.
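The closest thing I've found to a single set-based statement is a self-join DELETE, but I have no idea whether that's safe or performant at this scale (untested; assumes MySQL and that the table is named models):
use Illuminate\Support\Facades\DB;

// Assumption: MySQL syntax and a `models` table; removes every row for which
// a lower id exists with the same relation_id, foo, bar and timestamp
DB::statement('
    DELETE t1 FROM models t1
    INNER JOIN models t2
        ON  t1.relation_id = t2.relation_id
        AND t1.foo = t2.foo
        AND t1.bar = t2.bar
        AND t1.timestamp = t2.timestamp
        AND t1.id > t2.id
');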
So what is the most efficient way to query a database table to check for unique records (based on multiple columns) and delete the duplicate records?