I'm trying to automate my data cleaning process. My dataset looks like this:
ADDRESS             PHONE    TYPE
123 Willow Street   7429947  RESIDENTIAL
123 Willow Street   7426629  RESIDENTIAL
234 Butter Road     7564123  RESIDENTIAL
It's quite large (several hundred thousand rows). I'd like to be able to do the following:
(1) Detect duplicates, so I can collapse the "nearly" duplicate rows (same ADDRESS and TYPE, different PHONE).
(2) Spread the non-duplicated values into new columns, something like PHONE 2. The catch is that I can't know beforehand that there are only 2 duplicate rows per address; there could be n.
The outcome would hopefully be something like this:
ADDRESS             PHONE    PHONE 2  TYPE
123 Willow Street   7429947  7426629  RESIDENTIAL
234 Butter Road     7564123           RESIDENTIAL
I'd love to do this with dplyr, but I'm sort of at a loss as to where to start. Any pointers?
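Edit: for reference, here is the rough direction I've been sketching, in case it helps frame an answer. It's only a sketch under my own assumptions: I'm guessing that grouping by ADDRESS and TYPE, numbering the rows within each group, and then pivoting wider on that number would do it (the `idx` helper column and the `PHONE ` prefix are names I made up).

```r
library(dplyr)
library(tidyr)

# Toy version of my data
df <- tibble::tribble(
  ~ADDRESS,            ~PHONE,    ~TYPE,
  "123 Willow Street", "7429947", "RESIDENTIAL",
  "123 Willow Street", "7426629", "RESIDENTIAL",
  "234 Butter Road",   "7564123", "RESIDENTIAL"
)

df %>%
  group_by(ADDRESS, TYPE) %>%
  mutate(idx = row_number()) %>%   # number the near-duplicates within each address
  ungroup() %>%
  pivot_wider(names_from  = idx,
              values_from = PHONE,
              names_prefix = "PHONE ")
```

If this is on the right track, I'd expect it to produce columns `PHONE 1`, `PHONE 2`, ..., `PHONE n`, with `NA` where an address has fewer phone numbers than the maximum. I'm not sure how it performs on several hundred thousand rows, though.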