I have a large data frame that I have subset to simplify my question. It looks like this:
    genome_ID     cluster
    p1.A2           1
    p1.A2           3
    p1.A2           3
    p1.A2           4
    p1.A3           2
    p1.A4           2
    p1.A5           1
    p1.A5           3
I would like to add a column `phages` to the data frame that numbers each occurrence of a genome_ID, counting up from 1 within each genome_ID, i.e.:
    genome_ID     cluster     phages
    p1.A2           1         1
    p1.A2           3         2
    p1.A2           3         3
    p1.A2           4         4
    p1.A3           2         1
    p1.A4           2         1
    p1.A5           1         1
    p1.A5           3         2
As you can see, the genome_ID p1.A2 is present four times, so its rows are numbered 1-4 in the `phages` column. p1.A5 is present twice, so its rows are numbered 1-2. If a genome_ID were present fifty times, I would like `phages` to number its rows 1-50 (the order of numbering within a genome_ID doesn't matter).
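To make the target concrete, here is a minimal base R sketch of the numbering described above, run on the example subset. The data frame name `df` is an assumption chosen for illustration, and `ave()` with `seq_along()` is only one possible way to get a per-group running count, not necessarily the best fit for the full dataset.

```r
# Recreate the example subset (the name `df` is hypothetical)
df <- data.frame(
  genome_ID = c("p1.A2", "p1.A2", "p1.A2", "p1.A2",
                "p1.A3", "p1.A4", "p1.A5", "p1.A5"),
  cluster   = c(1, 3, 3, 4, 2, 2, 1, 3),
  stringsAsFactors = FALSE
)

# ave() splits the first argument by genome_ID, applies seq_along() to each
# group, and returns the results in the original row order
df$phages <- ave(seq_along(df$genome_ID), df$genome_ID, FUN = seq_along)

df$phages
# [1] 1 2 3 4 1 1 1 2
```

If loading a package is acceptable, the same per-group counter can also be written with dplyr as `df %>% group_by(genome_ID) %>% mutate(phages = row_number())`.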
I need to do this so I can subset my dataset more easily and map it to a phylogeny (a biological tree showing evolutionary relationships).
If someone could give me insight into useful R packages and methods, that would be very helpful.