I need some help with R data frames. Here is the data frame:
group   col1    col2    name
1       dog     40      canidae
1       dog     40      canidae
1       dog     40      canidae
1       dog     40      canidae
1       dog     40  
1       dog     40      canidae
1       dog     40      canidae
2       frog    85      dendrobatidae
2       frog    89      leptodactylidae
2       frog    89      leptodactylidae
2       frog    82      leptodactylidae
2       frog    89 
2       frog    81 
2       frog    89      dendrobatidae
3       horse   87      equidae1
3       donkey  76      equidae2
3       zebra   67      equidae3
4       bird    54      psittacidae
4       bird    56  
4       bird    34  
5       bear    67    
5       bear    54
What I would like is to add a column "consensus_name" and get:
group col1   col2 name              consensus_name
1     dog    40   canidae           canidae
1     dog    40   canidae           canidae
1     dog    40                     canidae
1     dog    40   canidae           canidae
1     dog    40   canidae           canidae
2     frog   85   dendrobatidae     leptodactylidae
2     frog   89   leptodactylidae   leptodactylidae
2     frog   89   leptodactylidae   leptodactylidae
2     frog   82   leptodactylidae   leptodactylidae
2     frog   89                     leptodactylidae
2     frog   81                     leptodactylidae
2     frog   89   dendrobatidae     leptodactylidae
3     horse  87   equidae1          equidae3
3     donkey 76   equidae2          equidae3
3     zebra  67   equidae3          equidae3
4     bird   54   psittacidae       psittacidae
4     bird   56                     psittacidae
4     bird   34                     psittacidae
5     bear   67                     NA
5     bear   54                     NA
To build this new column, for each group I take the name that is most representative of the group:
- For group 1, there are 4 rows with the name 'canidae' and one with nothing, so for each row I write 'canidae' in the column consensus_name.
- For group 2, there are 2 rows with the name 'dendrobatidae', 2 with nothing and 3 rows with the name 'leptodactylidae', so for each row I write 'leptodactylidae' in the column consensus_name.
- For group 3, there are 3 rows with different names. Because there is no consensus, I take the name that has the lowest col2 value, so I write 'equidae3' in the column consensus_name.
- For group 4, only one row has any information, so it is the consensus name of group 4, and I write 'psittacidae' in the column consensus_name.
- For group 5, there is no information at all, so I just write NA in the consensus_name column.
Does anyone have an idea how to do this in R? Thanks for your help :)
Here is the df (output of dput):
structure(list(group = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L), col1 = structure(c(2L, 
2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 3L, 6L, 
1L, 1L, 1L), .Label = c("bird", "dog", "donkey", "frog", "horse", 
"zebra"), class = "factor"), col2 = c(40L, 40L, 40L, 40L, 40L, 
40L, 40L, 85L, 89L, 89L, 82L, 89L, 81L, 89L, 87L, 76L, 67L, 54L, 
56L, 34L), name = structure(c(2L, 2L, 2L, 2L, 1L, 2L, 2L, 3L, 
7L, 7L, 7L, 1L, 1L, 3L, 4L, 5L, 6L, 8L, 1L, 1L), .Label = c("", 
"canidae", "dendrobatidae", "equidae1", "equidae2", "equidae3", 
"leptodactylidae", "psittacidae"), class = "factor")), class = "data.frame", row.names = c(NA, 
-20L))
The real one has around 50,000 rows.
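
For reference, the rules listed above could be sketched in base R roughly as follows. This is only one possible reading of them, not a definitive solution: `consensus_one` is a hypothetical helper name, the small `df` is a shortened copy of the table above, and breaking ties between equally frequent names by the lowest col2 value generalises the group 3 rule, which is an assumption.

```r
# Hypothetical helper: returns a single consensus name for one group.
consensus_one <- function(name, col2) {
  name <- as.character(name)               # also handles factor columns
  keep <- !is.na(name) & nzchar(name)      # ignore empty or missing names
  if (!any(keep)) return(NA_character_)    # no name at all in the group -> NA
  name <- name[keep]
  col2 <- col2[keep]
  counts <- table(name)
  top <- names(counts)[counts == max(counts)]  # most frequent name(s)
  if (length(top) == 1) return(top)
  # Assumption: on a tie, take the tied name from the row with the lowest col2
  tied <- name %in% top
  name[tied][which.min(col2[tied])]
}

# Shortened copy of the example data (fewer rows per group)
df <- data.frame(
  group = c(1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5),
  col2  = c(40, 40, 40, 85, 89, 89, 82, 81, 89, 87, 76, 67, 54, 56, 34, 67, 54),
  name  = c("canidae", "canidae", "",
            "dendrobatidae", "leptodactylidae", "leptodactylidae",
            "leptodactylidae", "", "dendrobatidae",
            "equidae1", "equidae2", "equidae3",
            "psittacidae", "", "",
            "", ""),
  stringsAsFactors = FALSE
)

# Compute the consensus once per group and copy it to every row of that group
consensus <- rep(NA_character_, nrow(df))
for (idx in split(seq_len(nrow(df)), df$group)) {
  consensus[idx] <- consensus_one(df$name[idx], df$col2[idx])
}
df$consensus_name <- consensus
```

The loop runs once per group rather than once per row, so it will probably be acceptable for around 50,000 rows, although this has not been timed on the real data.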