Assume we have a data frame DF in which each user ID occurs several times but under different names, and the names themselves can be duplicated as well.
DF <- data.frame(
  ID = c(101, 101, 101, 101, 101, 102, 102, 102, 102),
  Name = c("Ed", "Ed", "Hank", "Hank", "Hank", "Sandy", "Sandy", "Jessica", "Jessica"),
  Class = c("Junior", "Junior", "Junior", "Junior", "Junior", "High", "High", "Mid", "Mid"),
  Scoring = c(11, 15, 18, 18, 12, 20, 22, 25, 26),
  Other_Scores = c(15, 9, 34, 23, 43, 23, 34, 23, 23)
)
The aim is to aggregate by user ID (the ID column) and name, and to calculate the mean and standard deviation of the scores for each ID/name combination. An example of the desired output format:
UserID  Name     Class    Scoring_mean  Scoring_std
101     Ed       Junior   12.5          3
101     Hank     Junior   24.67         11.62
102     Sandy    High     24.75         6.29
102     Jessica  High     24.25         1.5
Hence my question:
- What are the options for aggregating by Name within each UserID without losing information (e.g. Hank being collapsed into Ed, as happens with my attempts using summarise() or mutate())?
As I understand it, R has to check which Name belongs to which UserID and, where they match, aggregate those rows and calculate the mean and standard deviation, but I have not been able to get this working in R with dplyr.
At the same time, I couldn't find any other post that is closely related to this question.
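For reference, the grouping described above could be sketched as follows. This is only a minimal sketch, assuming dplyr 1.0.0 or later (for the `.groups` argument) and assuming that Class is constant within each ID/Name pair, so it can be added as a grouping column just to carry it into the result. The name `result` is made up for illustration.

```r
library(dplyr)

# Sample data from the question
DF <- data.frame(
  ID = c(101, 101, 101, 101, 101, 102, 102, 102, 102),
  Name = c("Ed", "Ed", "Hank", "Hank", "Hank", "Sandy", "Sandy", "Jessica", "Jessica"),
  Class = c("Junior", "Junior", "Junior", "Junior", "Junior", "High", "High", "Mid", "Mid"),
  Scoring = c(11, 15, 18, 18, 12, 20, 22, 25, 26),
  Other_Scores = c(15, 9, 34, 23, 43, 23, 34, 23, 23)
)

# Group by ID *and* Name so that different names under the same ID
# stay in separate groups; Class is included only to keep it in the output
# (assumption: one Class per ID/Name pair).
result <- DF %>%
  group_by(ID, Name, Class) %>%
  summarise(
    Scoring_mean = mean(Scoring),  # mean of Scoring per group
    Scoring_std  = sd(Scoring),    # sample standard deviation per group
    .groups = "drop"               # return an ungrouped data frame
  )

print(result)
```

With the sample data this yields one row per ID/Name combination, i.e. four rows. The values are computed from DF itself, so they need not match the illustrative numbers in the desired output table.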