I have a data frame which contains an event history and I want to check its integrity by checking whether the last event for each ID number matches the current value in the system for that ID number. The data are coded as factors. The following toy data frame is a minimal example:
df <-data.frame(ID=c(1,1,1,1,2,2,2,3,3),
                 current.grade=as.factor(c("Senior","Senior","Senior","Senior",
                                         "Junior","Junior","Junior",
                                         "Sophomore","Sophomore")),
                 grade.history=as.factor(c("Freshman","Sophomore","Junior","Senior",
                                   "Freshman","Sophomore","Junior",
                                   "Freshman","Sophomore")))
which gives output
> df
  ID current.grade grade.history
1  1        Senior      Freshman
2  1        Senior     Sophomore
3  1        Senior        Junior
4  1        Senior        Senior
5  2        Junior      Freshman
6  2        Junior     Sophomore
7  2        Junior        Junior
8  3     Sophomore      Freshman
9  3     Sophomore     Sophomore
> str(df)
'data.frame':   9 obs. of  3 variables:
 $ ID           : num  1 1 1 1 2 2 2 3 3
 $ current.grade: Factor w/ 3 levels "Junior","Senior",..: 2 2 2 2 1 1 1 3 3
 $ grade.history: Factor w/ 4 levels "Freshman","Junior",..: 1 4 2 3 1 4 2 1 4
I want to use dplyr to extract the last value in grade.history and check it against current.grade:
df.summary <- df %>%
  group_by(ID) %>%
  summarize(current.grade.last=last(current.grade),
            grade.history.last=last(grade.history))
However, dplyr seems to convert the factors to integers, so I get this:
> df.summary
Source: local data frame [3 x 3]
  ID current.grade.last grade.history.last
1  1                  2                  3
2  2                  1                  2
3  3                  3                  4
> str(df.summary)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   3 obs. of  3 variables:
 $ ID                : num  1 2 3
 $ current.grade.last: int  2 1 3
 $ grade.history.last: int  3 2 4
Note that the values don't line up because the original factors had different level sets. What's the right way to do this with dplyr?
I'm using R version 3.1.1 and dplyr version 0.3.0.2
 
     
    