Just say we run a study where participants are measured on some outcome variable four times each. At the start of testing they provide their age and sex. Here is some toy data to illustrate.
set.seed(1)
sex <- NA
age <- NA
df <- data.frame(id = factor(rep(1:4,each=4)),
                 time = rep(1:4,times=4),
                 sex = as.vector(sapply(0:3, function(i) sex[i*4 + 1:4] <- c(sample(c("m", "f"), 1, replace = T), rep(NA,3)))),
                 age = as.vector(sapply(0:3, function(i) age[i*4 + 1:4] <- c(sample(18:75, 1, replace = T), rep(NA,3)))),
                 outcome = round(rnorm(16),2),
                 stringsAsFactors = F)
Here is what the data looks like
df
# output
# id time  sex age outcome
#  1    1    m  29    0.33
#  1    2 <NA>  NA   -0.82
#  1    3 <NA>  NA    0.49
#  1    4 <NA>  NA    0.74
#  2    1    m  70    0.58
#  2    2 <NA>  NA   -0.31
#  2    3 <NA>  NA    1.51
#  2    4 <NA>  NA    0.39
#  3    1    f  72   -0.62
#  3    2 <NA>  NA   -2.21
#  3    3 <NA>  NA    1.12
#  3    4 <NA>  NA   -0.04
#  4    1    f  56   -0.02
#  4    2 <NA>  NA    0.94
#  4    3 <NA>  NA    0.82
#  4    4 <NA>  NA    0.59
Now what I want to do is to use the tidyverse to apply the values for the demographic variables, at present only on the first row of each participant's data, to all the rows.
At present all I could come up with was
df %>% group_by(id) %>% # group by id
       distinct(sex) %>% # shrink to unique values for each id
       dplyr::filter(!is.na(sex)) %>% # remove the NAs
       left_join(df, by = "id")
Which yields the output
# A tibble: 16 x 6
# Groups:   id [4]
#    sex.x id     time sex.y   age outcome
#    <chr> <fct> <int> <chr> <int>   <dbl>
#  1 m     1         1 m        29   0.33 
#  2 m     1         2 NA       NA  -0.82 
#  3 m     1         3 NA       NA   0.49 
#  4 m     1         4 NA       NA   0.74 
#  5 m     2         1 m        70   0.580
#  6 m     2         2 NA       NA  -0.31 
#  7 m     2         3 NA       NA   1.51 
#  8 m     2         4 NA       NA   0.39 
#  9 f     3         1 f        72  -0.62 
# 10 f     3         2 NA       NA  -2.21 
# 11 f     3         3 NA       NA   1.12 
# 12 f     3         4 NA       NA  -0.04 
# 13 f     4         1 f        56  -0.02 
# 14 f     4         2 NA       NA   0.94 
# 15 f     4         3 NA       NA   0.82 
# 16 f     4         4 NA       NA   0.59 
Now I would consider this partially successful because the first row in each participant's sex.x column has now been applied to all their other rows, but I really don't like that there are now two sex columns.
Now I could easily add some more functions to the chain that remove the superfluous sex.y column and rename the sex.x column to its original form, but this seems a bit clunky.
Can anyone suggest how to do this better?
 
    