I have a repeated measures sample in which each participant was asked to complete a sleep survey once a year for 5 years (baseline through year 4 of follow-up). The survey items are fairly correlated with one another (e.g. when you go to bed correlates with your duration of sleep), so we are interested in taking a PCA-like approach and using the loadings on each PC to create a time-varying composite score (i.e. a composite score, based on these hypothetical "PC" "loadings", computed at each time point). We then want to use each participant's time-varying composite measure in a mixed model to predict our longitudinal outcome of interest.
We initially performed PCA on all of the data pooled together (all participants, across all time points). Upon further reflection, though, I started to question whether this is appropriate: pooled PCA treats every row as an independent observation, so I don't think it can distinguish between participant-based and time-based variability. I am therefore looking for a similar dimensionality reduction approach that is suited to a repeated measures sample.
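To make the concern concrete, here is a minimal base-R sketch on simulated data (not our survey). The names `long`, `between`, and `within` are hypothetical, and the data-generating model (a stable participant effect plus visit-level noise) is only an assumption for illustration. It compares PCA on the pooled rows with PCA on participant means and on deviations from each participant's own mean:

```r
set.seed(1)
n_id    <- 50   # hypothetical number of participants
n_visit <- 5    # baseline through year 4
items   <- c("var1", "var2", "var3")

# Long format: one row per participant per visit
long <- data.frame(ID = rep(seq_len(n_id), each = n_visit))
for (v in items) {
  person_effect <- rnorm(n_id)[long$ID]            # stable participant-level part
  long[[v]] <- person_effect + rnorm(nrow(long))   # plus visit-level noise
}

# Between-participant part: one row of item means per participant
between <- aggregate(long[items], by = list(ID = long$ID), FUN = mean)

# Within-participant part: deviation from each participant's own mean
within <- as.data.frame(lapply(long[items], function(x) x - ave(x, long$ID)))

pca_pooled  <- prcomp(long[items],    scale. = TRUE)  # what we did initially
pca_between <- prcomp(between[items], scale. = TRUE)
pca_within  <- prcomp(within,         scale. = TRUE)

# Compare the loadings from the three decompositions
pca_pooled$rotation
pca_between$rotation
pca_within$rotation
```

If the between and within loadings differ noticeably, the pooled loadings are some blend of the two, which is the ambiguity I am worried about.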
Based on some previous Stack questions I found, multiple factor analysis (MFA) looks like it might be a good option. However, none of the examples I have found online involve longitudinal data.
1. Does MFA seem like the correct approach?
2. If so, is the following code, which uses the FactoMineR package, correct?
Below is a sample dataset illustrating the structure and code I think I'd run:
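library(FactoMineR)
library(dplyr)   # provides %>%
library(tidyr)   # provides pivot_wider()
set.seed(123)
# Long format: one row per participant per visit
ex_dat <- data.frame(ID = rep(1:4, each = 4),
                     visit = rep(c("baseline", "y1", "y2", "y3"), 4),
                     var1 = rnorm(16),
                     var2 = rnorm(16)^2,
                     var3 = log(rnorm(16, mean = 3, sd = 1)))
# Wide format: one row per participant, one column per variable-by-visit pair
dat <- ex_dat %>%
  pivot_wider(id_cols = ID,
              names_from = visit,
              values_from = c("var1", "var2", "var3")) %>%
  data.frame()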
> dat
ID var1_baseline var1_y1 var1_y2 var1_y3 var2_baseline var2_y1 var2_y2 var2_y3 var3_baseline var3_y1 var3_y2 var3_y3
1 1 -0.5604756 -0.2301775 1.5587083 0.07050839 0.2478551 3.86758304 0.4919001 0.22353172 1.3597259 1.3553540 1.3406642 1.3052579
2 2 0.1292877 1.7150650 0.4609162 -1.26506123 1.1402475 0.04751306 1.0526851 0.53128242 1.2680506 1.0777591 0.9910409 0.9629945
3 3 -0.6868529 -0.4456620 1.2240818 0.35981383 0.3906741 2.84493432 0.7018871 0.02352331 0.8352078 1.0267878 0.5507789 1.6426707
4 4 0.4007715 0.1106827 -0.5558411 1.78691314 1.2953557 1.57205186 0.1818717 0.08706718 1.4369784 0.6296169 0.9544013 0.9295404
So, for each participant I have multiple measurements of variables 1 through 3.
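My hunch, based on the MFA manual, is to run code like the following. As I understand it, group=rep(4, 3) splits the columns into three consecutive groups of four, so the four visits of var1 form one "group", the four visits of var2 another, and the four visits of var3 a third.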
# MFA Analysis
res_MFA <- MFA(dat[, -1], group=rep(4, 3), type=rep("s", 3))
And lastly... does res_MFA$ind$coord give me the equivalent of a "loading" for each dimension?
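For context on this last question, here is the check I would run, as a sketch on simulated data rather than our real survey. `sim_wide` and `res_sim` are hypothetical names, and 30 participants with 3 variables at 4 visits is an arbitrary choice that mirrors the layout of `dat` above:

```r
library(FactoMineR)

set.seed(42)
n_id <- 30  # hypothetical number of participants

# Wide layout: var1 at 4 visits, then var2 at 4 visits, then var3 at 4 visits
visits <- c("baseline", "y1", "y2", "y3")
sim_wide <- as.data.frame(matrix(rnorm(n_id * 12), nrow = n_id))
names(sim_wide) <- paste(rep(c("var1", "var2", "var3"), each = 4), visits, sep = "_")

# Three groups of four consecutive columns, all scaled continuous variables
res_sim <- MFA(sim_wide,
               group = rep(4, 3),
               type = rep("s", 3),
               name.group = c("var1", "var2", "var3"),
               graph = FALSE)

dim(res_sim$ind$coord)         # rows correspond to participants
dim(res_sim$quanti.var$coord)  # rows correspond to the input columns
```

As far as I can tell, `ind$coord` has one row per participant, whereas `quanti.var$coord` has one row per input column, so I suspect the latter is closer to what I mean by a "loading", but I'd appreciate confirmation.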