I'm hoping this question isn't just a previous question in different words. I've tried the solutions from similar questions, but they haven't worked for me, so bear with me!
I'm having some trouble with the output of my linear regression model in R. I'm concerned that the model is using an incorrect referent group for the interaction term, and even though I've tried releveling the individual variables before they enter the interaction, I'm not getting the output I expected.
I have a dataset with continuous and categorical variables. Let's say that variables A and B are continuous and variables C, D, and E are categorical (0 = No, 1 = Yes). The referent group for each categorical variable has been set to "No" (0). Here's an example:
ID      A      B      C      D      E
1       53.6   25     No     Yes    No
2       51.1   12     Yes    No     Yes
3       50.9   NA     Yes    Yes    No
4       49.3   2      No     No     No
5       48.1   NA     No     Yes    No
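For reference, this is roughly how I've been building the data frame (the values here are just the illustrative rows above) and setting "No" as the reference level with relevel():

```r
# Sketch of the example data; the numeric values are illustrative only.
example <- data.frame(
  ID = 1:5,
  A  = c(53.6, 51.1, 50.9, 49.3, 48.1),
  B  = c(25, 12, NA, 2, NA),
  C  = factor(c("No", "Yes", "Yes", "No", "No")),
  D  = factor(c("Yes", "No", "Yes", "No", "Yes")),
  E  = factor(c("No", "Yes", "No", "No", "No"))
)

# Explicitly make "No" the reference (first) level of each factor,
# so "Yes" coefficients are estimated relative to "No".
example$C <- relevel(example$C, ref = "No")
example$D <- relevel(example$D, ref = "No")
example$E <- relevel(example$E, ref = "No")
```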
I've tried a couple of different ways to get at the interaction terms, so my models are set up as follows:
lm1 <- lm(A ~ C*D + E + B, data=example)
lm2 <- lm(A ~ C:D + E + B, data=example)
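My understanding is that `C*D` is shorthand for `C + D + C:D` (main effects plus the interaction), while `C:D` on its own requests only the interaction term without the C and D main effects. A small sketch with toy two-level factors shows the design-matrix columns each form produces (the `~ C:D` columns are left uncommented since they depend on R's coding rules for interactions without main effects):

```r
# Toy two-level factors, with "No" as the reference level.
C <- factor(c("No", "No", "Yes", "Yes"), levels = c("No", "Yes"))
D <- factor(c("No", "Yes", "No", "Yes"), levels = c("No", "Yes"))

# C*D expands to main effects plus the interaction:
colnames(model.matrix(~ C * D))
# "(Intercept)" "CYes" "DYes" "CYes:DYes"

# C:D alone requests only the interaction term; inspect what R builds:
colnames(model.matrix(~ C:D))
```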
I expected an output table listing the regression coefficient, standard error, etc. for the intercept, C alone, D alone, E, B, and then C * D broken down into three of the four possible combination groups of the interaction term, omitting the group made up of both referent levels ("No" for both C and D, i.e. "C_No:D_No").
EXPECTED:
Coefficient   Estimate   Std. Error   t value   Pr(>|t|)   
Intercept     90.76369   0.54308      167.127   < 2e-16  ***
C_Yes         -0.28639   0.62044      -0.462    0.644465    
D_Yes         -3.01242   1.14733      -2.626    0.008771 **
E_Yes         0.05865    0.01691      3.468     0.000544 ***
B             -0.20891   0.35982      -0.581    0.561634
C_No:D_Yes    -0.42116   0.47213      2.617     0.01674  *
C_Yes:D_Yes   2.01208    1.43154      1.406     0.160148
C_Yes:D_No    -0.02877   0.65271      -0.345    0.672531 
For the first model, I got output for the intercept, C alone, D alone, E, B, and then only one combination group of C * D.
ACTUAL:
Coefficient   Estimate   Std. Error   t value   Pr(>|t|)   
Intercept     90.76369   0.54308      167.127   < 2e-16  ***
C_Yes         -0.28639   0.62044      -0.462    0.644465    
D_Yes         -3.01242   1.14733      -2.626    0.008771 **
E_Yes         0.05865    0.01691      3.468     0.000544 ***
B             -0.20891   0.35982      -0.581    0.561634
C_No:D_Yes    -0.42116   0.47213      2.617     0.01674  *
For the second model, I got output for the intercept, E, B, and then all combination groups of C * D (one of which was NA).
ACTUAL:
Coefficient   Estimate   Std. Error   t value   Pr(>|t|)   
Intercept     90.76369   0.54308      167.127   < 2e-16  ***
E_Yes         0.05865    0.01691      3.468     0.000544 ***
B             -0.20891   0.35982      -0.581    0.561634
C_No:D_Yes    -0.42116   0.47213      2.617     0.01674  *
C_Yes:D_Yes   NA  (all not defined because of singularities)
C_Yes:D_No    -0.02877   0.65271      -0.345    0.672531 
So now my questions are:
1) Is there different code that will give me everything that I want in one model instead of two?
2) Is this model, as is, using C_Yes:D_Yes as the referent group instead of C_No:D_No, and is that why I'm getting the error about singularities? My variables are correlated, yes, but not perfectly, so I wasn't expecting multicollinearity to be an issue.
3) If the referent group is correct, why am I getting a coefficient estimate for C_No:D_No (the referent group)?