The two functions model.matrix and data.matrix behave differently in several ways, including what happens if there are NA values, and how non-numeric variables are handled. See the help pages.
By default, model.matrix deletes every row that contains an NA. data.matrix keeps these rows, so with cor(use = "pairwise.complete.obs") each coefficient is computed from all rows in which both variables of the pair are observed, even if other variables in that row are NA. This explains the different correlation coefficients.
If you have to use model.matrix, you could set the na.action option to "na.pass" so that rows with NA values are kept (see solution here), and then handle the NA values in cor(use = "pairwise.complete.obs").
Get data
library(tidyverse)
df <- data.frame(
  idcode = 1:10,
  contract = c(TRUE, FALSE, FALSE, FALSE, NA, NA, TRUE, TRUE, FALSE, TRUE),
  score = c(1.17, 5, 7.2, 6.6, 3, 3.8, 7.2, 9.1, 5.4, 2.21),
  CEO = c(FALSE, NA, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE)
)
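Note that logical variables should be coded without quotation marks (TRUE, not "TRUE"). With logical columns, data.matrix codes FALSE/TRUE as 0/1, whereas quoted values are converted via a factor to 1/2, as in the data.matrix output shown below; the correlation coefficients are the same either way.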
Default behaviour of model.matrix
If there are NA values, model.matrix drops the entire row while data.matrix keeps it. This is due to the default options()$na.action, which is set to "na.omit" and which only affects model.matrix.
options()$na.action
#[1] "na.omit"
model.matrix(~0 + ., data = df)
#> idcode contractFALSE contractTRUE score CEOTRUE
#> 1 1 0 1 1.17 0
#> 3 3 1 0 7.20 1
#> 4 4 1 0 6.60 1
#> 7 7 0 1 7.20 1
#> 8 8 0 1 9.10 1
#> 9 9 1 0 5.40 1
#> 10 10 0 1 2.21 1
#> attr(,"assign")
#> [1] 1 2 2 3 4
#> attr(,"contrasts")
#> attr(,"contrasts")$contract
#> [1] "contr.treatment"
#>
#> attr(,"contrasts")$CEO
#> [1] "contr.treatment"
data.matrix(df)
#> idcode contract score CEO
#> [1,] 1 2 1.17 1
#> [2,] 2 1 5.00 NA
#> [3,] 3 1 7.20 2
#> [4,] 4 1 6.60 2
#> [5,] 5 NA 3.00 2
#> [6,] 6 NA 3.80 2
#> [7,] 7 2 7.20 2
#> [8,] 8 2 9.10 2
#> [9,] 9 1 5.40 2
#> [10,] 10 2 2.21 2
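To see which rows the default removes, the complete cases can be listed directly. This is a minimal sketch that re-creates the example data so it runs on its own; complete.cases() and na.omit() are base R functions, and kept_rows is only an illustrative name.

```r
df <- data.frame(
  idcode = 1:10,
  contract = c(TRUE, FALSE, FALSE, FALSE, NA, NA, TRUE, TRUE, FALSE, TRUE),
  score = c(1.17, 5, 7.2, 6.6, 3, 3.8, 7.2, 9.1, 5.4, 2.21),
  CEO = c(FALSE, NA, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE)
)

# rows without any NA: these are the rows that survive na.omit
kept_rows <- which(complete.cases(df))
kept_rows
#> [1]  1  3  4  7  8  9 10

# rows 2, 5 and 6 contain at least one NA and are removed
nrow(na.omit(df))
#> [1] 7
```

These row numbers match the row names in the model.matrix output above.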
Behaviour with na.action = "na.pass"
# set na.action options
oldpar <- options()$na.action
options(na.action ="na.pass")
model.matrix(~0 + ., data = df)
#> idcode contractFALSE contractTRUE score CEOTRUE
#> 1 1 0 1 1.17 0
#> 2 2 1 0 5.00 NA
#> 3 3 1 0 7.20 1
#> 4 4 1 0 6.60 1
#> 5 5 NA NA 3.00 1
#> 6 6 NA NA 3.80 1
#> 7 7 0 1 7.20 1
#> 8 8 0 1 9.10 1
#> 9 9 1 0 5.40 1
#> 10 10 0 1 2.21 1
#> attr(,"assign")
#> [1] 1 2 2 3 4
#> attr(,"contrasts")
#> attr(,"contrasts")$contract
#> [1] "contr.treatment"
#>
#> attr(,"contrasts")$CEO
#> [1] "contr.treatment"
data.matrix(df)
#> idcode contract score CEO
#> [1,] 1 2 1.17 1
#> [2,] 2 1 5.00 NA
#> [3,] 3 1 7.20 2
#> [4,] 4 1 6.60 2
#> [5,] 5 NA 3.00 2
#> [6,] 6 NA 3.80 2
#> [7,] 7 2 7.20 2
#> [8,] 8 2 9.10 2
#> [9,] 9 1 5.40 2
#> [10,] 10 2 2.21 2
Compare correlation coefficients
data.matrix(df) %>% cor(use = "pairwise.complete.obs") %>% round(digits = 3)
#> idcode contract score CEO
#> idcode 1.000 0.312 0.177 0.625
#> contract 0.312 1.000 -0.226 -0.354
#> score 0.177 -0.226 1.000 0.548
#> CEO 0.625 -0.354 0.548 1.000
model.matrix(~0 + ., data = df) %>% cor(use = "pairwise.complete.obs") %>% round(digits = 3)
#> idcode contractFALSE contractTRUE score CEOTRUE
#> idcode 1.000 -0.312 0.312 0.177 0.625
#> contractFALSE -0.312 1.000 -1.000 0.226 0.354
#> contractTRUE 0.312 -1.000 1.000 -0.226 -0.354
#> score 0.177 0.226 -0.226 1.000 0.548
#> CEOTRUE 0.625 0.354 -0.354 0.548 1.000
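With use = "pairwise.complete.obs", each coefficient can rest on a different number of rows. One way to check this, as a rough sketch, is to count the rows in which both variables of a pair are observed; the names ok and n_pairs are illustrative only, and the data are re-created so the block runs on its own.

```r
df <- data.frame(
  idcode = 1:10,
  contract = c(TRUE, FALSE, FALSE, FALSE, NA, NA, TRUE, TRUE, FALSE, TRUE),
  score = c(1.17, 5, 7.2, 6.6, 3, 3.8, 7.2, 9.1, 5.4, 2.21),
  CEO = c(FALSE, NA, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE)
)

m <- data.matrix(df)

# 1 where a value is observed, 0 where it is NA
ok <- (!is.na(m)) * 1

# entry [i, j] counts the rows in which columns i and j are both observed
n_pairs <- crossprod(ok)
n_pairs["idcode", "score"]   # no NA in either column
#> [1] 10
n_pairs["contract", "CEO"]   # rows 2, 5 and 6 are excluded
#> [1] 7
```

Only pairs in which both variables have missing values somewhere fall back to the seven complete rows; the other pairs use more of the data.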
Note that the two functions handle logical variables differently, which is reflected in the correlation matrix: model.matrix creates two dummy variables for contract and one dummy variable for CEO (see the discussion in the comments section to this answer), whereas data.matrix creates a single binary integer variable for each.
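The integer coding used by data.matrix depends on the column type, but this does not change the correlations. A small sketch with made-up toy data (lgl and chr are hypothetical names, not part of the example above):

```r
# the same variable stored as logical and as character
lgl <- data.frame(
  contract = c(TRUE, FALSE, FALSE, TRUE),
  score = c(1.17, 5, 7.2, 6.6)
)
chr <- lgl
chr$contract <- as.character(lgl$contract)

# logical columns are converted with as.integer(): FALSE = 0, TRUE = 1
data.matrix(lgl)[, "contract"]
#> [1] 1 0 0 1

# character columns go through factor(): "FALSE" = 1, "TRUE" = 2
data.matrix(chr)[, "contract"]
#> [1] 2 1 1 2

# shifting a variable by a constant leaves the correlation unchanged
all.equal(cor(data.matrix(lgl)), cor(data.matrix(chr)))
#> [1] TRUE
```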
Reset default options
options(na.action = oldpar)
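If changing a global option is undesirable, a possible alternative is to build the model frame explicitly with na.action = na.pass and hand it to model.matrix. This is a sketch of that approach rather than part of the reprex above; mf and mm are illustrative names, and the data are re-created so the block runs on its own.

```r
df <- data.frame(
  idcode = 1:10,
  contract = c(TRUE, FALSE, FALSE, FALSE, NA, NA, TRUE, TRUE, FALSE, TRUE),
  score = c(1.17, 5, 7.2, 6.6, 3, 3.8, 7.2, 9.1, 5.4, 2.21),
  CEO = c(FALSE, NA, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE)
)

# keep rows with NA when the model frame is built
mf <- model.frame(~ ., data = df, na.action = na.pass)

# model.matrix uses a ready-made model frame as is, so no rows are dropped
mm <- model.matrix(~ 0 + ., data = mf)
dim(mm)

# NA values are then handled pairwise by cor()
round(cor(mm, use = "pairwise.complete.obs"), digits = 3)
```

Because the NA handling is specified in the call itself, there is nothing to reset afterwards.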
Session Info
sessionInfo()
#> R version 4.1.1 (2021-08-10)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Catalina 10.15.7
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] knitr_1.33 magrittr_2.0.1 rlang_0.4.11 fastmap_1.1.0
#> [5] fansi_0.5.0 stringr_1.4.0 styler_1.5.1 highr_0.9
#> [9] tools_4.1.1 xfun_0.25 utf8_1.2.2 withr_2.4.2
#> [13] htmltools_0.5.2 ellipsis_0.3.2 yaml_2.2.1 digest_0.6.27
#> [17] tibble_3.1.4 lifecycle_1.0.0 crayon_1.4.1 purrr_0.3.4
#> [21] vctrs_0.3.8 fs_1.5.0 glue_1.4.2 evaluate_0.14
#> [25] rmarkdown_2.10 reprex_2.0.1 stringi_1.7.4 compiler_4.1.1
#> [29] pillar_1.6.2 backports_1.2.1 pkgconfig_2.0.3
Created on 2021-09-19 by the reprex package (v2.0.1)