I have a study with several cases, each containing data from multiple ordinal factor variables (genotypes) and multiple numeric variables (concentrations measured in various blood samples). I am trying to set up an exploratory model that tests for a linear relationship between each of the numeric variables (dependent in the model) and each of the ordinal factor variables (independent in the model).
Dataset structure example (independent variables): genotypes
case_id   genotype_1   genotype_2   ... genotype_n
1         0            0                1
2         1            0                2
...       ...          ...              ...
n         2            1                0
and dependent variables (with matching case IDs): samples
case_id   sample_1   sample_2   ... sample_n
1         0.3        0.12           6.12
2         0.25       0.15           5.66
...       ...        ...            ...
n         0.44       0.26           6.62
I found one similar example on the forum, but it doesn't solve the problem:
model <- apply(samples, 2, function(xl) lm(xl ~ ., data = genotypes))
I can't figure out how to run simple linear regressions over every combination of a given set of dependent and independent variables. If I use the apply family, I guess the varying term (x) should be the dependent variable in the model, since each dependent variable should be tested for linearity against the same set of independent variables, one at a time.
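To make the goal concrete, here is a minimal sketch of the kind of loop being asked for: one simple regression per (sample, genotype) pair. It rests on several assumptions: the toy values are made up for illustration, the genotypes are coded numerically as 0/1/2 so that the slope represents a linear trend, and the names dat, combos and fits are hypothetical.

```r
# Toy data in the same layout as the question (values invented for illustration).
genotypes <- data.frame(
  case_id    = 1:6,
  genotype_1 = c(2, NaN, 1, 2, 0, 2),
  genotype_2 = c(2, 1, 0, 2, 0, 1)
)
samples <- data.frame(
  case_id  = 1:6,
  sample_1 = c(0.16, -0.21, 0.05, 0.05, -0.05, -0.09),
  sample_2 = c(0.09, -0.13, 0.08, 0.13, -0.11, -0.20)
)

# Align rows by case_id so each sample value is paired with the right genotype.
dat <- merge(genotypes, samples, by = "case_id")

geno_vars   <- setdiff(names(genotypes), "case_id")
sample_vars <- setdiff(names(samples), "case_id")

# Every (dependent, independent) combination, one row per pair.
combos <- expand.grid(sample = sample_vars, genotype = geno_vars,
                      stringsAsFactors = FALSE)

# One simple regression per pair. reformulate() builds e.g. sample_1 ~ genotype_1;
# lm() drops rows with NA/NaN in the variables used (default na.action).
fits <- lapply(seq_len(nrow(combos)), function(i) {
  lm(reformulate(combos$genotype[i], response = combos$sample[i]), data = dat)
})
names(fits) <- paste(combos$sample, combos$genotype, sep = " ~ ")
```

Each element of fits is an ordinary lm object, so summary(fits[["sample_1 ~ genotype_1"]]) would give the usual output for that single pair. Looping over the pairs rather than using xl ~ . keeps each model to one predictor.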
Extract from the real data:
> genotypes
      case_id genotype_1 genotype_2 genotype_3 genotype_4 genotype_5
 1       1          2          2          1          1          0
 2       2        NaN          1        NaN          0          0
 3       3          1          0          0          0        NaN
 4       4          2          2          1          1          0
 5       5          0          0          0          1        NaN
 6       6          2          2          1          0          0
 7       9          0          0          0          0          1
 8      10          0          0          0        NaN          0
 9      13          0          0          0        NaN          0
10      15        NaN          1        NaN          0          1
> samples
   case_id    sample_1    sample_2     sample_3   sample_4    sample_5
 1       1  0.16092019  0.08814160 -0.087733372  0.1966070  0.09085343
 2       2 -0.21089678 -0.13289427  0.056583528 -0.9077926 -0.27928376
 3       3  0.05102400  0.07724300 -0.212567535  0.2485348  0.52406368
 4       4  0.04823619  0.12697286  0.010063683  0.2265085 -0.20257192
 5       5 -0.04841221 -0.10780329  0.005759269 -0.4092782  0.06212171
 6       6 -0.08926734 -0.19925538  0.202887833 -0.1536070 -0.05889369
 7       9 -0.03652588 -0.18442457  0.204140717  0.1176950 -0.65290133
 8      10  0.07038933  0.05797007  0.082702589  0.2927817  0.01149564
 9      13 -0.14082554  0.26783539 -0.316528107 -0.7226103 -0.16165326
10      15 -0.16650266 -0.35291579  0.010063683  0.5210507  0.04404433
SUMMARY: Since I have a lot of data, I want to create a simple model to help me select which possible correlations to look into further. Any ideas out there?
NOTE: I am not trying to fit a multiple linear regression model!
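The screening output I have in mind could look roughly like the sketch below: one row per pair with the slope and its p-value, sorted so the most promising pairs come first. It uses a subset of the extract above (two genotypes, two samples) and assumes the genotypes can be treated as numeric 0/1/2 codes; the names fit_one, grid and screen are hypothetical.

```r
# Subset of the data extract from the question.
ids <- c(1, 2, 3, 4, 5, 6, 9, 10, 13, 15)
genotypes <- data.frame(
  case_id    = ids,
  genotype_1 = c(2, NaN, 1, 2, 0, 2, 0, 0, 0, NaN),
  genotype_2 = c(2, 1, 0, 2, 0, 2, 0, 0, 0, 1)
)
samples <- data.frame(
  case_id  = ids,
  sample_1 = c(0.16092019, -0.21089678, 0.05102400, 0.04823619, -0.04841221,
               -0.08926734, -0.03652588, 0.07038933, -0.14082554, -0.16650266),
  sample_2 = c(0.08814160, -0.13289427, 0.07724300, 0.12697286, -0.10780329,
               -0.19925538, -0.18442457, 0.05797007, 0.26783539, -0.35291579)
)

dat         <- merge(genotypes, samples, by = "case_id")
geno_vars   <- setdiff(names(genotypes), "case_id")
sample_vars <- setdiff(names(samples), "case_id")

# Fit y ~ x as a simple regression and return a one-row summary.
fit_one <- function(y, x, data) {
  fit   <- lm(reformulate(x, response = y), data = data)
  coefs <- coef(summary(fit))
  # The slope row is missing if x is constant after NA/NaN rows are dropped.
  ok <- x %in% rownames(coefs)
  data.frame(
    sample    = y,
    genotype  = x,
    n         = nobs(fit),  # cases actually used in this fit
    slope     = if (ok) coefs[x, "Estimate"] else NA_real_,
    p_value   = if (ok) coefs[x, "Pr(>|t|)"] else NA_real_,
    r_squared = summary(fit)$r.squared
  )
}

# All (sample, genotype) pairs, one regression each.
grid <- expand.grid(sample = sample_vars, genotype = geno_vars,
                    stringsAsFactors = FALSE)
rows <- lapply(seq_len(nrow(grid)), function(i) {
  fit_one(grid$sample[i], grid$genotype[i], dat)
})
screen <- do.call(rbind, rows)

# Smallest p-values first; pairs without a usable slope go last.
screen <- screen[order(screen$p_value, na.last = TRUE), ]
rownames(screen) <- NULL
```

With many pairs the raw p-values are only a rough ranking, so something like p.adjust(screen$p_value, method = "BH") may be worth considering before following up on individual pairs.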
 
    