I have a large data file (LMTESTData) that contains internal data and the results of an external assessment. Rather than subsetting it manually, I have tried a number of variants of by() and ddply() to run a linear regression for each subject, without success.
colnames(LMTESTData)
[1] "StudentNumber"  "SubjectCode"    "SubjectName"    "ExamMark"       "AssessmentMark" "U"              "hmkk"
[8] "TESmk"          "Year"
The regression model is lm(hmkk ~ ExamMark + AssessmentMark), fitted separately for each SubjectCode.
Once the model is working, my next challenge will be to predict hmkk given SubjectCode, ExamMark and AssessmentMark for each StudentNumber.
Dummy Data Set
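# Dummy data: 100 students, 20 per subject (marks drawn with the default sd = 1)
LMTESTData <- data.frame(StudentNumber = 1:100, SubjectCode = rep(c("A", "B", "C", "D", "E"), 20),
                         hmkk = rnorm(100, mean = 72), ExamMark = rnorm(100, mean = 62),
                         AssessmentMark = rnorm(100, mean = 68))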
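
For reference, here is a minimal sketch of the per-subject fit and the follow-up prediction described above. It uses base R's split(), lapply() and predict() rather than by() or ddply(), so it is one possible approach and not necessarily the best one. The fixed seed, the models and coefs objects and the hmkk_pred column are names introduced here for illustration only; they are not part of my real data.

```r
set.seed(1)  # assumption: fixed seed so the dummy data is reproducible

# Dummy data with the same columns the model needs
LMTESTData <- data.frame(
  StudentNumber  = 1:100,
  SubjectCode    = rep(c("A", "B", "C", "D", "E"), times = 20),
  hmkk           = rnorm(100, mean = 72),
  ExamMark       = rnorm(100, mean = 62),
  AssessmentMark = rnorm(100, mean = 68)
)

# Fit one linear model per SubjectCode; the result is a named list of lm objects
models <- lapply(
  split(LMTESTData, LMTESTData$SubjectCode),
  function(d) lm(hmkk ~ ExamMark + AssessmentMark, data = d)
)

# Collect the coefficients into a matrix with one row per subject
coefs <- t(sapply(models, coef))

# Predict hmkk for every student using the model of that student's subject
# (hmkk_pred is a hypothetical column name)
LMTESTData$hmkk_pred <- NA_real_
for (code in names(models)) {
  rows <- LMTESTData$SubjectCode == code
  LMTESTData$hmkk_pred[rows] <- predict(models[[code]], newdata = LMTESTData[rows, ])
}
```

Here models[["A"]] is the fitted model for subject A, so summary(models[["A"]]) gives the usual regression output for that subject. To predict for new students, the same loop should work on any data frame that has SubjectCode, ExamMark and AssessmentMark columns, provided every SubjectCode in it also appears in models.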
 
     
    