From what you have described, this is a case where you should try a multithreaded approach.
First, create a thread class that accepts your model and a test dataset.
EvaluateThread t1 = new EvaluateThread("eval-1", model, testDataset1);
EvaluateThread t2 = new EvaluateThread("eval-2", model, testDataset2);
EvaluateThread t3 = new EvaluateThread("eval-3", model, testDataset3);
Then create a synchronized method that each thread calls; synchronized ensures that only one thread at a time executes it on the same object.
Something like this:
public synchronized double calculateError(Model model, Dataset dataset) {
    // Model and Dataset stand in for your own types
    double error = 0.0;
    // do your work here, e.g. calculate the error
    return error;
}
Finally, start the threads, wait for all of them to finish, and calculate the average of the errors you get from each thread.
For more information about synchronized methods, check this link.
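
The steps above can be sketched end to end as follows. This is only a minimal sketch under some assumptions, not your actual code: the model is represented by a DoubleUnaryOperator, each dataset is an array of {input, expected} rows, and the error is the mean absolute error. EvaluateDemo, ErrorCollector and averageError are hypothetical names made up for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.DoubleUnaryOperator;

// Hypothetical stand-ins: the "model" is a function from input to prediction,
// and a "dataset" is an array of {input, expectedOutput} rows.
class EvaluateDemo {

    // Shared between threads; synchronized guards the two fields.
    static class ErrorCollector {
        private double sum = 0.0;
        private int count = 0;

        synchronized void add(double error) {
            sum += error;
            count++;
        }

        synchronized double average() {
            return count == 0 ? 0.0 : sum / count;
        }
    }

    static class EvaluateThread extends Thread {
        private final DoubleUnaryOperator model;
        private final double[][] dataset;
        private final ErrorCollector collector;

        EvaluateThread(String name, DoubleUnaryOperator model,
                       double[][] dataset, ErrorCollector collector) {
            super(name);
            this.model = model;
            this.dataset = dataset;
            this.collector = collector;
        }

        @Override
        public void run() {
            // The evaluation itself runs outside any lock.
            double total = 0.0;
            for (double[] row : dataset) {
                total += Math.abs(model.applyAsDouble(row[0]) - row[1]);
            }
            // Only the update of the shared state is synchronized.
            collector.add(total / dataset.length);
        }
    }

    static double averageError(DoubleUnaryOperator model, List<double[][]> datasets) {
        ErrorCollector collector = new ErrorCollector();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < datasets.size(); i++) {
            Thread t = new EvaluateThread("eval-" + i, model, datasets.get(i), collector);
            threads.add(t);
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join(); // wait for every evaluation to finish
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while evaluating", e);
        }
        return collector.average();
    }

    public static void main(String[] args) {
        DoubleUnaryOperator model = x -> 2 * x; // toy model for the demo
        List<double[][]> datasets = Arrays.asList(
            new double[][] {{1, 2}, {2, 5}},  // mean absolute error 0.5
            new double[][] {{3, 6}},          // mean absolute error 0.0
            new double[][] {{4, 9}, {5, 10}}  // mean absolute error 0.5
        );
        System.out.println("Average error: " + averageError(model, datasets));
    }
}
```

One design choice in this sketch: only the small update of the shared sum is synchronized, while the error calculation runs without a lock. If the whole calculation sat inside a synchronized method on one shared object, the threads would run it one after another instead of in parallel.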