One idea is to use the fork/join framework and split the items (files) into batches that are processed independently.
My suggestion is the following:
- First, filter out all files that do not exist; they occupy resources unnecessarily.
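A minimal sketch of that filtering step, assuming the input is a plain `File[]`. The class name `ExistingFiles` and the helper `keepExisting` are made up for illustration. If directories should be excluded too, `File::isFile` could be used in place of `File::exists`.

```java
import java.io.File;
import java.util.Arrays;

class ExistingFiles {

    // Hypothetical helper: keeps only the entries that exist on disk.
    static File[] keepExisting(File[] candidates) {
        return Arrays.stream(candidates)
                .filter(File::exists)
                .toArray(File[]::new);
    }

    public static void main(String[] args) {
        // Treat every command-line argument as a candidate path.
        File[] candidates = Arrays.stream(args)
                .map(File::new)
                .toArray(File[]::new);
        File[] existing = keepExisting(candidates);
        System.out.println(existing.length + " of " + candidates.length + " paths exist");
    }
}
```

The filtered array is what you would then hand to the fork/join task, so that no batch spends time on missing files.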
The following pseudo-code sketches an algorithm that might help you out:
    import java.io.File;
    import java.util.concurrent.RecursiveTask;

    // Nested inside your own class; Analyzer is your existing analyzer type.
    public static class CustomRecursiveTask extends RecursiveTask<Integer> {

        private final Analyzer[] analyzers;
        private final int threshold;
        private final File[] files;
        private final int start; // inclusive
        private final int end;   // exclusive

        public CustomRecursiveTask(Analyzer[] analyzers,
                                   int threshold,
                                   File[] files,
                                   int start,
                                   int end) {
            this.analyzers = analyzers;
            this.threshold = threshold;
            this.files = files;
            this.start = start;
            this.end = end;
        }

        @Override
        protected Integer compute() {
            final int remaining = end - start;
            // Small ranges, and ranges that cannot be split further, are processed directly.
            if (remaining < threshold || remaining <= 1) {
                return processSequentially();
            }
            final int middle = start + remaining / 2;
            final CustomRecursiveTask left =
                new CustomRecursiveTask(analyzers, threshold, files, start, middle);
            // end is exclusive, so the right half starts at middle, not middle + 1.
            final CustomRecursiveTask right =
                new CustomRecursiveTask(analyzers, threshold, files, middle, end);
            left.fork();                            // run the left half asynchronously
            final int rightCount = right.compute(); // run the right half in this thread
            return left.join() + rightCount;
        }

        // Runs every analyzer on every file in [start, end) and returns the number of files handled.
        private Integer processSequentially() {
            for (int i = start; i < end; i++) {
                final File file = files[i];
                for (Analyzer analyzer : analyzers) {
                    analyzer.analyze(file);
                }
            }
            return end - start;
        }
    }
It can be used like this:
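    public static void main(String[] args) {
        final Analyzer[] analyzers = new Analyzer[]{}; // your analyzers
        final File[] files = new File[]{};             // your files
        // Keep the threshold at 1 or more so that small inputs still terminate.
        final int threshold = Math.max(1, files.length / 5);
        // invoke() blocks until the whole task tree has finished;
        // execute() would return immediately and main could exit too early.
        final int processed = ForkJoinPool.commonPool().invoke(
            new CustomRecursiveTask(analyzers, threshold, files, 0, files.length));
        System.out.println("Files processed: " + processed);
    }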
Notice that, depending on your constraints, you can adjust the task's constructor arguments so that the algorithm adapts to the number of files.
For example, you could choose a different threshold depending on how many files there are:
    final int threshold;
    if (files.length > 100_000) {
        threshold = files.length / 4;
    } else {
        threshold = files.length / 8;
    }
You could also set the number of worker threads in the ForkJoinPool depending on the size of the input.
Measure, adjust, and repeat; you will solve the problem eventually.
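A rough sketch of sizing a dedicated pool from the input, instead of relying on the common pool. The names `PoolSizing`, `parallelismFor` and `CountTask`, and the rule of one worker per 1,000 files capped at the number of cores, are assumptions made up for illustration and not measured recommendations. `CountTask` only counts indices and stands in for the real task.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class PoolSizing {

    // Hypothetical rule: one worker per 1,000 files, at least 1, at most the number of cores.
    static int parallelismFor(int fileCount) {
        int cores = Runtime.getRuntime().availableProcessors();
        return Math.max(1, Math.min(cores, fileCount / 1_000));
    }

    // Stand-in for the real task: counts the indices in [start, end).
    static class CountTask extends RecursiveTask<Integer> {
        private final int start;
        private final int end;
        private final int threshold;

        CountTask(int start, int end, int threshold) {
            this.start = start;
            this.end = end;
            this.threshold = Math.max(1, threshold);
        }

        @Override
        protected Integer compute() {
            if (end - start <= threshold) {
                return end - start;
            }
            int middle = start + (end - start) / 2;
            CountTask left = new CountTask(start, middle, threshold);
            CountTask right = new CountTask(middle, end, threshold);
            left.fork();
            int rightCount = right.compute();
            return left.join() + rightCount;
        }
    }

    // Creates a pool sized for the input, runs the task in it, then shuts the pool down.
    static int runInDedicatedPool(int fileCount) {
        ForkJoinPool pool = new ForkJoinPool(parallelismFor(fileCount));
        try {
            return pool.invoke(new CountTask(0, fileCount, fileCount / 8));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int fileCount = 50_000;
        System.out.println("parallelism: " + parallelismFor(fileCount));
        System.out.println("processed: " + runInDedicatedPool(fileCount));
    }
}
```

A dedicated pool also keeps long-running file analysis from competing with other code that uses the common pool, such as parallel streams.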
Hope that helps.
UPDATE:
If you are not interested in the result, you could replace the RecursiveTask with RecursiveAction. The pseudo-code above boxes and unboxes an Integer at every join, and RecursiveAction avoids that overhead.
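A sketch of what the RecursiveAction variant could look like. The `Analyzer` interface here is a stand-in with an assumed `analyze(File)` method, and `ActionExample`, `AnalyzeAction` and `analyzeAll` are illustrative names; your real types may differ.

```java
import java.io.File;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class ActionExample {

    // Stand-in for the Analyzer type; the real interface may differ.
    interface Analyzer {
        void analyze(File file);
    }

    static class AnalyzeAction extends RecursiveAction {
        private final Analyzer[] analyzers;
        private final File[] files;
        private final int threshold;
        private final int start; // inclusive
        private final int end;   // exclusive

        AnalyzeAction(Analyzer[] analyzers, File[] files, int threshold, int start, int end) {
            this.analyzers = analyzers;
            this.files = files;
            this.threshold = Math.max(1, threshold); // guard against endless splitting
            this.start = start;
            this.end = end;
        }

        @Override
        protected void compute() {
            if (end - start <= threshold) {
                // Small enough: analyze the files in this range directly.
                for (int i = start; i < end; i++) {
                    for (Analyzer analyzer : analyzers) {
                        analyzer.analyze(files[i]);
                    }
                }
                return;
            }
            int middle = start + (end - start) / 2;
            // invokeAll runs both halves and returns once both have completed.
            invokeAll(new AnalyzeAction(analyzers, files, threshold, start, middle),
                      new AnalyzeAction(analyzers, files, threshold, middle, end));
        }
    }

    // Blocks until every file in the array has been analyzed.
    static void analyzeAll(Analyzer[] analyzers, File[] files, int threshold) {
        ForkJoinPool.commonPool().invoke(
            new AnalyzeAction(analyzers, files, threshold, 0, files.length));
    }

    public static void main(String[] args) {
        File[] files = new File[args.length];
        for (int i = 0; i < args.length; i++) {
            files[i] = new File(args[i]);
        }
        Analyzer printer = file -> System.out.println(file.getName());
        analyzeAll(new Analyzer[] {printer}, files, files.length / 8);
    }
}
```

Because nothing is returned, there is no Integer to box at each join; any result you still need would have to be collected by the analyzers themselves in a thread-safe way.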