I want to do a calculation based on two data files. The calculation is memory-heavy, so I cannot run it on all the data at once. Instead, I split the job into 200 pieces, run the calculation on each piece, and combine the results afterwards.
I automated this in a Makefile:
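.PHONY: SPLITS QOAC
.SECONDARY: QOAC SPLITS

# Number of pieces the input is split into
NSETS = 200
# cache/split_1.rds ... cache/split_200.rds
DSETS := $(patsubst %,cache/split_%.rds,$(shell seq 1 1 $(NSETS)))
# cache/qoac_1.rds ... cache/qoac_200.rds
QSETS := $(patsubst %,cache/qoac_%.rds,$(shell seq 1 1 $(NSETS)))

QOAC: $(QSETS)
SPLITS: $(DSETS)

# A single run of split_files.R writes all NSETS split files
$(DSETS): split_files.R data/1 data/2
	Rscript $< $(NSETS)

# Each qoac file depends only on its own split, so these jobs can run in parallel
cache/qoac_%.rds: calc_qoac.R cache/split_%.rds
	Rscript $^

# Combine all qoac results
bigfile: combine.R QOAC
	Rscript $<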
In this example, split_files.R reads data/1 and data/2 and, in a single run, writes all NSETS pieces to cache/split_*.rds.
For every split_N, the corresponding qoac_N is computed by calc_qoac.R. Because these processes are independent of each other, they can run in parallel with make -j.
My problem is that if one or more of the split_* files are missing, split_files.R is run multiple times.
When I add .NOTPARALLEL: SPLITS, the whole build runs serially, which slows things down.
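As far as I can tell from the GNU make manual (section "Multiple Targets in a Rule"), a rule with several explicit targets is treated as if one independent rule with the same prerequisites and recipe had been written for each target. Make therefore does not know that one run of split_files.R produces every split file. Written out for the first two files, my $(DSETS) rule effectively behaves like this (an illustrative expansion, not code I actually use):

```makefile
# Illustrative expansion of:  $(DSETS): split_files.R data/1 data/2
NSETS = 200

cache/split_1.rds: split_files.R data/1 data/2
	Rscript $< $(NSETS)

cache/split_2.rds: split_files.R data/1 data/2
	Rscript $< $(NSETS)

# ... and so on, one rule per file up to cache/split_200.rds.
# Make treats these as unrelated targets, so any missing split file can
# trigger its own run of split_files.R, and under make -j several of
# these runs can start at the same time.
```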
How can I make sure that the sets are generated only once, and only when needed?
