I need to sort a large binary file of size M using t threads. Records in the file are all of equal size. The task explicitly says that the amount of memory I can allocate is m, which is much smaller than M. The hard drive is also guaranteed to have at least 2 * M of free space. This obviously calls for merge sort, but it turned out to be not so obvious. I see three different approaches here:
A. Map the files input, temp1 and temp2 into memory. Perform merge sort input -> temp1 -> temp2 -> temp1 -> ... until one of the temp files is fully sorted. Threads contend only when selecting the next portion of work; there is no contention on reads or writes.
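Roughly what I mean by A, as a minimal sketch assuming Linux/POSIX; the helper name `map_file`, the exact flags, and the omitted error handling are mine, not part of the task:

```c
/* Sketch for approach A: map the three files and let threads merge runs
   between the mappings, contending only on a "next run" counter. */
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

static char *map_file(const char *path, size_t len, int writable)
{
    int fd = open(path, writable ? (O_RDWR | O_CREAT) : O_RDONLY, 0644);
    if (writable)
        ftruncate(fd, (off_t)len);   /* temp files must be pre-sized before mmap */
    void *p = mmap(NULL, len,
                   writable ? (PROT_READ | PROT_WRITE) : PROT_READ,
                   MAP_SHARED, fd, 0);
    close(fd);                       /* the mapping stays valid after close */
    return (char *)p;
}

/* usage (inside the sorter):
     char *in   = map_file("input", M, 0);
     char *tmp1 = map_file("temp1", M, 1);
     char *tmp2 = map_file("temp2", M, 1);
   threads then merge runs in -> tmp1 -> tmp2 -> tmp1 -> ... */
```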
B. fopen each of the 3 files t times, so that each thread gets its own 3 FILE pointers, one per file. Again threads contend only for the next portion of work; reads and writes should proceed in parallel.
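A minimal sketch of what a per-thread worker would look like under B; `process_chunk` and its parameters are placeholders for whatever work unit a thread grabs:

```c
/* Sketch for approach B: every thread opens its own FILE* handles and seeks
   to its own chunk, so the I/O itself needs no locking. */
#include <stdio.h>

static void process_chunk(const char *in_path, const char *out_path,
                          long chunk_offset, size_t chunk_bytes, char *buf)
{
    FILE *in  = fopen(in_path,  "rb");   /* private handle, private file position */
    FILE *out = fopen(out_path, "r+b");  /* "r+b": update an existing file in place */

    fseek(in, chunk_offset, SEEK_SET);
    fread(buf, 1, chunk_bytes, in);

    /* ... sort / merge buf here ... */

    fseek(out, chunk_offset, SEEK_SET);
    fwrite(buf, 1, chunk_bytes, out);

    fclose(in);
    fclose(out);
}
```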
C. fopen each of the 3 files once, keep them under mutexes; all threads work in parallel, but to grab more work, to read, or to write they must lock the respective mutex.
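For C I am picturing something like the following sketch (names are mine, writing is symmetric with `fwrite`):

```c
/* Sketch for approach C: one shared FILE* per file, each guarded by a mutex;
   every read or write first takes the lock, seeks, then does the I/O. */
#include <pthread.h>
#include <stdio.h>

struct shared_file {
    FILE           *fp;
    pthread_mutex_t lock;
};

static size_t locked_read(struct shared_file *sf, long offset,
                          void *buf, size_t bytes)
{
    pthread_mutex_lock(&sf->lock);
    fseek(sf->fp, offset, SEEK_SET);
    size_t got = fread(buf, 1, bytes, sf->fp);
    pthread_mutex_unlock(&sf->lock);
    return got;
}
```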
Notes:
In real life I would choose A for sure. But doesn't it defeat the whole purpose of having a limited buffer? (In other words, isn't it cheating?) With such an approach I could even radix sort the whole file in place without any extra buffer. This solution is also Linux-specific; I think Linux is implied from the conversation, but it isn't stated explicitly in the task description.
Regarding B, I think it works on Linux but isn't portable; see the Linux note above.
Regarding C, it's portable, but I am not sure how to optimize it. For example, 8 threads with a small enough m will just bump into each other waiting their turn in the queue, then read/write a tiny portion of data, instantly sort it, and bump into each other again. IMO this is unlikely to be faster than a single thread.
Questions:
- Which solution is a better match for the task?
- Which solution is a better design in real life (assuming Linux)?
- Does B work? In other words, is opening a file multiple times and writing to it in parallel (to different parts of it) legal?
- Any alternative approaches?
