Sorry if this question seems overly noob to you. I've taken programming courses but never one on computer architecture. I had to pretty much learn from Wiki/SO/Google.
I have a dict called LUT, and I need to parallelize lookups into it (it is strictly read-only). I have a list of items that I scatter to multiple threads/processes, and each thread/process then looks up LUT[item] for each item in its chunk of the list.
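For reference, a minimal sketch of the serial version I'm trying to parallelize (the toy `LUT` contents, `items`, and the helper name `lookup_all` are placeholders, not my real data):

```python
# Serial baseline: one pass over the full list of items.
LUT = {i: i * i for i in range(1000)}   # toy read-only lookup table
items = list(range(0, 1000, 3))         # toy list of keys to look up

def lookup_all(chunk):
    """Look up every item of a chunk in LUT."""
    return [LUT[item] for item in chunk]

results = lookup_all(items)
```

The question is how to split `items` across workers so the `LUT[item]` lookups happen in parallel.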
I can only think of 7 options to achieve this:
1. multithreading module, all threads lookup the same dict
2. multiprocessing module, all processes lookup the same dict
3. multiprocessing module, all processes lookup their own copy of dict, e.g. if there are 2 processes, there are 2 copies of the dict
4. multiprocessing module, all processes lookup a "shared proxy dict": Manager.dict
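To make option 1 concrete, here is a sketch using `concurrent.futures` (the chunking scheme and names are mine): all threads read the same dict object, so nothing is copied or pickled, but because the lookups are pure Python the GIL lets only one thread run bytecode at a time.

```python
from concurrent.futures import ThreadPoolExecutor

LUT = {i: i * i for i in range(1000)}   # toy read-only dict
items = list(range(500))

def lookup_chunk(chunk):
    # All threads read the same LUT object; read-only access is thread-safe.
    return [LUT[item] for item in chunk]

def parallel_lookup(items, n_threads=4):
    # Chop the list into one strided chunk per thread, then gather results.
    chunks = [items[i::n_threads] for i in range(n_threads)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(lookup_chunk, chunks))

results = parallel_lookup(items)
```

Note that striding interleaves the items, so the per-chunk result lists come back in chunk order, not original list order.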
The following 3 options use Cython, since I've heard it can be used to release Python's GIL.
5. Cython & C++'s STL unordered_map and multithreading, all threads lookup the same unordered_map
6. Cython & C++'s STL unordered_map and multiprocessing, all processes lookup the same unordered_map
7. Cython & C++'s STL unordered_map and multiprocessing, all processes lookup their respective copy of the unordered_map
I have already tried options 2, 3, and 4. Options 2 and 4 are around 100-1000x slower than serial lookup (a Manager.dict proxy in particular pays an IPC round-trip to the manager process for every single lookup). Option 3 works well, but its memory usage is too high, since every process holds its own full copy of the dictionary.
Options 5, 6, and 7 use Cython and its ability to wrap C++'s STL unordered_map, the C++ equivalent of Python's dict. Option 5 should be able to release Python's GIL during the lookups, but I am wondering whether multithreading can really speed up something that is CPU-bound. What is my best bet here?