5

I know similar questions have been asked, but I think my case is a little bit different.

Let's say I have a computer with 8 cores and infinite memory with a Linux OS.

I have a piece of calculation software called Gaussian that can take advantage of multithreading. So I set its thread count to 8 for a single calculation for maximum speed. However, I can't decide what to do when I need to run, for instance, 8 calculations simultaneously. In that case, should I set the thread count to 1 (8 threads total across 8 processes) or keep it at 8 (64 threads total across 8 processes) for each job? Does it really matter much? A related question: does the OS automatically handle placing each thread on a different core?

EDIT: I know that benchmarking is the best way to find out. The thing is, the computers belong to my university, so they are busy all the time. In other words, their workload varies in a way I can't control, because other people are using these computers for their calculations too, which makes experimenting impossible. Also, the software is very expensive ($1,500 or so) and licensed per computer, so I can't simply run a benchmark on my personal computer...

gunakkoc
  • 109

4 Answers

6

The correct number depends on how much time the processes spend blocked on IO.

The book "Programming Concurrency on the JVM" has some good information about this:

"Determining the Number of Threads". For a large problem, we'd want to have at least as many threads as the number of available cores. This will ensure that as many cores as available to the process are put to work to solve our problem...

So the minimum number of threads is equal to the number of available cores. If all tasks are computation intensive, then this is all we need. Having more threads will actually hurt in this case because cores would be context switching between threads when there is still work to do. If tasks are IO intensive, then we should have more threads.

When a task performs an IO operation, its thread gets blocked. The processor immediately context switches to run other eligible threads. If we had only as many threads as the number of available cores, even though we have tasks to perform, they can't run because we haven't scheduled them on threads for the processors to pick up.

If tasks spend 50 percent of the time being blocked, then the number of threads should be twice the number of available cores. If they spend less time being blocked--that is, they're computation intensive--then we should have fewer threads but no less than the number of cores. If they spend more time being blocked--that is, they're IO intensive--then we should have more threads, specifically, several multiples of the number of cores.

So we can compute the total number of threads we'd need as follows:

Number of threads = Number of Available Cores / (1 - Blocking Coefficient)
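
As a rough illustration, here is a minimal sketch of that rule in Python (the 0.1 blocking coefficient is a made-up value; a pure number-crunching job would be close to 0):

```python
import os

def suggested_thread_count(blocking_coefficient: float) -> int:
    """Apply the rule above: cores / (1 - blocking coefficient).

    blocking_coefficient is the fraction of time a task spends blocked
    on IO (0.0 = purely CPU-bound, 0.5 = blocked half the time).
    """
    cores = os.cpu_count() or 1  # logical cores visible to this process
    return max(cores, round(cores / (1 - blocking_coefficient)))

# Example: a mostly CPU-bound job that blocks ~10% of the time.
print(suggested_thread_count(0.1))  # prints 9 on an 8-core machine
```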

If you need to run multiple calculations simultaneously, maybe see if it is possible to run them within one process with a thread pool that is sized appropriately.

Otherwise, if you use the optimal number of threads for one calculation but then run 8 calculations at a time, you may end up with far too many threads in total.
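
One way to avoid the "too many" case without giving up per-job threading is to cap how many jobs run at once so the total thread count stays near the core count. A minimal sketch, assuming each calculation is launched as an external command (the `g16` command name, the input file names, and the 2-threads-per-job split are placeholders, not taken from the question):

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

THREADS_PER_JOB = 2                                      # assumed per-job thread setting
CORES = os.cpu_count() or 1
MAX_CONCURRENT_JOBS = max(1, CORES // THREADS_PER_JOB)   # keep total threads <= cores

JOBS = [f"job{i}.gjf" for i in range(8)]                 # hypothetical input files

def run_job(input_file: str) -> int:
    # Each pool worker just waits on one external calculation;
    # "g16" stands in for however the program is actually invoked.
    return subprocess.call(["g16", input_file])

if __name__ == "__main__":
    # With 8 cores and 2 threads per job, at most 4 jobs run at once;
    # the remaining jobs queue up until a worker is free.
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_JOBS) as pool:
        exit_codes = list(pool.map(run_job, JOBS))
    print(exit_codes)
```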

The best solution is to benchmark it experimentally.

I'm not exactly sure what you mean by core parking, but the scheduler will tend to keep a given thread running on the same core for cache reasons, though it will also move threads around at times for heat and power reasons. You can investigate this with a tool like htop.
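
What Linux does give you is processor affinity, which you can inspect or override yourself. A small Linux-only sketch using Python's scheduler-affinity calls (0 means "the current process"; the `taskset` command does the same job from the shell):

```python
import os

# Which logical cores is the current process allowed to run on?
allowed = os.sched_getaffinity(0)  # 0 = the calling process
print(f"Allowed cores: {sorted(allowed)}")

# Pin the current process to cores 0-3 only
# (threads created afterwards inherit this restriction).
os.sched_setaffinity(0, {0, 1, 2, 3})
print(f"Now restricted to: {sorted(os.sched_getaffinity(0))}")
```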

5

Ideally the total thread count for all the jobs should equal the number of cores of the system, except on systems that support hyper-threading, in which case it should be twice the number of cores. So if the system doesn't have hyper-threading and there are 8 calculations running, each should run in one thread.

Many Intel processors come with hyper-threading, so each core can support two threads. For example, an 8-core system that supports hyper-threading should run 16 threads to utilize the system fully.
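
If you are not sure whether a machine has hyper-threading enabled, a quick check is to compare logical and physical core counts. A small sketch, assuming the third-party psutil package is available (`pip install psutil`):

```python
import os

import psutil  # third-party package; not part of the standard library

logical = os.cpu_count()                    # hardware threads (doubled by hyper-threading)
physical = psutil.cpu_count(logical=False)  # physical cores

print(f"{physical} physical cores, {logical} hardware threads")
# An 8-core machine with hyper-threading reports: 8 physical cores, 16 hardware threads
```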

3

The answer depends on what the process does and how its multi-threading was programmed, meaning that you will need to experiment.

If the process uses semaphores and other exclusion mechanisms to handle contention between the threads for common resources (such as memory), then the fewer threads there are in the process, the fewer conflicts there will be that cause waits.

During a wait the thread does nothing, so waits have a negative effect on throughput. In this case, more processes with fewer threads per process will improve throughput, so 8 processes of 8 threads (8x8) will perform better than 1 process of 64 threads (1x64).

On the other hand, if each thread is totally isolated and there are no shared common resources, then the operating system will schedule the threads without any distinction between the two cases of 8x8 or 1x64. In this case only the total number of threads is important for the total throughput, so both cases are of equal performance.
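
A deliberately exaggerated toy of the contention case: every thread in the process does its work while holding one shared lock, so the more threads there are in that process, the more time is spent waiting. The hold times and iteration counts below are arbitrary, chosen only to make the effect visible:

```python
import threading
import time

shared_lock = threading.Lock()  # one lock shared by every thread in the process

def worker(hold_time: float, iterations: int) -> None:
    # Each iteration briefly holds the shared lock, standing in for
    # access to a shared resource such as a common memory structure.
    for _ in range(iterations):
        with shared_lock:
            time.sleep(hold_time)

def run(n_threads: int) -> float:
    start = time.perf_counter()
    threads = [threading.Thread(target=worker, args=(0.001, 50))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

# More threads contending for the same lock => more total waiting.
print(f"2 threads: {run(2):.2f} s")
print(f"8 threads: {run(8):.2f} s")
```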

harrymc
  • 498,455
1

You have answered the question yourself: "the computers belong to my university so they are busy all the time".

You actually only get a slice of the processors. To get the job done in the most efficient way, the overhead of task switching, multiplexing, and waiting on resources should be minimized. Thus you should consider running single-threaded.

Multi-threading is always less efficient when measured in raw "processing power" because of the context-switching overhead. It only speeds things up by utilizing the "free", otherwise unoccupied resources. The idea: using 8 computers might run a problem about 7.9 times faster, but the speedup can never exceed 8.

If all of this hardware is dedicated to you, just run in parallel to speed things up; if not, keep it single-threaded and let others use the remaining cores for their work.

By the way, to be selfish about it, there is a Red Hat tool called Grid that can split your job across all the Linux machines on campus (>200). It will run very fast; just don't get caught, since it will slow things down for everyone else. Or use the older tool, MATLAB's parallel computing support.