Why does this threaded code run so much slower?

Question

I'm writing an N-Body simulation, and for computational simplification I've divided the whole space into a number of uniformly-sized regions.

For each body, I compute the force of all other bodies in the same region, and for the other regions I aggregate the mass and distances together so there's less work to be done.

I have a List<Region> and Region defines public void Index() which sums the total mass at this iteration.

I have two variants of my Space.Tick() function:

public void Tick()
{
  foreach (Region r in Regions)
    r.Index();
}

This is very quick. For 20x20x20 = 8000 regions with 100 bodies each = 800000 bodies in total, it only takes about 0.1 seconds to do this. The CPU graph shows 25% utilisation on my quad-core, which is exactly what I would expect.

Now I write this multi-threaded variant:

public void Tick()
{
  Thread[] threads = new Thread[Environment.ProcessorCount];

  foreach (Region r in Regions)
    while (true)
    {
      bool queued = false;

      for (int i = 0; i < threads.Length; i++)
        if (threads[i] == null || !threads[i].IsAlive)
        {
          Region s = r;
          threads[i] = new Thread(s.Index);
          threads[i].Start();
          queued = true;
          break;
        }

      if (queued)
        break;
    }
}

So a quick explanation in case it's not obvious: threads is an array of 4, in the case of my CPU. It starts off being 4xnull. For each region, I loop through all 4 Thread objects (which could be null). When I find one that's either null or isn't IsAlive, I queue up the Index() of that Region and Start() it. I set queued to true so that I can tell that the region has started indexing.

This code takes about 7 seconds. That's 70x slower. I understand that there's a bit of overhead involved with setting up the threads, finding a thread that's vacant, etc. But I would still expect that I would have at least some sort of performance gain.

What am I doing wrong?

Why are you rolling your own workqueue/task system when .NET has about 4 or 5 already? — siride, Nov 14 '12 at 01:59
FWIW, your problem is probably that you are busy-waiting for the next thread, which consumes a lot of CPU time. — siride, Nov 14 '12 at 02:00
You can't get 25% utilization for code that takes 0.1 sec. That's 2.5%. Check your time measurement. — Hans Passant, Nov 14 '12 at 02:06
BTW, you're using about the slowest multi-threading API available in .NET. — John Saunders, Nov 14 '12 at 02:14

tostringtheory · Accepted Answer · 2012-11-14T03:01:44.153

3

Why not try PLINQ?

Regions.AsParallel().ForAll(x=>x.Index());

PLINQ is usually SUPER fast for me, and it scales dependent on your environment.. If it shouldn't be Parallel, it does single thread.

So, if you had to have a multidimensional array come into the function, you could just do this:

 Regions.AsParallel().Cast<Region>().ForAll(x=>x.Index());

edited Nov 14 '12 at 03:01

answered Nov 14 '12 at 02:18

tostringtheory

877
8
19

My `Regions.AsParallel()` doesn't have `ForAll()`. I've got System.Linq, System.Threading, and System.Threading.Tasks - what else am I missing? – Ozzah Nov 14 '12 at 02:32
The other thing is my Regions isn't `List`, it's `Region[,,]` – Ozzah Nov 14 '12 at 02:42
So the PLINQ extensions were added in .Net 4 - http://msdn.microsoft.com/en-us/library/dd383744.aspx . They are simply in the System.Linq namespace from the System.Core assembly. What does intellisense give you after you type the "AsParallel()." method? – tostringtheory Nov 14 '12 at 02:47
Ahh, I found the problem. If I put all my Region objects into a `List` then it works. As I said, I had an array before. – Ozzah Nov 14 '12 at 02:51
Yep, and I just found out that apparently only single dimensioned arrays implement IEnumerable, above that, not... Thanks for the problem! [.net multidimensional arrays not implementing IEnumerable](http://stackoverflow.com/questions/275073/why-does-c-sharp-multidimensional-arrays-not-implement-ienumerable) – tostringtheory Nov 14 '12 at 02:55

Why does this threaded code run so much slower?

1 Answers1