I've been having trouble understanding the performance characteristics of using Func<...> throughout my code together with inheritance and generics, which is a combination I find myself using all the time.
Let me start with a minimal test case so we all know what we're talking about. Then I'll post the results and explain what I would expect and why.
Minimal test case
using System;

public class GenericsTest2 : GenericsTest<int>
{
    static void Main(string[] args)
    {
        GenericsTest2 at = new GenericsTest2();

        // Same order as the results listed below.
        at.test(at.func);
        at.test(at.Check);
        at.test(at.func2);
        at.test(at.Check2);
        at.test((a) => a.Equals(default(int)));

        Console.ReadLine();
    }

    public GenericsTest2()
    {
        // Both delegates wrap the generic base-class method.
        func = func2 = (a) => Check(a);
    }

    protected Func<int, bool> func2;

    // Non-generic counterpart of GenericsTest<T>.Check.
    public bool Check2(int value)
    {
        return value.Equals(default(int));
    }

    public void test(Func<int, bool> func)
    {
        using (Stopwatch sw = new Stopwatch((ts) => { Console.WriteLine("Took {0:0.00}s", ts.TotalSeconds); }))
        {
            for (int i = 0; i < 100000000; ++i)
            {
                func(i);
            }
        }
    }
}

public class GenericsTest<T>
{
    public bool Check(T value)
    {
        return value.Equals(default(T));
    }

    protected Func<T, bool> func;
}

// Simple timer: reports the elapsed time to the callback when disposed.
public class Stopwatch : IDisposable
{
    public Stopwatch(Action<TimeSpan> act)
    {
        this.act = act;
        this.start = DateTime.UtcNow;
    }

    private Action<TimeSpan> act;
    private DateTime start;

    public void Dispose()
    {
        act(DateTime.UtcNow.Subtract(start));
    }
}
The results
Took 2.50s -> at.test(at.func);
Took 1.97s -> at.test(at.Check);
Took 2.48s -> at.test(at.func2);
Took 0.72s -> at.test(at.Check2);
Took 0.81s -> at.test((a) => a.Equals(default(int)));
What I would expect and why
I would have expected this code to run at exactly the same speed for all 5 methods; to be more precise, even faster than any of these, namely just as fast as:
using (Stopwatch sw = new Stopwatch((ts) => { Console.WriteLine("Took {0:0.00}s", ts.TotalSeconds); }))
{
    for (int i = 0; i < 100000000; ++i)
    {
        bool b = i.Equals(default(int));
    }
}
// this takes 0.32s ?!?
I expected it to take 0.32s because I don't see any reason for the JIT compiler not to inline the code in this particular case.
On closer inspection, I don't understand these performance numbers at all:
- at.func is passed to the function and cannot be changed during execution. Why isn't this inlined?
- at.Check is apparently slower than at.Check2, while neither can be overridden and the IL of at.Check in the case of class GenericsTest2 is as fixed as a rock.
- I see no reason for Func<int, bool> to be slower when passing an inline Func instead of a method that's converted to a Func.
- And why is the difference between test cases 2 and 3 a whopping 0.5s while the difference between cases 4 and 5 is 0.1s? Aren't they supposed to be the same?
Question
I'd really like to understand this... what is going on here that using a generic base class is a whopping 10x slower than inlining the whole lot?
So, basically the question is: why is this happening and how can I fix it?
UPDATE
Based on all the comments so far (thanks!) I did some more digging.
First off, here is a new set of results after making the loop 5x larger and running the tests 4 times. I've used System.Diagnostics.Stopwatch this time and added more tests (each with a description).
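A timing loop built on System.Diagnostics.Stopwatch might look roughly like the sketch below. It is not the exact harness behind these numbers: the TimingSketch class, the Measure helper and the untimed warm-up call are assumptions made for illustration.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Hypothetical helper: calls the delegate once untimed so that its first-call
    // JIT compilation is not measured, then times `iterations` calls.
    public static TimeSpan Measure(Func<int, bool> func, int iterations)
    {
        func(0); // warm-up call, not timed

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; ++i)
        {
            func(i);
        }
        sw.Stop();
        return sw.Elapsed;
    }

    public static void Main()
    {
        // 5x the original loop count of 100000000.
        TimeSpan elapsed = Measure(a => a.Equals(default(int)), 500000000);
        Console.WriteLine("Took {0:0.00}s for (a) => a.Equals(default(int))", elapsed.TotalSeconds);
    }
}
```

Stopwatch uses a high-resolution timer when one is available, which makes it a better fit for this kind of measurement than subtracting DateTime.UtcNow values.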
(Baseline implementation took 2.61s)
--- Run 0 ---
Took 3.00s for (a) => at.Check2(a)
Took 12.04s for Check3<int>
Took 12.51s for (a) => GenericsTest2.Check(a)
Took 13.74s for at.func
Took 16.07s for GenericsTest2.Check
Took 12.99s for at.func2
Took 1.47s for at.Check2
Took 2.31s for (a) => a.Equals(default(int))
--- Run 1 ---
Took 3.18s for (a) => at.Check2(a)
Took 13.29s for Check3<int>
Took 14.10s for (a) => GenericsTest2.Check(a)
Took 13.54s for at.func
Took 13.48s for GenericsTest2.Check
Took 13.89s for at.func2
Took 1.94s for at.Check2
Took 2.61s for (a) => a.Equals(default(int))
--- Run 2 ---
Took 3.18s for (a) => at.Check2(a)
Took 12.91s for Check3<int>
Took 15.20s for (a) => GenericsTest2.Check(a)
Took 12.90s for at.func
Took 13.79s for GenericsTest2.Check
Took 14.52s for at.func2
Took 2.02s for at.Check2
Took 2.67s for (a) => a.Equals(default(int))
--- Run 3 ---
Took 3.17s for (a) => at.Check2(a)
Took 12.69s for Check3<int>
Took 13.58s for (a) => GenericsTest2.Check(a)
Took 14.27s for at.func
Took 12.82s for GenericsTest2.Check
Took 14.03s for at.func2
Took 1.32s for at.Check2
Took 1.70s for (a) => a.Equals(default(int))
I noticed from these results that the moment you start using generics, it gets much slower. Digging a bit more into the IL, I found this for the non-generic implementation:
L_0000: ldarga.s 'value'
L_0002: ldc.i4.0
L_0003: call instance bool [mscorlib]System.Int32::Equals(int32)
L_0008: ret
and for all the generic implementations:
L_0000: ldarga.s 'value'
L_0002: ldloca.s CS$0$0000
L_0004: initobj !T
L_000a: ldloc.0
L_000b: box !T
L_0010: constrained. !T
L_0016: callvirt instance bool [mscorlib]System.Object::Equals(object)
L_001b: ret
While most of this can be optimized, I suppose the callvirt can be a problem here.
In an attempt to make it faster I added the 'where T : IEquatable<T>' constraint to the definition of the method. The result is:
L_0011: callvirt instance bool [mscorlib]System.IEquatable`1<!T>::Equals(!0)
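For reference, the two generic variants being compared can be sketched like this. The class and method names (EqualitySketch, CheckObject, CheckEquatable) are made up for this sketch and are not the ones used in the test program.

```csharp
using System;

public static class EqualitySketch
{
    // Unconstrained: the only Equals the compiler can bind to is
    // Object.Equals(object), so default(T) is boxed before the call.
    public static bool CheckObject<T>(T value)
    {
        return value.Equals(default(T));
    }

    // Constrained: the compiler binds to IEquatable<T>.Equals(T),
    // so the argument is passed as a T without boxing.
    public static bool CheckEquatable<T>(T value) where T : IEquatable<T>
    {
        return value.Equals(default(T));
    }

    public static void Main()
    {
        Console.WriteLine(CheckObject(0));    // True
        Console.WriteLine(CheckEquatable(0)); // True
        Console.WriteLine(CheckEquatable(7)); // False
    }
}
```

Both methods have the same body; only the constraint changes which Equals overload the C# compiler selects.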
While I understand more about the performance now (it probably cannot inline because it creates a vtable lookup), I'm still confused: Why doesn't it simply call T::Equals? After all, I do specify it will be there...