I'm testing what sort of speedup I can get from using SIMD instructions with RyuJIT and I'm seeing some disassembly instructions that I don't expect. I'm basing the code on this blog post from the RyuJIT team's Kevin Frei, and a related post here. Here's the function:
static void AddPointwiseSimd(float[] a, float[] b) {
int simdLength = Vector<float>.Count;
int i = 0;
for (i = 0; i < a.Length - simdLength; i += simdLength) {
Vector<float> va = new Vector<float>(a, i);
Vector<float> vb = new Vector<float>(b, i);
va += vb;
va.CopyTo(a, i);
}
}
The section of disassembly I'm querying copies the array values into the Vector<float>. Most of the disassembly is similar to that in Kevin and Sasha's posts, but I've highlighted some extra instructions (along with my confused annotations) that don't appear in their disassemblies:
;// Vector<float> va = new Vector<float>(a, i);
cmp eax,r8d ; <-- Unexpected - Compare a.Length to i?
jae 00007FFB17DB6D5F ; <-- Unexpected - Jump to range check failure
lea r10d,[rax+3]
cmp r10d,r8d
jae 00007FFB17DB6D5F
mov r11,rcx ; <-- Unexpected - Extra register copy?
movups xmm0,xmmword ptr [r11+rax*4+10h ]
;// Vector<float> vb = new Vector<float>(b, i);
cmp eax,r9d ; <-- Unexpected - Compare b.Length to i?
jae 00007FFB17DB6D5F ; <-- Unexpected - Jump to range check failure
cmp r10d,r9d
jae 00007FFB17DB6D5F
movups xmm1,xmmword ptr [rdx+rax*4+10h]
Note the loop range check is as expected:
;// for (i = 0; i < a.Length - simdLength; i += simdLength) {
add eax,4
cmp r9d,eax
jg loop
so I don't know why there are extra comparisons to eax. Can anyone explain why I'm seeing these extra instructions and if it's possible to get rid of them.
In case it's related to the project settings I've got a very similar project that shows the same issue here on github (see FloatSimdProcessor.HwAcceleratedSumInPlace() or UShortSimdProcessor.HwAcceleratedSumInPlaceUnchecked()).