The calli opcode requires a calling convention. By default it is stdcall, while extern "C" functions in native libraries use cdecl.
The JIT recently started inlining methods that contain calli, but only with the default calling convention. When I call a method via calli without unmanaged cdecl, it works on x64 and is 58% faster than DllImport and 2.2x faster than an unmanaged function pointer (on netcoreapp2.1; on net471 the difference is bigger: 82% and 5.5x). When I call the method via calli unmanaged cdecl, performance is on par with DllImport (around 1% slower).
I have read that on x64 the stdcall vs. cdecl mess no longer exists and all methods use cdecl (or fastcall; I saw that elsewhere but cannot find the link). The difference only applies to x86, where my call without unmanaged cdecl does indeed crash the app with a segfault.
The method in question is the following. For the tests I use a no-op native method, so that only the native call overhead is measured.
.method public hidebysig static int32 CalliCompress(uint8* source, native int sourceLength, uint8* destination, native int destinationLength, int32 clevel, native int functionPtr) cil managed aggressiveinlining
{
    .custom instance void System.Runtime.Versioning.NonVersionableAttribute::.ctor()
        = {}
    // Push the five native arguments, then the function pointer that calli pops as its call target.
    .maxstack 6
    ldarg.0
    ldarg.1
    ldarg.2
    ldarg.3
    ldarg 4
    ldarg 5
    calli unmanaged cdecl int32 (uint8* source, native int sourceLength, uint8* destination, native int destinationLength, int32 clevel)
    ret
}
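For reference, the native side of the test looks roughly like the sketch below. The names noop_compress and NOOP_CDECL are hypothetical and stand in for the real export; the parameters mirror the calli signature above (uint8* as uint8_t*, native int as intptr_t). The macro only expands to a cdecl annotation on 32-bit x86 builds and is empty elsewhere.

```c
#include <stddef.h>
#include <stdint.h>

/* NOOP_CDECL is a hypothetical helper macro: it marks the function as cdecl
   on 32-bit x86 and expands to nothing on other targets. */
#if defined(_MSC_VER) && defined(_M_IX86)
#define NOOP_CDECL __cdecl
#elif defined(__GNUC__) && defined(__i386__)
#define NOOP_CDECL __attribute__((cdecl))
#else
#define NOOP_CDECL
#endif

/* Hypothetical no-op with the same shape as the calli signature.
   It ignores every argument and returns 0, so a benchmark that calls it
   measures only the cost of the managed-to-native transition. */
int32_t NOOP_CDECL noop_compress(uint8_t *source, intptr_t source_length,
                                 uint8_t *destination, intptr_t destination_length,
                                 int32_t clevel)
{
    (void)source;
    (void)source_length;
    (void)destination;
    (void)destination_length;
    (void)clevel;
    return 0;
}
```

The address of this function is what gets passed as functionPtr to CalliCompress.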
My questions:
1) Is it safe to omit unmanaged cdecl after calli on x64 "by design", or am I just lucky with this example? If all calls on x64 are cdecl, then I could rely on the JIT treating static readonly fields as constants and dispatch to the appropriate method for free, just using if (IntPtr.Size == 8) {..call fast method..} else {..use unmanaged cdecl..}
2) What does "caller or callee cleans the stack" mean? My native function returns an int that is on the stack after the call. Is the issue about who removes this int from the stack? Or is there some other work that needs to be done with the stack inside the native function? I am in control of the native function and could return the value via a ref parameter. Would that make the stack-cleaning issue irrelevant, since no stack changes are made during the call?
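To make question 2 concrete, here is a minimal C sketch of the distinction as it is usually described for 32-bit x86: with cdecl the caller removes the arguments it pushed, while with stdcall the callee removes them when it returns. All names here (CONV_CDECL, CONV_STDCALL, add_cdecl, add_stdcall, call_both) are hypothetical and exist only for this illustration. The macros expand to nothing outside 32-bit x86, so on x64 both functions compile to the same single platform convention.

```c
#include <stdint.h>

/* Hypothetical helper macros: select an explicit convention on 32-bit x86,
   expand to nothing on every other target. */
#if defined(_MSC_VER) && defined(_M_IX86)
#define CONV_CDECL   __cdecl
#define CONV_STDCALL __stdcall
#elif defined(__GNUC__) && defined(__i386__)
#define CONV_CDECL   __attribute__((cdecl))
#define CONV_STDCALL __attribute__((stdcall))
#else
#define CONV_CDECL
#define CONV_STDCALL
#endif

/* cdecl: the callee returns with a plain `ret`; the caller then adjusts the
   stack pointer to discard the two arguments it pushed. */
int32_t CONV_CDECL add_cdecl(int32_t a, int32_t b)
{
    return a + b;
}

/* stdcall: the callee discards its own arguments on return (`ret 8` on x86),
   so the caller must not adjust the stack pointer again. */
int32_t CONV_STDCALL add_stdcall(int32_t a, int32_t b)
{
    return a + b;
}

/* The convention is part of the function pointer type. Calling through a
   pointer declared with the wrong convention on 32-bit x86 leaves the stack
   pointer off by the size of the arguments. */
typedef int32_t (CONV_CDECL *cdecl_fn)(int32_t, int32_t);
typedef int32_t (CONV_STDCALL *stdcall_fn)(int32_t, int32_t);

int32_t call_both(void)
{
    cdecl_fn f = add_cdecl;
    stdcall_fn g = add_stdcall;
    return f(1, 2) + g(3, 4);
}
```

In this picture the stack cleanup concerns the pushed arguments, not the return value: on 32-bit x86 both conventions return an int32 in a register (EAX). If that is right, moving the result into a ref parameter would add one more argument to clean up rather than remove the mismatch.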