In C++, why do some compilers refuse to put objects consisting of only a double into a register?

Question

In Item 20 of Scott Meyers's Effective C++, he states:

some compilers refuse to put objects consisting of only a double into a register

When passing built-in types by value, compilers will happily place the data in registers and quickly send ints/doubles/floats/etc. along. However, not all compilers will treat small objects with the same grace. I can easily understand why compilers would treat objects differently: between the vtable and all the constructors, passing an object by value can be a lot more work than just copying data members.

But still. This seems like an easy problem for modern compilers to solve: "This class is small, maybe I can treat it differently". Meyers's statement seemed to imply that compilers WOULD make this optimization for objects consisting of only an int (or char or short).

Can someone give further insight as to why this optimization sometimes doesn't happen?

Any recent mainstream compiler won't have any problem with this. If a class consists only of a `double`, it will be treated as such. More: the AMD64 Linux calling convention actually *requires* trivial structures up to 16 bytes to be essentially passed as if each member was passed as a separate parameter (not exactly, but close enough in most cases). I'd say that this problem is definitely a thing of the past. — Matteo Italia, Aug 30 '18 at 23:47
@MatteoItalia Surely if one ABI can specify that single-member structures must be passed as their members, another (old) ABI can specify that all structures are passed by reference, and compilers may be stuck with that old ABI for compatibility reasons. — , Aug 30 '18 at 23:50
@hvd: yep, indeed that's the only case where I think such a problem may arise; the compiler *per se* wouldn't have any problem being smarter than this - and may indeed be smart when the function doesn't need to adhere fully to the calling convention (e.g. for static functions). OTOH, the most common of these calling conventions (32 bit x86 cdecl) just puts everything on the stack, so it's not like this is specific to structures or floating point. — Matteo Italia, Aug 30 '18 at 23:56
@MatteoItalia this seems like the answer to my question. Effective C++, (3rd edition) was published in 2005. So modern compilers would have to be very careful if they wanted to create object code files that can be linked with older compilers? I must learn more about compilers and calling conventions, it's fascinating stuff. — Ari Sweedler, Aug 31 '18 at 00:14
Nothing unusual though. C++ shared objects are not normally compatible anyway. In fact it is usually recommended to stick to a C interface when building shared objects. When you don't, you must check ABI or you'll get a crash. — spectras, Aug 31 '18 at 00:28
@AriSweedler: Are you asking about cases where a compiler will *never* put an object in a register, or cases where compilers may or may not do so based on how that object gets used? Because it ain't difficult to find ways to use an object that prevent register storage, but that's just as true for basic types as for structs. — Nicol Bolas, Aug 31 '18 at 01:14
@NicolBolas My question is "When a compiler could get away with this easy optimization, why wouldn't it?" — Ari Sweedler, Aug 31 '18 at 16:49
@AriSweedler: That doesn't answer my question. "When" could be any number of times, in any number of circumstances. What particular circumstances are you talking about that constitute "easy optimization"? For example, the answer you accepted only talks about objects used in function calls. It says nothing about cases where you just create an object on the stack and manipulate it directly without passing it to a function. And even then, inlining bypasses calling conventions because there is no function being called. — Nicol Bolas, Aug 31 '18 at 17:11
@NicolBolas I don't have a ton of C++ experience - if you think my question lacks details, then that's probably because my knowledge on the subject lacks details. So I'm afraid I cannot give you a better answer to your question, "what particular circumstances are you talking about?". I was asking broadly (maybe too broadly) about what circumstances this behavior even happens in in the first place. If you wish to share knowledge on the broad subject - what the different circumstances are and why compilers act how they do in them - then I'd be happy to listen and learn :) — Ari Sweedler, Aug 31 '18 at 22:48
@NicolBolas Is it even possible to break calling conventions when you inline something? The reasoning being if you're not calling a function, then you shouldn't have to worry about calling conventions — Ari Sweedler, Aug 31 '18 at 22:51
@AriSweedler: "*Is it even possible to break calling conventions when you inline something?*" That's my point. If the reason that a particular compiler, in a particular circumstance, might not put a particular object in a register is that this object is being passed to some function, once you take that function call away, the object may get put into a register. If the compiler doesn't inline that call, or can't inline it for whatever reason, then the calling convention dictates how that object gets stored. — Nicol Bolas, Aug 31 '18 at 23:06

score 15 · Accepted Answer · answered Aug 31 '18 at 04:56

I found this document online on "Calling conventions for different C++ compilers and operating systems" (updated on 2018-04-25) which has a table depicting "Methods for passing structure, class and union objects".

From the table you can see that if an object contains a long double, a copy of the entire object is transferred on the stack by all of the compilers shown there.

Also from the same resource (with emphasis added):

There are several different methods to transfer a parameter to a function if the parameter is a structure, class or union object. A copy of the object is always made, and this copy is transferred to the called function either in registers, on the stack, or by a pointer, as specified in table 6. The symbols in the table specify which method to use. S takes precedence over I and R. PI and PS take precedence over all other passing methods.

As table 6 tells, an object cannot be transferred in registers if it is too big or too complex. For example, an object that has a copy constructor cannot be transferred in registers because the copy constructor needs an address of the object. The copy constructor is called by the caller, not the callee.

Objects passed on the stack are aligned by the stack word size, even if higher alignment would be desired. Objects passed by pointers are not aligned by any of the compilers studied, even if alignment is explicitly requested. The 64bit Windows ABI requires that objects passed by pointers be aligned by 16.

An array is not treated as an object but as a pointer, and no copy of the array is made, except if the array is wrapped into a structure, class or union.

The 64 bit compilers for Linux differ from the ABI (version 0.97) in the following respects: Objects with inheritance, member functions, or constructors can be passed in registers. Objects with copy constructor, destructor or virtual are passed by pointers rather than on the stack.

The Intel compilers for Windows are compatible with Microsoft. Intel compilers for Linux are compatible with Gnu.
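
The two rules quoted above (a copy constructor rules out register passing, and an array is only copied when wrapped in a structure) can be probed from within C++ itself. The following is a minimal sketch, not a description of what any particular compiler does: all type and function names are hypothetical, `std::is_trivially_copyable` is used only as a rough proxy for "simple enough", and the 16-byte limit is an assumption modeled on the System V x86-64 ABI (other ABIs use other limits).

```cpp
#include <type_traits>

// All type and function names below are hypothetical, chosen for illustration.

struct PlainDouble {                 // just a double, no user-provided special members
    double d;
};

struct CopyCtorDouble {              // same data, plus a user-provided copy constructor
    double d;
    explicit CopyCtorDouble(double v) : d(v) {}
    CopyCtorDouble(const CopyCtorDouble& other) : d(other.d) {}
};

struct WrappedArray {                // an array inside a struct is copied with the struct
    float v[4];
};

// Rough proxy for "simple and small enough to travel in registers".
// The 16-byte limit is an assumption modeled on the System V x86-64 ABI.
template <typename T>
constexpr bool is_register_candidate() {
    return std::is_trivially_copyable<T>::value && sizeof(T) <= 16;
}

static_assert(is_register_candidate<PlainDouble>(),
              "a plain wrapper around double qualifies");
static_assert(!is_register_candidate<CopyCtorDouble>(),
              "a user-provided copy constructor disqualifies the type");

// A bare array parameter is adjusted to a pointer, so no copy of the array is made.
void takes_array(float v[4]);
static_assert(std::is_same<decltype(&takes_array), void (*)(float*)>::value,
              "array parameter is really a pointer parameter");

// Wrapping the array in a struct makes pass-by-value copy all four elements.
float first_after_modify(WrappedArray a) {
    a.v[0] = 42.0f;                  // modifies the callee's copy only
    return a.v[0];
}
```

Calling `first_after_modify(w)` leaves the caller's `w` untouched, which is the "copy of the object is always made" behaviour the quoted text describes. Whether that copy actually travels in registers is still decided by the platform's calling convention, not by these compile-time checks.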

As a further note, all occurrences of `I` in the table have a size limit of 32 bits (the row for limit on number of registers is 2 in the 16-bit case and 1 in all 32-bit cases) — Ben Voigt, Aug 31 '18 at 05:23
Interesting! Some of the distinctions in calling convention seem arbitrary to me - you can pack 4 floats into a XMM reg but not 2 doubles? I know that there're good reasons (but they might be legacy). Time to read up, +1 for sharing that calling conventions pdf. — Ari Sweedler, Aug 31 '18 at 17:15

score 1 · Answer 2 · answered Aug 31 '18 at 04:33

Here is an example showing that LLVM clang with optimization level O3 treats a class with a single double data member just like it was a double:

$ cat main.cpp
#include <stdio.h>
class MyDouble {
public:
    double d;
    MyDouble(double _d):d(_d){}
};
void foo(MyDouble d)
{
    printf("%lg\n",d.d);
}
int main(int argc, char **argv)
{
    if (argc>5)
    {
        double x=(double)argc;
        MyDouble d(x);
        foo(d);
    }
    return 0;
}

When I compile it and view the generated bitcode file, I see that foo behaves as if it operates on a double type input parameter:

$ clang++ -O3 -c -emit-llvm main.cpp
$ llvm-dis main.bc

Here is the relevant part:

; Function Attrs: nounwind uwtable
define void @_Z3foo8MyDouble(double %d.coerce) #0 {
entry:
  %call = tail call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([5 x i8]* @.str, i64 0, i64 0), double %d.coerce)
  ret void
}

See how foo declares its input parameter as double, and moves it around for printing "as is". Now let's compile the exact same code with O0:

$ clang++ -O0 -c -emit-llvm main.cpp
$ llvm-dis main.bc

When we look at the relevant part, we see that clang uses a getelementptr instruction to access its first (and only) data member d:

; Function Attrs: uwtable
define void @_Z3foo8MyDouble(double %d.coerce) #0 {
entry:
  %d = alloca %class.MyDouble, align 8
  %coerce.dive = getelementptr %class.MyDouble* %d, i32 0, i32 0
  store double %d.coerce, double* %coerce.dive, align 1
  %d1 = getelementptr inbounds %class.MyDouble* %d, i32 0, i32 0
  %0 = load double* %d1, align 8
  %call = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([5 x i8]* @.str, i32 0, i32 0), double %0)
  ret void
}
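
For contrast, here is a hypothetical variant of the class above whose only change is a user-provided copy constructor (`MyDoubleCopy` and `foo_copy` are names made up for this sketch). On targets that follow the Itanium C++ ABI, a class with a non-trivial copy constructor is passed by address rather than in a register, so running this through the same `clang++ -emit-llvm` and `llvm-dis` commands should show a pointer parameter instead of `double %d.coerce`. The exact IR depends on the clang version and target, so it is not reproduced here.

```cpp
#include <cstdio>
#include <type_traits>

// Hypothetical variant of MyDouble: identical data, plus a user-provided copy constructor.
class MyDoubleCopy {
public:
    double d;
    MyDoubleCopy(double _d) : d(_d) {}
    MyDoubleCopy(const MyDoubleCopy& other) : d(other.d) {}
};

// The user-provided copy constructor makes the type non-trivially copyable.
static_assert(!std::is_trivially_copyable<MyDoubleCopy>::value,
              "MyDoubleCopy is not trivially copyable");

// Same body as foo, but returns printf's character count so the call can be checked.
int foo_copy(MyDoubleCopy d)
{
    return std::printf("%lg\n", d.d);
}
```

Comparing the signature of `foo_copy` in the resulting `.ll` file with that of `foo` is a quick way to check how a given compiler and target treat the two cases.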
