Consider the following code:
#include <utility>
#include <string>
int bar() {
std::pair<int, std::string> p {
123, "Hey... no small-string optimization for me please!" };
return p.first;
}
(simplified thanks to @Jarod42 :-) ...)
I expect the function to be implemented as simply:
bar():
mov eax, 123
ret
but instead, the implementation calls operator new(), constructs an std::string with my literal, then calls operator delete(). At least - that's what gcc 9 and clang 9 do (GodBolt). Here's the clang output:
bar(): # @bar()
push rbx
sub rsp, 48
mov dword ptr [rsp + 8], 123
lea rax, [rsp + 32]
mov qword ptr [rsp + 16], rax
mov edi, 51
call operator new(unsigned long)
mov qword ptr [rsp + 16], rax
mov qword ptr [rsp + 32], 50
movups xmm0, xmmword ptr [rip + .L.str]
movups xmmword ptr [rax], xmm0
movups xmm0, xmmword ptr [rip + .L.str+16]
movups xmmword ptr [rax + 16], xmm0
movups xmm0, xmmword ptr [rip + .L.str+32]
movups xmmword ptr [rax + 32], xmm0
mov word ptr [rax + 48], 8549
mov qword ptr [rsp + 24], 50
mov byte ptr [rax + 50], 0
mov ebx, dword ptr [rsp + 8]
mov rdi, rax
call operator delete(void*)
mov eax, ebx
add rsp, 48
pop rbx
ret
.L.str:
.asciz "Hey... no small-string optimization for me please!"
My question is: Clearly, the compiler has full knowledge of everything going on inside bar(). Why is it not "eliding"/optimizing the string away? More specifically:
- At the basic level there's the code between then
new()anddelete(), which AFAICT the compiler knows results in nothing useful. - Secondarily, the
new()anddelete()calls themselves. After all, small-string-optimization is allowed by the standard AFAIK, so even though clang/gcc hasn't chosen to use that - it could have; meaning that it's not actually required to callnew()ordelete()there.
I'm particularly interested in what part of this is directly due to the language standard, and what part is compiler non-optimality.