I have a simple tagged union of values. The values can be either int64_ts or doubles. I am performing addition on these unions, with the caveat that if both arguments represent int64_t values, the result should also hold an int64_t value.
Here is the code:
```cpp
#include <stdint.h>
union Value {
  int64_t a;
  double b;
};
enum Type { DOUBLE, LONG };
// Value + type.
struct TaggedValue {
  Type type;
  Value value;
};
void add(const TaggedValue& arg1, const TaggedValue& arg2, TaggedValue* out) {
  const Type type1 = arg1.type;
  const Type type2 = arg2.type;
  // If both args are longs then write a long to the output.
  if (type1 == LONG && type2 == LONG) {
    out->value.a = arg1.value.a + arg2.value.a;
    out->type = LONG;
  } else {
    // Convert argument to a double and add it.
    double op1 = type1 == LONG ? (double)arg1.value.a : arg1.value.b; // Why isn't CMOV used?
    double op2 = type2 == LONG ? (double)arg2.value.a : arg2.value.b; // Why isn't CMOV used?
    out->value.b = op1 + op2;
    out->type = DOUBLE;
  }
}
```
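In the code above, the int64_t-to-double conversion only happens on one side of each ternary, so the compiler has to either branch or speculatively execute the conversion. One variant worth trying is to read the payload both ways and convert unconditionally, so each ternary only selects between two values that are already computed. This is a sketch: `add_select` is a hypothetical name, and it has not been verified to make GCC emit a branch-free select on any particular version. The payload is copied out with `std::memcpy` rather than read through the inactive union member, which keeps it well-defined in standard C++.

```cpp
#include <cstdint>
#include <cstring>

union Value {
  int64_t a;
  double b;
};
enum Type { DOUBLE, LONG };
struct TaggedValue {
  Type type;
  Value value;
};

// Hypothetical variant of add(): both candidate operands are computed
// before the tag is tested, so the ternaries pick between ready values.
void add_select(const TaggedValue& arg1, const TaggedValue& arg2, TaggedValue* out) {
  if (arg1.type == LONG && arg2.type == LONG) {
    out->value.a = arg1.value.a + arg2.value.a;
    out->type = LONG;
    return;
  }
  // Copy each 8-byte payload out both as an integer and as a double.
  int64_t i1, i2;
  double d1, d2;
  std::memcpy(&i1, &arg1.value, sizeof i1);
  std::memcpy(&d1, &arg1.value, sizeof d1);
  std::memcpy(&i2, &arg2.value, sizeof i2);
  std::memcpy(&d2, &arg2.value, sizeof d2);
  // Convert unconditionally (harmless if the payload was really a double).
  const double c1 = static_cast<double>(i1);
  const double c2 = static_cast<double>(i2);
  // Select between precomputed values based on the tags.
  const double op1 = arg1.type == LONG ? c1 : d1;
  const double op2 = arg2.type == LONG ? c2 : d2;
  out->value.b = op1 + op2;
  out->type = DOUBLE;
}
```

The compiler is still free to turn these ternaries back into branches, so the output should be checked on the compiler explorer.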
The output of GCC at -O2 is here: http://goo.gl/uTve18 (reproduced below in case the link doesn't work):
```asm
add(TaggedValue const&, TaggedValue const&, TaggedValue*):
    cmp DWORD PTR [rdi], 1
    sete    al
    cmp DWORD PTR [rsi], 1
    sete    cl
    je  .L17
    test    al, al
    jne .L18
.L4:
    test    cl, cl
    movsd   xmm1, QWORD PTR [rdi+8]
    jne .L19
.L6:
    movsd   xmm0, QWORD PTR [rsi+8]
    mov DWORD PTR [rdx], 0
    addsd   xmm0, xmm1
    movsd   QWORD PTR [rdx+8], xmm0
    ret
.L17:
    test    al, al
    je  .L4
    mov rax, QWORD PTR [rdi+8]
    add rax, QWORD PTR [rsi+8]
    mov DWORD PTR [rdx], 1
    mov QWORD PTR [rdx+8], rax
    ret
.L18:
    cvtsi2sd    xmm1, QWORD PTR [rdi+8]
    jmp .L6
.L19:
    cvtsi2sd    xmm0, QWORD PTR [rsi+8]
    addsd   xmm0, xmm1
    mov DWORD PTR [rdx], 0
    movsd   QWORD PTR [rdx+8], xmm0
    ret
```
It produced code with a lot of branches. I know that the input data is fairly random, i.e., it contains an arbitrary mix of int64_ts and doubles, so these branches are likely to be hard to predict. I'd like at least the conversion to double to be done with the equivalent of a CMOV instruction. Is there any way I can coax GCC into producing that code? Ideally I'd benchmark on real data to see how the code with many branches compares to code with fewer branches but more expensive CMOV instructions. It might turn out that the code GCC generates by default performs better, but I'd like to confirm that. I could write the assembly inline myself, but I'd prefer not to.
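If ternaries keep coming out as branches, one option is to make the selection itself arithmetic: always perform the conversion, then use an all-ones/all-zeros mask derived from the tag to keep either the converted bits or the stored double bits. This is a sketch under stated assumptions: `to_double_masked` and `add_masked` are hypothetical names, and the resulting assembly and performance have not been measured here. Note that CMOV only operates on general-purpose registers, so for doubles the compiler may implement this with integer AND/OR or SSE mask instructions (or `blendvpd` with SSE4.1) rather than a literal CMOV. The branch on "both tags are LONG" remains, since the output type differs between the two cases.

```cpp
#include <cstdint>
#include <cstring>

union Value { int64_t a; double b; };
enum Type { DOUBLE, LONG };
struct TaggedValue { Type type; Value value; };

// Hypothetical helper: returns the operand as a double with no
// data-dependent branch in the source. The int64_t->double conversion
// always runs; a bit mask then keeps the converted or the stored value.
static double to_double_masked(const TaggedValue& v) {
  int64_t as_long;
  uint64_t raw_bits;
  std::memcpy(&as_long, &v.value, sizeof as_long);
  std::memcpy(&raw_bits, &v.value, sizeof raw_bits);

  const double converted = static_cast<double>(as_long);
  uint64_t conv_bits;
  std::memcpy(&conv_bits, &converted, sizeof conv_bits);

  // All ones if the tag is LONG, all zeros otherwise.
  const uint64_t mask = uint64_t{0} - static_cast<uint64_t>(v.type == LONG);
  const uint64_t bits = (conv_bits & mask) | (raw_bits & ~mask);

  double result;
  std::memcpy(&result, &bits, sizeof result);
  return result;
}

// Hypothetical drop-in for add() using the masked conversion.
void add_masked(const TaggedValue& arg1, const TaggedValue& arg2, TaggedValue* out) {
  if (arg1.type == LONG && arg2.type == LONG) {
    out->value.a = arg1.value.a + arg2.value.a;
    out->type = LONG;
  } else {
    out->value.b = to_double_masked(arg1) + to_double_masked(arg2);
    out->type = DOUBLE;
  }
}
```

Whether this beats the branchy version depends on how unpredictable the tags really are, so it is a candidate for the benchmark rather than a guaranteed win.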
The interactive compiler link is a good way to check the assembly. Any suggestions?