Given the following C program:
#include <stdio.h>
int main() {
    int a = 3;
    printf("hello world %d\n", a);
}
Compiling it with clang 6.0 for x86-64, with no optimisations, produces the following assembly:
main: # @main
  pushq %rbp
  movq %rsp, %rbp
  subq $16, %rsp
  movabsq $.L.str, %rdi
  movl $3, -4(%rbp)
  movl -4(%rbp), %esi
  movb $0, %al
  callq printf
  xorl %esi, %esi
  movl %eax, -8(%rbp) # 4-byte Spill
  movl %esi, %eax
  addq $16, %rsp
  popq %rbp
  retq
.L.str:
  .asciz "hello world %d\n"
I then tried this C program, which uses a double instead of an int:
#include <stdio.h>
int main() {
    double a = 3;
    printf("hello world %f\n", a);
}
It produces similar assembly:
.LCPI0_0:
  .quad 4613937818241073152 # double 3
main: # @main
  pushq %rbp
  movq %rsp, %rbp
  subq $16, %rsp
  movabsq $.L.str, %rdi
  movsd .LCPI0_0(%rip), %xmm0 # xmm0 = mem[0],zero
  movsd %xmm0, -8(%rbp)
  movsd -8(%rbp), %xmm0 # xmm0 = mem[0],zero
  movb $1, %al    # <--- this
  callq printf
  xorl %ecx, %ecx # <--- this
  movl %eax, -12(%rbp) # 4-byte Spill
  movl %ecx, %eax
  addq $16, %rsp
  popq %rbp
  retq
.L.str:
  .asciz "hello world %f\n"
However, there are three differences:
- the xmm0 SSE register is used; I understand this is to do with floating point
- ECX is XORed after the call rather than ESI
- AL is set to 1 rather than 0
 
What do these differences mean?