To expand on the comments, let's start with the x86 calling convention and your code.
x86 Calling Convention
On x86 (32-bit, cdecl), arguments are passed on the stack, so your function call is written the x86 way. For example, if you build your code for x86:
[SECTION .data]
   msg: db "Hello C",0
[SECTION .bss]
[SECTION .text] 
 extern puts
 global main 
 main:
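  push ebp                ; save the caller's frame pointer
  mov ebp, esp            ; remember the original stack pointer
  and esp, 0xfffffff0     ; align the stack to 16 bytes
  sub esp, 0x10           ; reserve 16 bytes, keeping the alignment
  mov dword [esp], msg    ; first argument at the top of the stack (NASM has no PTR keyword)
  call puts
  mov esp, ebp            ; restore the stack pointer, which also discards the argument
  pop ebp
  ret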
It should work fine.
x86-64 Calling Convention
Under the System V ABI (Linux, macOS), there are two main differences:
- addresses are 8 bytes wide, of course
- the first 6 integer or pointer arguments are passed in registers (rdi, rsi, rdx, rcx, r8, r9); the rest go on the stack
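Seen from the C side, the register rule looks like the sketch below. It assumes the System V AMD64 ABI, and `sum8` is a made-up function used only for illustration: its first six parameters arrive in registers and the last two on the stack.

```c
/* Assumes the System V AMD64 ABI (Linux, macOS); sum8 is a hypothetical name.
 * a..f arrive in rdi, rsi, rdx, rcx, r8, r9 (in that order);
 * g and h do not fit in registers and are passed on the stack. */
long sum8(long a, long b, long c, long d,
          long e, long f, long g, long h)
{
    return a + b + c + d + e + f + g + h;
}
```

Compiling a caller of `sum8` with `gcc -S` and reading the generated assembly should show six register loads followed by two stack stores or pushes before the `call`.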
In your code, change push dword msg to mov rdi, msg, and don't clean up the stack after the call (because you didn't push anything onto it).
After the change:
[SECTION .data]
   msg: db "Hello C",0
[SECTION .bss]
[SECTION .text] 
 extern puts
 global main 
 main:
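  push rbp                        ; save the caller's frame pointer
  mov rbp, rsp                    ; remember the original stack pointer
  and rsp, 0xfffffffffffffff0     ; align the stack to 16 bytes
  mov rdi, msg                    ; first argument goes in rdi, not on the stack
  call puts
  mov rsp, rbp                    ; restore the stack pointer
  pop rbp
  ret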
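EDIT: According to the System V ABI, the stack must be 16-byte aligned at the call instruction. push rbp happens to produce that alignment, but that is not what it is for. To handle this properly, I added logic to both the x86 and x86-64 versions that saves the stack pointer, aligns it explicitly, and restores it afterwards.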
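The alignment step is plain bit masking. As a minimal sketch in C (the helper names `align_down_16` and `is_aligned_16` are hypothetical, not part of any library), the `and` instruction computes the following:

```c
#include <stdint.h>

/* Hypothetical helper: round an address down to a multiple of 16.
 * This is the same arithmetic as `and rsp, 0xfffffffffffffff0`. */
uint64_t align_down_16(uint64_t addr)
{
    return addr & ~(uint64_t)0xF;   /* clear the low four bits */
}

/* Hypothetical helper: nonzero when addr is a multiple of 16. */
int is_aligned_16(uint64_t addr)
{
    return (addr & 0xF) == 0;
}
```

For instance, `align_down_16(0x7ffc1238)` gives `0x7ffc1230`, while a value that is already aligned comes back unchanged. That is why the explicit `and` is harmless when push rbp has already left the stack aligned.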