Okay, first things first: the s and d in rsi and rdi stand for source and destination. It may work the other way round (as you have it), but you'll upset a lot of CDO people like myself (a) :-)
But for your actual problem, look here:
```nasm
end_count:
    mov [message], rsi
```
I assume that's meant to copy the final newline byte (decimal 10, i.e. 0x0A) into the destination, but there are two problems:
- message is the start of the buffer, not the position where the byte should go.
- You're storing the whole 8-byte rsi register (a pointer value) there, not the single byte you need.

Together, those mean you're overwriting the first eight bytes of the buffer with a pointer value, which fits the symptoms you describe.
Perhaps a better way to do it would be as follows:
```nasm
    mov rsi, hello          ; as Gordon Moore intended :-)
    mov rdi, message
put_str_into_message:
    mov al, byte [rsi]      ; get byte, increment src ptr.
    inc rsi
    mov byte [rdi], al      ; put byte, increment dst ptr.
    inc rdi
    cmp al, 10              ; continue until newline.
    jne put_str_into_message
    ret
```
For completeness, if you didn't want the newline copied (though this is pretty much what you have now, just with the errant buffer-damaging mov taken away) (b):
```nasm
put_str_into_message:
    mov al, byte [rsi]      ; get byte.
    cmp al, 10              ; stop before newline.
    je stop_str
    mov byte [rdi], al      ; put byte, increment pointers.
    inc rsi
    inc rdi
    jmp put_str_into_message
stop_str:
    ret
```
(a) CDO is obsessive-compulsive disorder, but with the letters arranged correctly :-)
(b) The don't-copy-newline loop can also be written more efficiently, with only a single conditional branch at the bottom of the loop.
Looping one byte at a time is still very inefficient: x86-64 always has SSE2, which lets you copy and check 16 bytes at a time. Since you have the length as an assemble-time constant, hello_len, you could use it to copy efficiently in wide chunks (possibly needing special handling at the end if the length is not a multiple of 16), or with rep movsb.
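As a minimal sketch of the rep movsb option: this assumes hello is defined with db and ends in a newline, hello_len is defined as $ - hello (so it includes the newline), and message is a buffer of at least hello_len bytes. Those definitions, and the Linux x86-64 program wrapper that makes it runnable on its own, are assumptions rather than your code.

```nasm
; Sketch: copy a string of assemble-time-known length with rep movsb,
; then print the copy. Assumes Linux x86-64, built non-PIE with:
;   nasm -f elf64 copy.asm && ld copy.o -o copy
; hello / hello_len / message definitions below are assumed, not the asker's.

section .data
hello:      db "Hello, world!", 10   ; newline-terminated (10 = 0x0A)
hello_len   equ $ - hello            ; assemble-time length, newline included

section .bss
message:    resb 64                  ; destination, must be >= hello_len bytes

section .text
global _start
_start:
    mov     rsi, hello               ; source pointer
    mov     rdi, message             ; destination pointer
    mov     ecx, hello_len           ; byte count (writing ECX zero-extends RCX)
    cld                              ; DF=0 so movsb walks forwards
    rep movsb                        ; copy RCX bytes from [RSI] to [RDI]

    mov     eax, 1                   ; sys_write
    mov     edi, 1                   ; fd 1 = stdout
    mov     rsi, message             ; buffer to print
    mov     edx, hello_len           ; number of bytes
    syscall

    mov     eax, 60                  ; sys_exit
    xor     edi, edi                 ; exit status 0
    syscall
```

rep movsb has a fixed startup cost, so for a string this short it's mainly attractive for its simplicity; on CPUs with the ERMSB feature it can be quite fast for longer copies.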
Still, the loop below demonstrates an efficient structure. Loading with movzx avoids the false dependency of merging a new AL into the bottom of RAX, which lets out-of-order execution run ahead and "see" the loop-exit branch earlier.
```nasm
strcpy_newline_end:
    movzx eax, byte [rsi]   ; get byte (without false dependency).
    cmp al, 10
    je copy_done            ; if the first byte is a newline, skip the loop.
copy_loop:                  ; do:
    mov [rdi], al           ;   put byte.
    inc rsi                 ;   increment both pointers.
    inc rdi
    movzx eax, byte [rsi]   ;   get next byte.
    cmp al, 10
    jne copy_loop           ; until you get a newline.
    ; After falling out of the loop (or jumping here from the top)
    ; we've loaded but *not* stored the terminating newline.
copy_done:
    ret
```
You should also be aware that there are other tricks to save instructions inside the loop, such as addressing one string relative to the other (using an indexed addressing mode for the load, so only one pointer needs incrementing).
However, we won't cover them in detail here as it risks making the answer more complex than it needs to be.
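For illustration only, here is a minimal sketch of the first of those tricks applied to the movzx loop above: the source is addressed as a fixed offset from the destination pointer, so each iteration increments just one register. The routine name copy_until_newline, its byte-count return value, and the surrounding Linux x86-64 program and data definitions are hypothetical additions, not part of your code.

```nasm
; Sketch: relative addressing so only one pointer is incremented.
; Same rotated loop shape as strcpy_newline_end (newline NOT copied).
; Assumes Linux x86-64; hello, message and copy_until_newline are
; hypothetical names for illustration.

section .data
hello:      db "Hello, world!", 10   ; newline-terminated source

section .bss
message:    resb 64                  ; destination buffer

section .text
global _start
_start:
    mov     rsi, hello
    mov     rdi, message
    call    copy_until_newline       ; RAX = number of bytes copied

    mov     edx, eax                 ; length for sys_write
    mov     eax, 1                   ; sys_write
    mov     edi, 1                   ; stdout
    mov     rsi, message
    syscall

    mov     eax, 60                  ; sys_exit
    xor     edi, edi
    syscall

; In: RSI = source, RDI = destination. Out: RAX = bytes copied.
; Clobbers RSI, RDI, RDX.
copy_until_newline:
    mov     rdx, rdi                 ; remember where the destination starts
    sub     rsi, rdi                 ; RSI = src - dst, constant from here on
    movzx   eax, byte [rdi+rsi]      ; first source byte
    cmp     al, 10
    je      .done                    ; first byte is a newline: copy nothing
.loop:
    mov     [rdi], al                ; store through the plain pointer
    inc     rdi                      ; the only pointer increment
    movzx   eax, byte [rdi+rsi]      ; indexed load of the next source byte
    cmp     al, 10
    jne     .loop
.done:
    mov     rax, rdi
    sub     rax, rdx                 ; bytes copied = end - start
    ret
```

Only the load uses the indexed [rdi+rsi] form; keeping the store's address simple matters on some Intel cores, where an indexed store can't use the dedicated simple store-address unit.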