The crash is caused by stack misalignment. See "Why does the x86-64 / AMD64 System V ABI mandate a 16 byte stack alignment?"
The jump-by-ret results in entering runme with the stack misaligned, which violates the ABI, and some libc functions do in fact break when called with a misaligned stack.  It doesn't happen on my system, but apparently your malloc implementation (which printf calls) requires stack alignment.
Disassembling the code bytes, the faulting instruction is movaps [rsp+0x10], xmm1, whose memory operand must be aligned to 16 bytes.  However, rsp has a hex value ending in 8, so rsp+0x10 is not aligned.
Off the top of my head, I don't see a simple way for the exploit to work around this.
Here is a brief explanation of the principle of stack alignment and how it leads to the crash.
Misalignment simply means that when the movaps instruction is executed, the value in rsp is not a multiple of 16 (equivalently, its last hex digit is not 0). The compiler is careful to generate code that always adjusts the stack pointer by multiples of 16, so that if the stack was properly aligned by the caller of this function, then the calls made by this function will also occur with proper alignment.
The rule set out by the x86-64 SysV ABI, which Linux compilers conform to, is that rsp must be a multiple of 16 (i.e. must end in 0) when a call instruction is issued. This means that when the called function begins to execute, rsp is 8 less than a multiple of 16 (i.e. ends in 8), because of the 8-byte return address that was pushed by call. So when main reaches its ret instruction, with your modified return address on the stack, rsp likewise ends in 8 (all stack modification done within main has been undone at this point). The ret pops the stack once, so you end up at runme with rsp ending in 0, which is wrong: a function is supposed to be entered with rsp ending in 8.
This "parity error" propagates down through printf and into malloc. The _int_malloc function expects to be entered with rsp ending in 8, so it presumably subtracts an additional 8 bytes (possibly just by pushing) somewhere before executing movaps.  As such, rsp would end in 0 at that point and all would be well. But since the situation was reversed on entry to runme, it stays reversed. _int_malloc got entered with rsp ending in 0 instead, and so its subtraction of 8 bytes left it not ending in 0 when movaps executed.
To your comment: At the level of C, stack alignment is the job of the compiler, not the programmer. So a C program can freely define a local array of size 17, and the compiler will then have to know to actually adjust the stack pointer by 32 bytes, leaving the other 15 bytes unused (or using them for other local variables). It isn't something that a C programmer normally has to worry about, but it becomes relevant when you are hacking internals like this.