I have written the following very simple code, which I am experimenting with in Godbolt's Compiler Explorer:
#include <cstdint>
uint64_t func(uint64_t num, uint64_t den)
{
    return num / den;
}
GCC produces the following output, which I would expect:
func(unsigned long, unsigned long):
        mov     rax, rdi
        xor     edx, edx
        div     rsi
        ret
However, Clang 13.0.0 produces the following, which even involves shifts and a jump:
func(unsigned long, unsigned long):                              # @func(unsigned long, unsigned long)
        mov     rax, rdi
        mov     rcx, rdi
        or      rcx, rsi
        shr     rcx, 32
        je      .LBB0_1
        xor     edx, edx
        div     rsi
        ret
.LBB0_1:
        xor     edx, edx
        div     esi
        ret
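As far as I can tell (and I may be misreading it), the extra or/shr/je sequence tests whether the upper 32 bits of both num and den are zero and, if they are, uses a 32-bit div instead of a 64-bit one. My own rough reconstruction in C++ (purely illustrative, the name is mine) would be something like:

#include <cstdint>
// My attempted reading of Clang's output, just as an illustration:
uint64_t func_as_I_read_it(uint64_t num, uint64_t den)
{
    if (((num | den) >> 32) == 0)                        // or rcx, rsi / shr rcx, 32 / je
        return uint32_t(num) / uint32_t(den);            // div esi (32-bit divide)
    return num / den;                                    // div rsi (64-bit divide)
}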
When using uint32_t, Clang's output is once again "simple" and what I would expect.
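For reference, by "using uint32_t" I mean the same function with uint64_t replaced by uint32_t:

#include <cstdint>
uint32_t func(uint32_t num, uint32_t den)
{
    return num / den;
}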
It seems this might be some sort of optimization, since Clang 10.0.1 produces the same output as GCC; however, I cannot understand what is happening. Why is Clang producing this longer assembly?