My code is very simple:
global start
extern printf, Sleep
section .data
    string db 'End: %llu', 0Ah, 'Start: %llu', 0
section .text
start:
    mfence
    lfence                  ; serialize before reading the start timestamp
    rdtsc                   ; EDX:EAX = start timestamp
    sub rsp, 8              ; reserve 8 bytes for the start value
    mov [rsp + 4], eax
    mov [rsp], edx
    mov rcx, 5000           ; Sleep for 5 seconds
    sub rsp, 32             ; shadow space for Sleep
    call Sleep
    add rsp, 32
    rdtscp                  ; EDX:EAX = end timestamp
    pop rbx                 ; reload the saved start value
    shl rdx, 32
    mov edx, eax
    ; RDX is the end value
    ; RBX is the start value
    mov r8, rbx             ; arg 3 for printf: start value
    mov rcx, string         ; arg 1: format string (the end value should already be in RDX as arg 2)
    sub rsp, 40             ; shadow space + stack alignment for printf
    call printf
    add rsp, 40
    xor rax, rax            ; return 0
    ret
I am using the RDTSC instruction to time a piece of code (in this case the WinAPI Sleep() function, because its duration is predictable and makes the results easy to check), with the mfence + lfence pair for serialization. I ran the program 3 times and got this output:
//1
End: 3717167211
Start: 12440347256463305328
//2
End: 2175818097
Start: 5820054112011561610
//3
End: 4070965503
Start: 13954488533004593819
From what I understand, RDTSC should always return increasing values, so I don't get why the end value is smaller than the start value in test 2 (looking at it again, the start values are huge and the end comes out smaller than the start in all three runs).
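In case it matters, this is the pattern I based my code on for reading the counter (just a sketch of my understanding, not tested code): serialize, read the TSC, then combine the two halves, since RDTSC/RDTSCP put the high 32 bits in EDX and the low 32 bits in EAX and clear the upper halves of RDX and RAX:
    mfence
    lfence              ; serialize before the read
    rdtsc               ; EDX = high 32 bits, EAX = low 32 bits
    shl rdx, 32
    or rax, rdx         ; RAX = full 64-bit timestamp
If my understanding of that pattern is already wrong, that might explain the numbers above.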
Anyway, my goal is to print the number of seconds the function actually took to execute. My guess is that I need to take the difference between the end and start values and divide it by the rate the TSC ticks at (which I gather is constant on modern CPUs rather than the current core frequency), but I don't know how to do that in practice. Can anybody help?
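To make that part concrete, here is roughly the computation I have in mind, written against my code above (end value in RDX, start value in RBX). The tsc_freq constant is a made-up placeholder for the TSC rate in Hz; obtaining the real value (for example by calibrating against QueryPerformanceCounter) is exactly the part I don't know how to do:
section .data
    fmt_secs db 'Elapsed: %f seconds', 0Ah, 0
    tsc_freq dq 3000000000.0    ; placeholder: assumed TSC rate in Hz

section .text
    ; assuming RDX = end timestamp and RBX = start timestamp, both full 64-bit values
    mov rax, rdx
    sub rax, rbx                ; RAX = elapsed ticks
    cvtsi2sd xmm1, rax          ; convert the tick count to a double
    divsd xmm1, [rel tsc_freq]  ; seconds = ticks / frequency
    mov rcx, fmt_secs           ; arg 1: format string
    movq rdx, xmm1              ; arg 2: varargs doubles are duplicated in the GP register
    sub rsp, 40                 ; shadow space + alignment
    call printf
    add rsp, 40
If the placeholder frequency were right, this should print something close to 5.0 seconds for the Sleep(5000) call.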
P.S. I'm not looking for answers based on library functions such as the CRT's clock(), since I already know how to use those. The point here is to learn how to use the RDTSC instruction itself.
