How linux assembler multiplies and divides 64 bit numbers?

Question

I had a university supervision assigment to write a function in assembler, which

would take three 32 bit unsigned numbers a, b, d and

return the result (a * b) / d

The c declaration of this function is:

unsigned int muldiv(const unsigned int a, 
                    const unsigned int b, 
                    const unsigned int d);

Note that we want to be sure that no unnecessary overflow or underflow happens. For example,

if a = 2^31, b = 2^31, d = 2^31,

the answer should be 2^31, despite the fact that a * b would overflow. (See more clarification on that below)

Now I wrote a simple function in c which works, then compiled it to machine code, then disassembled back to assembly code, finally removed some unnecessary instructions.

My final piece of assembly code is:

muldiv:
    imulq   %rsi, %rax
    xorl    %edx, %edx
    divq    %rcx
    ret

Which works when compiled to executable code and checked on several test cases. However, I do not properly understand, what's happening in this code.

Hence, could anyone explain me why this code works (or perhaps it does not?), in particular:

why divq %rcx instruction uses only one register? I assume this is the division part, so how does it know which two arguments to use?
how does it know that when I call muldiv from another place, the arguments a, b and d are stored in the registers %rsi / %rax / e.t.c, not somewhere else?
why xorl %edx, %edx is necessary? When removed, I get a runtime error.
How does it make multiplication on long long numbers using only one instruction, if machine can operate only on 32 bit numbers?

Clarification of overflow and underflow: This function should return the result as if we're operating on unsigned 64bit numbers. The code in c is as follows:

// NOTE: when compiled to assembly code, I removed A LOT of instructions,
// but it still works
unsigned int muldiv(const unsigned int a, 
    const unsigned int b, 
    const unsigned int d) {

    const unsigned long long la = a;
    const unsigned long long lb = b;
    const unsigned long long ld = d;

    const unsigned long long ab = la * lb;
    const unsigned long long ab_over_d = ab / ld;
    return (unsigned int) ab_over_d;
}

And it worked, when called in such a way:

#include "muldiv.h"

int main(void) { 
   unsigned int a = (1 << 31);
   unsigned int b = (1 << 31);
   unsigned int d = (1 << 31);

   unsigned int result = muldiv(a, b, d);
   printf("%u\n", result);  // prints (1 << 31), which is correct.

   return 0;
}

Have you read the intel programmers reference manual? It's a long book but really interesting ... and answers some of your questions. You'll also want to understand the ABI you are using. I'll find some links... (http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf) — JCx, Nov 12 '15 at 16:36

score 4 · Answer 1 · edited May 23 '17 at 12:15

It's a bit of a partial answer I'm afraid, I'll need to look at it again when I have a moment (work calls!), but hopefully helps :-

why divq %rcx instruction uses only one register? I assume this is the division part, so how does it know which two arguments to use?

http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf Vol. 2A 3-257

Divides unsigned the value in the AX, DX:AX, EDX:EAX, or RDX:RAX registers (dividend) by the source operand (divisor) and stores the result in the AX (AH:AL), DX:AX, EDX:EAX, or RDX:RAX registers. The source operand can be a general-purpose register or a memory location. The action of this instruction depends on the operand size (dividend/divisor). Division using 64-bit operand is available only in 64-bit mode.

Non-integral results are truncated (chopped) towards 0. The remainder is always less than the divisor in magnitude. Overflow is indicated with the #DE (divide error) exception rather than with the CF flag

how does it know that when I call muldiv from another place, the arguments a, b and d are stored in the registers %rsi / %rax / e.t.c, not somewhere else?

This is compiler and environment specific. You need to look up the ABI for your environment: What is Application Binary Interface (ABI)?

why xorl %edx, %edx is necessary? When removed, I get a runtime error.

It zeros the DX register.

How does it make multiplication on long long numbers using only one instruction, if machine can operate only on 32 bit numbers?

The RAX register is a 64bit register. So why do you think the machine is operating on 32-bit numbers? Maybe your test cases need expanding?

score 4 · Accepted Answer · answered Nov 12 '15 at 16:41

why divq %rcx instruction uses only one register? I assume this is the division part, so how does it know which two arguments to use?

It uses implicit 128 bit dividend from register pair rdx:rax. See the instruction set reference for the description of the operation.

how does it know that when I call muldiv from another place, the arguments a, b and d are stored in the registers %rsi / %rax / e.t.c, not somewhere else?

That's defined by the calling convention. See wikipedia for a summary.

why xorl %edx, %edx is necessary? When removed, I get a runtime error.

See point #1, above. rdx has he top 64 bits of the dividend, so it should be cleared for unsigned division.

How does it make multiplication on long long numbers using only one instruction, if machine can operate only on 32 bit numbers?

You used 64 bit mode, so that's not true. Also, the mul and div instructions come in double width flavor, so you even get 128 bit versions in 64 bit mode, and 64 bit versions in 32 bit mode. Again, see the instruction set reference.

How linux assembler multiplies and divides 64 bit numbers?

2 Answers2