4

I'm trying to divide a 32-bit signed integer by a 16-bit signed integer to get a signed 32-bit quotient and 16-bit remainder.

I'm targeting a 286 with no fpu.

I've already written code in the past to do 32-bit unsigned division:

DIV32 PROC

;DIVIDES A 32-BIT VALUE BY A 16-BIT VALUE.

;ALTERS AX
;ALTERS BX
;ALTERS DX

;EXPECTS THE 32-BIT DIVIDEND IN DX:AX
;EXPECTS THE 16-BIT DIVISOR IN BX

;RETURNS THE 32-BIT QUOTIENT IN DX:AX
;RETURNS THE 16-BIT REMAINDER IN BX

    push di
    push si


    mov di, ax ;di -> copy of LSW of given dividend
    mov ax, dx ;ax -> MSW of given dividend
    xor dx, dx ;dx:ax -> 0:MSW  
    div bx     ;ax:dx -> ax=MSW of final quotient, dx=remainder

    mov si, ax ;si -> MSW of final quotient
    mov ax, di ;dx:ax -> dx=previous remainder, ax=LSW of given dividend
    div bx     ;ax:dx -> ax=LSW of final quotient, dx=final remainder  
    mov bx, dx ;bx -> final remainder
    mov dx, si ;dx:ax -> final quotient


    pop si
    pop di
    ret

DIV32 ENDP 

So far, i've tried to do the obvious thing and just modify my existing code by swapping out the XOR DX, DX with CWD and DIV with IDIV:

IDIV32 PROC

;DIVIDES A SIGNED 32-BIT VALUE BY A SIGNED 16-BIT VALUE.

;ALTERS AX
;ALTERS BX
;ALTERS DX

;EXPECTS THE SIGNED 32-BIT DIVIDEND IN DX:AX
;EXPECTS THE SIGNED 16-BIT DIVISOR IN BX

;RETURNS THE SIGNED 32-BIT QUOTIENT IN DX:AX
;RETURNS THE 16-BIT REMAINDER IN BX

    push di
    push si


    mov di, ax ;di -> copy of LSW of given dividend
    mov ax, dx ;ax -> MSW of given dividend
    cwd        ;dx:ax -> 0:MSW, or ffff:MSW  
    idiv bx    ;ax:dx -> ax=MSW of final quotient, dx=remainder

    mov si, ax ;si -> MSW of final quotient
    mov ax, di ;dx:ax -> dx=previous remainder, ax=LSW of given dividend
    idiv bx    ;ax:dx -> ax=LSW of final quotient, dx=final remainder  
    mov bx, dx ;bx -> final remainder
    mov dx, si ;dx:ax -> final quotient


    pop si
    pop di
    ret

IDIV32 ENDP 

This works for some calculations, like -654,328/2=-327164 (0xfff60408/2=fffb0204). But it doesn't work with certain inputs, -131,076/2 returns the incorrect result of -2 remainder 0. A divisor of 1, or -1 causes a division error seemingly regardless of dividend.

I've tested many different positive and negative dividends and divisors in an attempt to find some kind of pattern of correct and incorrect results, i've noticed that it can't correctly return a result of -65538.

I have a hunch that i'm supposed to use CWD conditionally depending on input, but it seems like XOR DX, DX returns incorrect results more often. Either work when both divisor and dividend are positive and the quotient is less than 0x7fffffff.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
bad
  • 939
  • 6
  • 18
  • 1
    [Assembler 8086 divide 32 bit number in 16 bit number](//stackoverflow.com/a/46506090) shows code for 32/16 signed division, but I don't know if it's correct (probably) or efficient (only 2 `div`, but lots of other work; I'd guess it's possible to do better.) I had a look at GCC's `libgcc.a` helper functions like `__divti3` (__int128/__int128 => 128-bit signed division on x86-64), and it actually branches, but that's with the divisor being twice the register width as well as the dividend. – Peter Cordes Mar 30 '19 at 09:01
  • Maybe I could do a regular unsigned division, then decide the quotient's sign after the division. If the signs are different, the result should be negative etc. Then I only have to worry about negating a 32-bit integer. Maybe I'm not considering something important and it's not that easy, but I'll test it when I can. I don't really need a remainder. – bad Mar 30 '19 at 09:48
  • 2
    Yes, that's an option (and absolute-value is probably what the first version I linked was doing with `sbb`). See [Displaying numbers with DOS](//stackoverflow.com/a/45904076) for a 32-bit division by `10` using 2x `div` which has comments and more clearly is just doing unsigned division surrounded by sign-handling, but simplified because the divisor is known positive. (In the the general case, `xor a, b` gives you the sign bit of `a/b`, so you could use `xor dividend_high,divisor / jns` to skip over a `neg quotient` at the end, assuming no overflow...) – Peter Cordes Mar 30 '19 at 09:55
  • 1
    I think the problem with your current version is that the low half of your quotient is signed when it needs to be unsigned. (e.g. write out long division of `-32 / 2` like in one of the answers in [the link above](//stackoverflow.com/a/46506090), and work it through: you get decimal digits of `-1 -6` when you need `-1 6`). (Only the high word of a multi-word integer is signed). But you need to produce the unsigned low half from a signed remainder:low_half and a signed divisor. Maybe we can fixup that low-half result; e.g. your `-131,076/2 = -2` should be `0xfffefffe = -2:-2` not `-1:-2` – Peter Cordes Mar 30 '19 at 10:51
  • Oh, so the second `IDIV` should be a `DIV`. Ok, i'll test it when I wake up tomorrow. And i'll add an answer if I get it working. – bad Mar 30 '19 at 10:58
  • 1
    No, it can't be `div`. The *result* needs to be unsigned, but the inputs are signed 32-bit (remainder:low_half) and signed 16-bit (divisor). With a signed-negative dividend, `div` would see it as a very large unsigned number and #DE with a small positive divisor. And as we can see from the `-2:-2` correct output vs. the `-1:-2` actual output, we need some kind of correction to propagate up to the high word, because the low bits already are (or at least happen to be) correct in this case. Martin's answer has a more detailed analysis. – Peter Cordes Mar 30 '19 at 11:39

4 Answers4

3

I don't know any algorithm for splitting a large negative number into parts and preparing it for IDIV. I would calculate the absolute value of dividend and divisor, use the function DIV32 and lastly process the result according to the stored sign:

IDIV32 PROC      ; DX:AX / BX = DX/AX rem BX
    ; 99 / 5   =  19 rem 4
    ; 99 / -5  = -19 rem 4
    ; -99 / 5  = -19 rem -4
    ; -99 / -5 =  19 rem -4

    mov ch, dh          ; Only the sign bit counts!
    shr ch, 7           ; CH=1 means negative dividend
    mov cl, bh          ; Only the sign bit counts!
    shr cl, 7           ; CL=1 means negative divisor

    cmp ch, 1           ; DX:AX negative?
    jne J1              ; No: Skip the next two lines
    not dx              ; Yes: Negate DX:AX
    neg ax              ; CY=0 -> AX was NULL
    cmc
    adc dx, 0           ; Adjust DX, if AX was NULL
    J1:

    cmp cl, 1           ; BX negative?
    jne J2              ; No: Skip the next line
    neg bx              ; Yes: Negate BX
    J2:

    push cx             ; Preserve CX
    call DIV32
    pop cx              ; Restore CX

    cmp ch, cl          ; Had dividend and divisor the same sign?
    je J3               ; Yes: Skip the next two lines
    not dx
    neg ax              ; CY=0 -> AX was NULL
    cmc
    adc dx, 0           ; Adjust DX, if AX was NULL
    J3:

    cmp ch, 1           ; Was divisor negative?
    jnz J4              ; No: Skip the next line
    neg bx              ; Negate remainder
    J4:

    ret
IDIV32 ENDP
rkhb
  • 14,159
  • 7
  • 32
  • 60
  • 1
    Shouldn't the first `or ch, dh` be just `mov ch, dh`? Or did you find that `or` was more efficient on some CPUs? I don't think it would avoid partial-register renaming on Sandybridge; IIRC the not-for-RMW behaviour is for low registers like CL, not CH. Anyway, the `test ch,1` instructions can be just `test ch,ch`, you don't need to isolate bit 1. – Peter Cordes Mar 30 '19 at 18:42
  • 1
    You can also avoid the shifting to booleanize (`mov cx,dx / mov cl,bh`), and use `test ch,ch`/`jns` to jump if the sign bit was set. Or `test ch, 0x80`/`jz` if you want. After returning if you `neg` the divisor first (did you mean remainder), then you can `xor cl, ch` / `js` for the quotient (not divisor). It's ok to destroy CL at that point. (And yes, OR into CL, not CH, because fewer CPUs do unwanted partial-register renaming for low8 regs.) – Peter Cordes Mar 30 '19 at 18:43
  • 1
    @PeterCordes: Yes, I was too bit-oriented thinking only of logical and bit shifting operations. But all these operations show quite well that only the sign bit is important. – rkhb Mar 30 '19 at 18:53
  • 1
    I think `test`/`JS` makes it even clearer you're just checking the sign bit. Your code is already well commented, and would benefit from having fewer instructions to read through. (Not to mention execute). – Peter Cordes Mar 30 '19 at 18:55
  • 1
    To negate DX:AX, you can use `neg dx` / `neg ax` / `sbb dx,0`. I tested that for a couple cases with `AH:AL` so I could watch AX in GDB. Even better would be if we had a zeroed registers somewhere else so we could simply `neg ax` / `sbb cx, dx` or something to just use a straightforward subtract-from-zero. It's too bad the `not`/`neg` trick only works with the low half being non-zero, though, that was a neat idea. – Peter Cordes Apr 01 '19 at 02:46
  • I posted an answer with an optimized version of your function with these suggestions + inlining `DIV32` to save on `mov` instructions. – Peter Cordes Apr 01 '19 at 12:53
2

Your algorithm cannot be changed to signed that simply.

Let's take the calculation (+1)/(-1) as an example:

(+1)/(-1) = (-1), remainder 0

In the first step of your algorithm you are dividing the high bits by the divisor:

The high bits of (+1) are 0, so you are calculating:

0/(-1) = 0, remainder 0

The correct high bits of the entire 32-bit division are however 0FFFFh, not 0. And the reminder you require for the second division would also be 0FFFFh and not 0.

Oh, so the second IDIV should be a DIV. Ok, i'll test it when I wake up tomorrow. And i'll add an answer if I get it working.

The first division already does not produce the result you want. So changing the second division won't help much...

A divisor of 1, or -1 causes a division error seemingly regardless of dividend.

I would expect this only if bit 15 of the dividend is set and:

  • ... you divide by 1 or
  • ... you divide by -1 and at least one of the low 15 bits of the dividend is set

In these cases you are dividing:

  • ... a number in the range 000008000h...00000FFFFh by 1
    The result would be in the range +08000h...+0FFFFh
  • ... a number in the range 000008001h...00000FFFFh by -1
    The result would be in the range -0FFFFh...-08001h

... however, the result is a signed 16-bit value and therefore must be in the range -8000h...+7FFFh.

I just tried 12345678h/(+1) and 12345678h/(-1) in a virtual machine running DOS:

Bit 15 of 12345678h is not set; both times I don't get a division error. (But a wrong result when dividing by -1!)

Martin Rosenau
  • 17,897
  • 3
  • 19
  • 38
1

Using 2x idiv has a fundamental problem: we need the 2nd division to produce the low half of the quotient, which is unsigned and can be anything from 0 to 0xffff.

Only the highest word of a multi-word integer contains the sign bit, all bits below that have positive place-value. idiv's quotient range is -2^15 .. 2^15-1, not 0 .. 65535. Yes, idiv can produce all required values, but not from inputs we can get from simple fixups of the first division results. For example, 0:ffff / 1 will result in a #DE exception with idiv because the quotient doesn't fit in a signed 16-bit integer.

So the second division instruction has to be div, using the absolute value of the divisor, and an appropriate high half. (div requires both its inputs to be unsigned, so a signed remainder from a first idiv would be a problem, too.)

It might still be possible to use idiv for the first division, but only with fixups for its results before div, on top of still having to take the absolute value of the divisor and first-remainder to feed unsigned div. It's an interesting possibility, but in practice it's going to be cheaper to save and reapply signs around an unsigned division.

As @Martin points out, the first division of +1 / -1 with naive idiv gives the wrong high half quotient (0 / -1 = 0 not -1), and the wrong input for the 2nd division (0 % -1 = 0, not -1). TODO: figure out what fixups would actually be needed. Maybe just a conditional +-1, but we know the magnitude of the remainder can't get larger than the divisor, because high_half < divisor is necessary and sufficient for div to not fault.

Your -131,076/2 = -2 is (perhaps by coincidence) only off by 1 in one half of its result:
it should be 0xfffefffe = -2:-2 not -1:-2.


Optimized version of @rkhb's function, with DIV32 inlined.

We record input signs and then do unsigned division on the absolute values, and restore sign later. (If remainder sign isn't needed, we can simplify; the quotient sign only depends on xor dividend,divisor)

Or if the dividend is small enough, we can use one idiv. We have to avoid the -2^15 / -1 overflow case, though, so the fast check for DX being the sign extension of AX not only misses some safe cases (with larger divisors), but tries that unsafe case. If small numbers are common (like is the case in most computer programs), this fast test based on cwd is still maybe a good idea, with another test after the absolute-value calculation.

Branches are cheap-ish on 286, so I mostly kept the branching instead of using branchless abs(). (e.g. for a single register: with cdq (or sar reg,15) / xor / sub, like compilers make, based on the 2's complement identity that -x = ~x + 1). And mov/neg/cmovl of course isn't available until P6-family. If you need compat with 286 but mostly care about performance on modern CPUs, you might make different choices. But it turns out that a 32-bit branchless ABS is smaller code size than branching. However, it's probably slower than branchy for positive inputs, where some instructions would have been skipped. Assembler 8086 divide 32 bit number in 16 bit number has an interesting idea of creating 0/-1 integers for both dividend and divisor, and then for later re-applying the signs you can XOR those together and use the same XOR/SUB 2's complement bithack to sign-flip the results or not.

Style: local labels (within the function) prefixed with @@. I think this is normal for TASM, like NASM .label local labels.

   ; signed 32/16 => 32-bit division, using 32/16 => 16-bit division as the building block
   ; clobbers: CX, SI
IDIV32 PROC      ; DX:AX / BX = DX/AX rem BX
;global IDIV32_16       ; for testing with NASM under Linux
;IDIV32_16:
    ; 99 / 5   =  19 rem 4
    ; 99 / -5  = -19 rem 4
    ; -99 / 5  = -19 rem -4
    ; -99 / -5 =  19 rem -4

    mov   cx, dx          ; save high half before destroying it

;;; Check for simple case
    cwd                   ; sign-extend AX into DX:AX
    cmp   cx, dx          ; was it already correctly sign-extended?
    jne   @@dividend_32bit

    ; BUG: bx=-1  AX=0x8000 overflows with #DE
    ; also, this check rejects larger dividends with larger divisors
    idiv  bx
    mov   bx, dx
    cwd
    ret

;;; Full slow case: divide CX:AX by BX
   @@dividend_32bit:
    mov   si, ax                ; save low half
    mov   ax, cx                ; high half to AX for first div

                                ; CH holds dividend sign
    mov   cl, bh                ; CL holds divisor sign

 ;;; absolute value of inputs
    ; dividend in  AX:SI
    cwd                         ; 0 or -1
    xor   si, dx                ; flip all the bits (or not)
    xor   ax, dx
    sub   si, dx                ; 2's complement identity: -x = ~x - (-1)
    sbb   ax, dx                ; AX:SI = abs(dividend)

    test  bx, bx          ; abs(divisor)
    jnl  @@abs_divisor
    neg   bx
   @@abs_divisor:

 ;;; Unsigned division of absolute values
    xor   dx, dx
    div   bx             ; high half / divisor
    ; dx = remainder = high half for next division
    xchg  ax, si
    div   bx
 ;;; absolute result: rem=DX  quot=SI:AX
    mov   bx, dx
    mov   dx, si


 ;;; Then apply signs to the unsigned results
    test  cx,cx          ; remainder gets sign of dividend
    jns  @@remainder_nonnegative
    neg   bx
  @@remainder_nonnegative:

    xor   cl, ch         ; quotient is negative if opposite signs
    jns  @@quotient_nonnegative
    neg   dx
    neg   ax             ; subtract DX:AX from 0
    sbb   dx, 0          ; with carry
  @@quotient_nonnegative:

    ret
IDIV32 ENDP

Optimizations:

  • simpler sign-saving and sign-testing, using x86's built-in Sign Flag, set from the MSB of a result, and the js jump if SF==1. Avoids shifting the sign bit down to the bottom of an 8-bit register. Testing for same-signs can be done with xor/jns, because same signs will "cancel" and leave SF=0 whether it was both-0 or both-1. (In general, XOR can be used for comparison for equality, but it's usually only useful that way for bitwise cases where you care about one bit but not others).

  • Avoid writing CH by itself, for the benefit of modern Intel CPUs that do partial-register renaming for that case. This function never gets CH renamed separately from the rest of ECX. (On older CPUs like 286, there's no downside to mov cx,dx vs. mov ch,dh). We also avoid reading high-8 partial registers (e.g. test cx,cx instead of test ch,ch) because that has higher latency on recent Intel Sandybridge-family CPUs. (How exactly do partial registers on Haswell/Skylake perform? Writing AL seems to have a false dependency on RAX, and AH is inconsistent). On P6-family, writing low-8 partial registers will rename them separately from the full register, so there it's probably best to just read 8-bit partial regs after writing any.

    Of course, on modern CPUs 16-bit registers like cx are partial registers, even in 16-bit mode (because 32-bit registers are available there), so even mov cx,dx has a false dependency on the old value of ECX.


On 386+

Obviously on a 386+ where 32-bit registers / operand-size are available, you would use that even in 16-bit mode:

;; i386 version
    ;; inputs: DX:AX / BX
    shl   edx, 16
    mov   dx, ax         ; pack DX:AX into EDX
    mov   eax, edx

    movsx ebx, bx        ; sign-extend the inputs to 32 bit EBX
    cdq                  ; and 64-bit EDX:EAX
    idiv  ebx
    ; results: quotient in EAX, remainder in EDX

    mov   ebx, edx       ; remainder -> bx
    mov   edx, eax
    sar   edx, 16        ; extract high half of quotient to DX
    ;; result: quotient= DX:AX, remainder = BX

This can #DE from BX=0, or on overflow from DX:AX=-2^31 and BX=-1 (LONG_MIN/-1)


Test harness:

NASM wrapper to call from 32-bit mode

%if __BITS__ = 32
global IDIV32
IDIV32:
    push   esi
    push   ebx
    push   edi      ; not actually clobbered in this version
    movzx  eax, word [esp+4  + 12]
    movzx  edx, word [esp+6  + 12]
    movzx  ebx, word [esp+8  + 12]
    call   IDIV32_16

    shl    edx, 16
    mov    dx, ax
    mov    eax, edx

    movsx  edx, bx       ; pack outputs into EDX:EAX "int64_t"

    pop    edi
    pop    ebx
    pop    esi
    ret
%endif

C program, compile as 32-bit and link with the asm:

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <limits.h>

// returns quotient in the low half, remainder in the high half (sign extended)
int64_t IDIV32(int32_t dxax, int16_t bx);

static int test(int a, short b) {
//  printf("\ntest %d / %d\n", a, b);
    int64_t result = IDIV32(a,b);

    int testrem = result>>32;
    int testquot = result;

    if (b==0 || (a==INT_MIN && b==-1)) {
        printf("successfully called with inputs which overflow in C\n"
               "%d/%d gave us %d rem %d\n",
               a,b, testquot, testrem);
        return 1;
    }
    int goodquot = a/b, goodrem = a%b;

    if (goodquot != testquot || goodrem != testrem) {
        printf("%d/%d = %d rem %d\t but we got %d rem %d\n",
               a,b, goodquot, goodrem, testquot, testrem);
        printf("%08x/%04hx = %08x rem %04hx\t but we got %08x rem %04hx\n"
               "%s quotient, %s remainder\n",
               a,b, goodquot, goodrem, testquot, testrem,
               goodquot == testquot ? "good" : "bad",
               goodrem == testrem ? "good" : "bad");
        return 0;
    }
    return 1;
}

int main(int argc, char*argv[])
{
    int a=1234, b=1;
    if(argc>=2) a = strtoll(argv[1], NULL, 0);  // 0x80000000 becomes INT_MIN instead of saturating to INT_MAX in 32-bit conversion
    if(argc>=3) b = strtoll(argv[2], NULL, 0);
    test(a, b);
    test(a, -b);
    test(-a, b);
    test(-a, -b);

    if(argc>=4) {
        int step=strtoll(argv[3], NULL, 0);
        while ( (a+=step) >= 0x7ffe) {  // don't loop through the single-idiv fast path
            // printf("testing %d / %d\n", a,b);
            test(a, b);
            test(-a, -b);
            test(a, -b);
            test(-a, b);
        }
        return 0;
    }
}

(This is sloppy between int and int32_t because I only care about it running on x86 Linux where that's the same type.)

Compile with

 nasm -felf32 div32-16.asm &&
 gcc -g -m32 -Wall -O3 -march=native -fno-pie -no-pie div32-test.c div32-16.o

Run with ./a.out 131076 -2 -1 to test all dividends from that to 0x7ffe (step=-1) with divisor=-2. (For all combinations of -a / -b, a / -b etc.)

I didn't do nested loops for quotient and divisor; you could do that with the shell. I also didn't do anything clever for testing some dividends around the max and some near the bottom of the range.

Community
  • 1
  • 1
Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
0

I re-wrote my idiv32 procedure so that it negates a negative dividend or divisor into a positive/unsigned form, performs the unsigned division, then negates the quotient if dividend XOR divisor is true.

EDIT: Use js and jns instead of testing against a bitmask of 80h. Don't bother returning remainder. Remainder should share the sign of the dividend, but since I don't really need the remainder, I won't bother making the procedure even more complex to correctly handle it.

idiv32 proc

;Divides a signed 32-bit value by a signed 16-bit value.

;alters ax
;alters bx
;alters dx

;expects the signed 32-bit dividend in dx:ax
;expects the signed 16-bit divisor in bx

;returns the signed 32-bit quotient in dx:ax

push cx
push di
push si

    mov ch, dh      ;ch <- sign of dividend
    xor ch, bh      ;ch <- resulting sign of dividend/divisor

    test dh, dh     ;Is sign bit of dividend set?  
    jns chk_divisor ;If not, check the divisors sign.
    xor di, di      ;If so, negate dividend.  
    xor si, si      ;clear di and si   
    sub di, ax      ;subtract low word from 0, cf set if underflow occurs
    sbb si, dx      ;subtract hi word + cf from 0, di:si <- negated dividend
    mov ax, di      
    mov dx, si      ;copy the now negated dividend into dx:ax

chk_divisor:
    xor di, di
    sbb di, bx      ;di <- negated divisor by default
    test bh, bh     ;Is sign bit of divisor set?
    jns uint_div    ;If not, bx is already unsigned. Begin unsigned division.
    mov bx, di      ;If so, copy negated divisor into bx.

uint_div:
    mov di, ax      ;di <- copy of LSW of given dividend
    mov ax, dx      ;ax <- MSW of given dividend
    xor dx, dx      ;dx:ax <- 0:MSW  
    div bx          ;ax:dx <- ax=MSW of final quotient, dx=remainder

    mov si, ax      ;si <- MSW of final quotient
    mov ax, di      ;dx:ax <- dx=previous remainder, ax=LSW of given dividend
    div bx          ;ax:dx <- ax=LSW of final quotient, dx=final remainder
    mov dx, si      ;dx:ax <- final quotient

    test ch, ch     ;Should the quotient be negative?
    js neg_quot     ;If so, negate the quotient.
pop si              ;If not, return. 
pop di
pop cx
    ret

neg_quot:
    xor di, di      
    xor si, si
    sub di, ax
    sbb si, dx
    mov ax, di
    mov dx, si      ;quotient negated
pop si
pop di
pop cx
    ret

idiv32 endp
bad
  • 939
  • 6
  • 18
  • 1
    This has a bug if CL has any other bits set on function entry: you'll get false-positives. Use `mov cl, 80h` instead of `or cl, 80h`. Or better, use `test` with an immediate operand, or even better `test bh,bh` / `jns` to test the sign bit. SF exists so that you don't need bitmasks like `80h`. – Peter Cordes Apr 03 '19 at 05:56
  • It was originally `mov cl, 0`, I changed it after I tested the procedure. I've never considered using logical instructions to set and clear SF. – bad Apr 03 '19 at 06:21
  • 1
    Well you want `mov`, not `or`, but the bug was that `0` isn't right, because `x & 0 =0` for any x so it tells you nothing about the number when used with `test`! I suggest reading my answer, where I explained using SF and showed a tested + working implementation of `idiv32` that's a bit more efficient than yours :P (e.g. look at just the code after the check to see if a single `idiv` will work). Also, yours forgets to apply the sign to the remainder, so the code is shorter. Mine has a test harness that tests all 4 combinations of `a,b` ; `-a,b` ; `a,-b` ; `-a,-b` vs. C division. – Peter Cordes Apr 03 '19 at 06:28
  • Oops, I meant to say it was originally `mov cl, 80h`.Anyway, I like my version better, because I wrote it myself. I don't like to copy paste other people's code, I just look for hints and tricks I can use to make my code better. – bad Apr 03 '19 at 06:34
  • 1
    That's fair, congratulations on writing a mostly-correct implementation. `sub/sbb` from zeroed registers is maybe easier to understand than in-place `neg` / `sbb`, but takes more instructions (especially to `mov` back into the original register pair). You still have 2 real bugs: `or cl, 80h` will treat numbers as "negative" when actually they only have lower bits set. e.g. test with a `mov cx, 0xffff` in the caller and see that you get the wrong sign of the result, or that you make the input to `div` negative if it was positive. And if you check the sign of the remainder, it can be wrong. – Peter Cordes Apr 03 '19 at 06:38
  • Are you saying that `idiv` produces a signed remainder, and i'm not signing my remainder? EDIT: Yeah, I just checked the documentation, the remainder is supposed to have the same sign as the dividend. Ok, i'll fix that. – bad Apr 03 '19 at 06:41
  • 1
    Yeah, rkhb's function starts with a comment that demonstrates all 4 cases, showing that the remainder takes the sign of the the dividend. I copied that in my answer, and talked about it in the opening text about why 2x `idiv` is not viable and how you might fixup the result from a first `idiv` to feed a `div`. – Peter Cordes Apr 03 '19 at 07:16