Move data from a double word array to a byte array

Question

Hello, I am trying to split a given array of double words into an array of bytes:

a dd 12345678h,1A2B3Ch,78h ;given array

I want to store only the bytes that are not equal to 0. As you can see, the first number is fine, the second has two leading zero digits (001A2B3Ch), and the third has six (00000078h).

I wrote code to do this. It works for the first number, but overall it adds bytes with the values 78,56,34,12,28,2B to the array, which is not correct for the last two numbers. The result must look like (78,56,34,12,3C,2B,1A,78). I don't know why.

assume cs:code, ds:data
data segment
a dd 12345678h,1A2B3Ch,78h ;given array
l equ $-a
l1 equ l/4
zece db 10
pat dw 4
n db l dup(?) ;destination array
data ends

code segment
start:
    mov ax,data
    mov ds,ax

    mov cl,l1
    mov ch,0
    mov si,0
    mov ax,0
    mov bx,0

    repeta:
        mov bx,si
        mul pat
        mov al,byte ptr a[si]
        mov n[bx],al
        mov al,byte ptr a[si]+1
        add bx,1
        mov n[bx],al
        mov al,byte ptr a[si]+2
        add bx,1
        mov n[bx],al
        mov al,byte ptr a[si]+3
        add bx,1
        mov n[bx],al
        inc si
    loop repeta

mov ax,4C00h
int 21h
code ends
end start

So from `1A2B3Ch` you want to copy only 3 bytes into destination array, right? — Ped7g, Feb 20 '17 at 18:57

Ped7g · Accepted Answer · 2017-02-26T21:40:31.407

First thing: always understand your data. x86 memory is byte-addressable. It doesn't matter what logical structure you used to write the data into memory; anyone looking at the memory content without knowing that structure sees only bytes.

a dd 12345678h,1A2B3Ch,78h

So this assembles to 12 (3 * 4) bytes:

78 56 34 12 3C 2B 1A 00 78 00 00 00

To condense such an array by removing zeroes you don't even need to work with double words. Just copy it byte by byte (deliberately ignoring that it was originally meant as a double word array), skipping zero values.

code segment
start:
    mov ax,data
    mov ds,ax

    lea     si,[a]      ; original array offset
    lea     di,[n]      ; destination array offset
    mov     cx,l        ; byte (!) length of original array

    repeta:
        ; load single byte from original array
        mov     al,[si]
        inc     si
        ; skip zeroes
        test    al,al
        jz      skipping_zero
        ; store non-zero to destination
        mov     [di],al
        inc     di
    skipping_zero:
        loop    repeta

    ; fill remaining bytes of destination with zeroes - init
    xor     al,al
    lea     si,[n+l]    ; end() offset of "n"
    ; jump to the test first, so filling is skipped when no zero was dropped
    jmp     fill_remaining_test

    fill_remaining_loop:
        ; clear one more byte in destination
        mov     [di],al
        inc     di
    fill_remaining_test:
        ; test if some more bytes are to be cleared
        cmp     di,si       ; current offset < end() offset
        jb      fill_remaining_loop

    ; exit back to DOS
    mov ax,4C00h
    int 21h

code ends
end start

But this is a complete rewrite of your code, unfortunately, so I will try to explain what's wrong in yours.

About MUL, and especially about multiplying by a power of two:

    mov     bx,si   ; bx = si (index into array?)
    mul     pat     ; dx:ax = ax * word(4)

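As you can see, mul uses neither bx nor si, and it produces a 32 bit value split into dx (upper word) and ax (lower word). In your loop that product is never used, and bx is just a copy of si, which advances by 1 per pass. So the three passes copy source bytes 0..3, 1..4 and 2..5 onto the same destination offsets, and only the first six bytes of n are ever written.

To multiply si by 4 you would have to do either: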

    mov     ax,si   ; ax = si
    mul     [pat]   ; dx:ax = ax * word(4)

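Or simply exploit the fact that computers work with bits and binary encoding of integer values: to multiply by 4 you only need to shift the bits of the value two positions "up" (left). Note that a shift with an immediate count of 2 needs an 80186 or later target; a plain 8086 only shifts by 1 or by cl.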

    shl     si,2    ; si *= 4 (truncated to 16 bit value)

But that destroys the original si ("index"), so instead people usually adjust the loop increment: start with si = 0, but do add si,4 instead of inc si. No multiply is needed any more.
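The adjusted increment could look like the following sketch. It assumes the data segment from the question (without the unused zece and pat) and deliberately copies all four bytes of every dword, zeros included, so it only illustrates the stepping, not the zero filtering.

```asm
assume cs:code, ds:data
data segment
a   dd 12345678h,1A2B3Ch,78h    ; source array
l   equ $-a                     ; length in bytes (12)
l1  equ l/4                     ; length in dwords (3)
n   db l dup(?)                 ; destination array
data ends

code segment
start:
    mov     ax,data
    mov     ds,ax

    mov     cx,l1               ; one pass per dword
    xor     si,si               ; si = byte offset of the current dword
repeta:
    mov     al,byte ptr [a + si]
    mov     [n + si],al
    mov     al,byte ptr [a + si + 1]
    mov     [n + si + 1],al
    mov     al,byte ptr [a + si + 2]
    mov     [n + si + 2],al
    mov     al,byte ptr [a + si + 3]
    mov     [n + si + 3],al
    add     si,4                ; step to the next dword, no multiply needed
    loop    repeta

    mov     ax,4C00h            ; exit back to DOS
    int     21h
code ends
end start
```

Because si already is a byte offset, the same register can address both arrays here. Once zero bytes are skipped, the destination needs its own pointer, since the two offsets drift apart.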

add bx,1 hurts my eyes; I prefer inc bx in hand-written Assembly (on some generations of x86 CPUs add bx,1 was faster, but on modern x86 inc is fine again).

mov al,byte ptr a[si]+1 is very weird syntax. I prefer to keep things "Intel-like" simple, i.e. mov al,byte ptr [si + a + 1]. It's not a C array; it really loads a value from the memory address inside the brackets. Mimicking C array syntax will probably just confuse you over time. The byte ptr can also be removed, as al already defines the data width (unless you are using some MASM which enforces it on a dd array, but I don't want to touch that Microsoft stuff with a ten-foot pole).

Same goes for mov n[bx],al, which can be written as mov [n + bx],al or mov [bx + n],al, whichever makes more sense in the code.

But overall it's a bit unusual to use an index inside a loop. Usually you convert all indexes into addresses ahead of the loop, in the init part, and use the final pointers without any calculation inside the loop (incrementing them by the element size, i.e. add si,4 for double words). Then you don't need any index multiplication.

This holds especially in 16 bit mode, where the addressing modes are very limited. In 32/64b mode you can at least scale one register by the common sizes (1, 2, 4, 8), i.e. mov [n + ebx * 4],eax, with no need to multiply separately.

EDIT: there's no scale (multiplying the "index" part by 1/2/4/8) available in 16b mode, so an example like [si*4] would not work.
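If an element index really has to be turned into an offset in 16 bit code, one option is to scale a copy of it by hand. The sketch below is only an illustration: the index value 2 and the use of bx as the scratch register are assumptions, and it uses two single-bit shifts, which also assemble for a plain 8086 target.

```asm
assume cs:code, ds:data
data segment
a   dd 12345678h,1A2B3Ch,78h    ; source array
data ends

code segment
start:
    mov     ax,data
    mov     ds,ax

    mov     si,2                ; element index (third dword, 00000078h)
    mov     bx,si               ; scale a copy so si keeps the index
    shl     bx,1                ; bx = index * 2
    shl     bx,1                ; bx = index * 4 = byte offset 8
    mov     al,byte ptr [a + bx] ; al = 78h, the lowest byte of that dword

    mov     ax,4C00h            ; exit back to DOS (overwrites al)
    int     21h
code ends
end start
```

bx is used for the scaled offset because only bx, bp, si and di may appear inside the brackets in 16 bit addressing.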

New variant storing bytes starting from the most significant byte of each dword (i.e. reversing the little-endian scheme of x86 dwords):

code segment
start:
    mov     ax,data
    mov     ds,ax
    lea     si,[a]      ; original array offset
    lea     di,[n]      ; destination array offset
    mov     cx,l1       ; element-length of original array

    repeta:
        ; load four bytes in MSB-first order from original array
        ; and store only non-zero bytes to destination
        mov     al,[si+3]
        call    storeNonZeroAL
        mov     al,[si+2]
        call    storeNonZeroAL
        mov     al,[si+1]
        call    storeNonZeroAL
        mov     al,[si]
        call    storeNonZeroAL
        ; advance source pointer to next dword in array
        add     si,4
        loop    repeta

    ; Different ending variant, does NOT zero remaining bytes
    ; but calculates how many bytes in "n" are set => into CX:
    lea       cx,[n]      ; destination begin() offset
    sub       cx,di
    neg       cx          ; cx = number of written bytes into "n"

    ; exit back to DOS
    mov ax,4C00h
    int 21h

; helper function to store non-zero AL into [di] array
storeNonZeroAL:
    test    al,al
    jz      ignoreZeroAL
    mov     [di],al
    inc     di
ignoreZeroAL:
    ret

code ends
end start

Written to be short and simple, not for performance (and I strongly suggest you aim for the same until you feel really comfortable with the language; it's difficult enough for a beginner even when written simply, without any expert trickery).

BTW, you should find a debugger that works for you, so you can step instruction by instruction and watch how the resulting values in "n" are being added, and why. Then you would probably notice sooner that bx+si and mul don't do what you expect, and that the remaining code is operating on wrong indices. Programming in Assembly without a debugger is like trying to assemble a robot blindfolded.

Thank you very much for all your advice, i am new with this programing language and we work just with 16b, may i ask you how to reverse the result for every number like it must be (12,34,56,78,1A,2B,3C,78) — Ehsan Mohebbi, Feb 20 '17 at 20:19
I'm not sure I understand you. You mean the order from most significant byte? Then you will have to work with structure of 4 bytes, not just to copy non-zero values, but also reverse the order of each 4 bytes. As with everything in Assembly, it has many possible ways how to code it. — Ped7g, Feb 20 '17 at 20:21
@Esan: ad "many possible ways": if you understand your data, and what calculation you want to achieve, you simply break that calculation down into simpler steps, until the steps are so simple, that they resemble CPU instructions. Then you write the code doing that. Assembly is not about learning which instruction does "reverse string"/etc... You just learn what the instruction does with registers and memory. And then you have to learn how various data can be stored. And figure out formula which calculates your desired result. Then you just write that calculation in CPU instructions = done. — Ped7g, Feb 21 '17 at 01:05
@Esan: also if you are studying assembly, I recently did answer somebody with healthy dose of curiosity, I think the solution is simple enough for you to understand it (with debugger and instruction reference guide to verify how they work), but it may be interesting for you to see how the question was build (OP didn't stop by finding first solution, but questioned and reasoned about it more), and how the answer was found (math theory and calculation again... it's not accident the "computer" contains "compute" word): http://stackoverflow.com/q/42245263/4271923 — Ped7g, Feb 21 '17 at 01:20
"Actually I think `[si*4]` is one of the few legal addressing modes in 16b mode?" Sorry to say but the 8086 only allows combinations of base registers `BX`/`BP` and index registers `SI`/`DI`. No forms of scaling exist. — Sep Roland, Feb 26 '17 at 21:24
