First of all, always understand your data: x86 memory is addressable by bytes. It doesn't matter what kind of logical structure you use to write data into memory; anybody else looking at the memory content without knowing your logical structure sees only bytes.
a dd 12345678h,1A2B3Ch,78h
So this assembles to 12 (3 * 4) bytes, each dword stored least-significant byte first (little-endian):
78 56 34 12 3C 2B 1A 00 78 00 00 00
To condense such an array by removing zeroes you don't even need to work with double words: just copy it byte by byte (deliberately ignoring that it was originally meant as a double word array), skipping zero values.
code segment
start:
mov ax,data
mov ds,ax
lea si,[a] ; original array offset
lea di,[n] ; destination array offset
mov cx,l ; byte (!) length of original array
repeta:
; load single byte from original array
mov al,[si]
inc si
; skip zeroes
test al,al
jz skipping_zero
; store non-zero to destination
mov [di],al
inc di
skipping_zero:
loop repeta
; fill remaining bytes of destination with zeroes - init
xor al,al
lea si,[n+l] ; end() offset of "n"
; jump first to test, so filling is skipped when no zero
jmp fill_remaining_test
fill_remaining_loop:
; clear one more byte in destination
mov [di],al
inc di
fill_remaining_test:
; test if some more bytes are to be cleared
cmp di,si ; current offset < end() offset
jb fill_remaining_loop
; exit back to DOS
mov ax,4C00h
int 21h
code ends
end start
Unfortunately this is a complete rewrite of your code, so I will try to explain what is wrong in yours.
About MUL, and especially about multiplying by a power of two:
mov bx,si ; bx = si (index into array?)
mul pat ; dx:ax = ax * word(4)
As you can see, mul uses neither bx nor si, and it produces a 32 bit result, split into dx (upper word) and ax (lower word).
To multiply si by 4 you would either have to do:
mov ax,si ; ax = si
mul [pat] ; dx:ax = ax * word(4)
Or simply exploit the fact that computers work with bits and a binary encoding of integer values: to multiply by 4 you only need to shift the bits of the value two positions "up" (left).
shl si,2 ; si *= 4 (truncated to 16 bit value); an immediate count > 1 needs 80186+, on a plain 8086 use shl si,1 twice
But that destroys the original si ("index"), so instead people usually adjust the loop increment: start with si = 0, but instead of inc si do add si,4. No multiplication is needed any more.
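A minimal sketch of that add si,4 stepping, not taken from your code: it assumes the same "a" array as above, a made-up constant l1 for the element count, and an explicit byte ptr so that MASM-style assemblers accept the byte access into a dd array. It only sums the lowest byte of each dword to show the addressing.

```asm
data segment
a       dd 12345678h,1A2B3Ch,78h
l1      equ ($-a)/4             ; element count (name assumed for this sketch)
data ends

code segment
        assume cs:code,ds:data
start:
        mov ax,data
        mov ds,ax
        xor si,si               ; si = byte offset of element 0
        xor dl,dl               ; dl = running sum of the low bytes
        mov cx,l1               ; one iteration per dword
next_dword:
        add dl,byte ptr [a+si]  ; low byte of the current dword
        add si,4                ; step to the next dword: no mul, no shl
        loop next_dword
        ; here dl = 78h + 3Ch + 78h = 12Ch, truncated to 8 bits = 2Ch
        mov ax,4C00h            ; exit back to DOS
        int 21h
code ends
end start
```

The element index is never materialised here. If you need it as well, keep it in a separate register and inc that one.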
add bx,1 hurts my eyes; I prefer inc bx in human-written Assembly (on some generations of x86 CPUs add bx,1 was faster, but on modern x86 inc is fine again).
mov al,byte ptr a[si]+1 is very weird syntax; I prefer to keep things "Intel-like" and simple, i.e. mov al,byte ptr [si + a + 1]. It's not a C array, it really loads the value from memory at the address inside the brackets. Mimicking C-array syntax will probably just confuse you over time. The byte ptr can also be dropped, as al already defines the data width (unless you are using some MASM which enforces it on a dd array, but I don't want to touch that Microsoft stuff with a ten-foot pole).
Same goes for mov n[bx],al, which is mov [n + bx],al or mov [bx + n],al, whichever makes more sense in the code.
But overall it's a bit unusual to use an index inside a loop. Usually you want to convert all indices into addresses ahead of the loop, in the init part, and use the final pointers without any calculation inside the loop (incrementing them by the element size, i.e. add si,4 for double words). Then you don't need any index multiplication.
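As a rough illustration of "addresses ahead of the loop", here is a sketch that copies the dwords of "a" into "n" using only pointers. The data segment is an assumption made for this example (in particular "l" as the byte length and "n" having the same size as "a"), and the loop ends on an end() address instead of a counter.

```asm
data segment
a       dd 12345678h,1A2B3Ch,78h
l       equ $-a                 ; byte length of "a" (assumed definition)
n       db l dup (0)            ; destination of the same size
data ends

code segment
        assume cs:code,ds:data
start:
        mov ax,data
        mov ds,ax
        ; init part: turn "array + length" into plain addresses once
        lea si,[a]              ; si = address of first source dword
        lea di,[n]              ; di = address of first destination dword
        lea bx,[a+l]            ; bx = end() address of the source
copy_dword:
        mov ax,[si]             ; low word of the dword
        mov [di],ax
        mov ax,[si+2]           ; high word of the dword
        mov [di+2],ax
        add si,4                ; advance both pointers by the element size
        add di,4
        cmp si,bx               ; stop once the source pointer reaches end()
        jb copy_dword
        mov ax,4C00h            ; exit back to DOS
        int 21h
code ends
end start
```

Inside the loop there is no index and no multiplication, only fixed displacements (+2) and pointer increments.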
This holds especially in 16 bit mode, where the addressing modes are very limited. In 32/64 bit mode you can at least scale one register by the common sizes (1, 2, 4, 8), i.e. mov [n + ebx * 4],eax, with no need to multiply it separately.
EDIT: there is no scale (multiplying the "index" part by 1/2/4/8) available in 16 bit addressing, so a hypothetical [si*4] would not work.
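If you really do have an element index in 16 bit code, one way around the missing scale is to do the scaling by hand in a register that is allowed in 16 bit addressing (bx, si or di). A small sketch, assuming the "a" array from above and an arbitrarily chosen index of 1:

```asm
data segment
a       dd 12345678h,1A2B3Ch,78h
data ends

code segment
        assume cs:code,ds:data
start:
        mov ax,data
        mov ds,ax
        mov si,1                    ; element index (picked for this demo)
        ; si*4 is not encodable as a 16 bit address, so scale by hand
        mov bx,si                   ; keep the original index in si
        shl bx,1
        shl bx,1                    ; bx = index * 4 (8086-compatible shifts)
        mov dx,word ptr [a+bx+2]    ; dx = 001Ah, high word of 001A2B3Ch
        mov ax,word ptr [a+bx]      ; ax = 2B3Ch, low word of 001A2B3Ch
        ; dx:ax now holds a[1]
        mov ax,4C00h                ; exit back to DOS
        int 21h
code ends
end start
```

Copying the index into bx first keeps si intact, which avoids the "shift destroys the index" problem mentioned earlier.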
New variant storing the bytes of each dword starting from the most significant one (i.e. reversing the little-endian scheme of x86 dwords):
code segment
start:
mov ax,data
mov ds,ax
lea si,[a] ; original array offset
lea di,[n] ; destination array offset
mov cx,l1 ; element-length of original array
repeta:
; load four bytes in MSB-first order from original array
; and store only non-zero bytes to destination
mov al,[si+3]
call storeNonZeroAL
mov al,[si+2]
call storeNonZeroAL
mov al,[si+1]
call storeNonZeroAL
mov al,[si]
call storeNonZeroAL
; advance source pointer to next dword in array
add si,4
loop repeta
; Different ending variant, does NOT zero remaining bytes
; but calculates how many bytes in "n" are set => into CX:
lea cx,[n] ; destination begin() offset
sub cx,di
neg cx ; cx = number of written bytes into "n"
; exit back to DOS
mov ax,4C00h
int 21h
; helper function to store non-zero AL into [di] array
storeNonZeroAL:
test al,al
jz ignoreZeroAL
mov [di],al
inc di
ignoreZeroAL:
ret
code ends
end start
Written to be short and simple, not for performance (and I strongly suggest you aim for the same until you feel really comfortable with the language; it's difficult enough for a beginner even when written simply, without any expert trickery).
BTW, you should find a debugger that works for you, so you can step instruction by instruction and watch how the resulting values in "n" are being added, and why. Then you would probably have noticed sooner that the bx+si vs mul code doesn't do what you expect and that the remaining code operates on wrong indices. Programming in Assembly without a debugger is like trying to assemble a robot blindfolded.