If you just want to branch, use sign-flag or signed-compare conditions
After test reg,reg, either jns non_negative (sign bit not set) or jnl non_negative (not less-than) works; the two conditions are equivalent after a compare with zero.
That uses the FLAGS and conditions for their normal semantic meaning, i.e. doing a normal signed compare.
(test same,same is equivalent to cmp against zero, always clearing OF and CF, and is a well-known optimization for cmp reg, 0)
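A tiny C model of the two branch conditions may make the equivalence concrete. It's only a sketch: flags_t, test_same, jns_taken and jnl_taken are hypothetical names, and only SF and OF are modelled, using the rules that jns is taken when SF=0 and jnl when SF=OF.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the FLAGS that `test x, x` produces for a byte:
   SF is the sign bit, and OF (like CF) is always cleared. */
typedef struct { bool sf, of; } flags_t;

static flags_t test_same(uint8_t x) {
    flags_t f = { .sf = (x & 0x80) != 0, .of = false };
    return f;
}

/* jns is taken when SF == 0. */
static bool jns_taken(flags_t f) { return !f.sf; }

/* jnl (aka jge) is taken when SF == OF; with OF == 0 that is just SF == 0. */
static bool jnl_taken(flags_t f) { return f.sf == f.of; }
```

Because test always clears OF, "SF == OF" collapses to "SF == 0", which is exactly the jns condition.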
What you're doing doesn't set CF in a way that reflects the sign bit, so a jc (jump if CF set) isn't useful.  You're actually counting non-zero numbers, the ones where 0U < (unsigned)x is true.
Getting the carry flag set according to the MSB
It's only interesting to get your condition into CF if you're going to take advantage of that by using adc dx, 0 or sbb dx, -1 to conditionally increment DX (when CF is 1 or 0, respectively).
The sbb version computes dx -= -1 + CF, so CF either cancels out the -1 (leaving DX unchanged), or you subtract -1, i.e. add 1.
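In C terms, the two conditional-increment idioms compute the following. This is a hypothetical model (adc_zero and sbb_minus_one are made-up names) with CF passed in as 0 or 1 and DX as a 16-bit unsigned value that wraps like the register does.

```c
#include <stdint.h>

/* adc dx, 0  : dx = dx + 0 + CF          -> increments when CF == 1 */
static uint16_t adc_zero(uint16_t dx, unsigned cf) {
    return (uint16_t)(dx + 0 + cf);
}

/* sbb dx, -1 : dx = dx - (-1) - CF = dx + 1 - CF  -> increments when CF == 0.
   0xFFFF is -1 as a 16-bit value; the cast wraps the result mod 2^16. */
static uint16_t sbb_minus_one(uint16_t dx, unsigned cf) {
    return (uint16_t)(dx - 0xFFFF - cf);
}
```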
One way to get CF set according to the sign bit of a byte is simply to shift it out, e.g. shl bl, 1, if you don't mind destroying the value in BL.  Equivalently, add bl,bl is also a 2-byte instruction and can run on more execution units on modern CPUs.  (They set CF and the other commonly tested FLAGS the same way; only AF differs, being undefined after shl.)
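A C sketch of why both instructions leave the old sign bit in CF; shl1_cf and add_same_cf are hypothetical names that model only the new BL value and CF.

```c
#include <stdint.h>

/* shl bl, 1: the bit shifted out of the top lands in CF, and BL is destroyed. */
static unsigned shl1_cf(uint8_t *bl) {
    unsigned cf = (*bl >> 7) & 1u;     /* old MSB becomes CF */
    *bl = (uint8_t)(*bl << 1);
    return cf;
}

/* add bl, bl: CF is the unsigned carry out of bit 7, i.e. the same old MSB. */
static unsigned add_same_cf(uint8_t *bl) {
    unsigned sum = (unsigned)*bl + *bl;
    *bl = (uint8_t)sum;
    return sum > 0xFFu;
}
```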
It's not possible with a compare against zero.  0 - x always has a borrow (CF=1) for any non-zero x, and x - 0 never has carry-out.
Without modifying the register value, it is possible with cmp, though: 0x7f - x has unsigned wrapping (i.e. a borrow output that sets CF) for x >= 0x80 unsigned, i.e. for values with their MSB set.
   xor dx, dx              ; count = 0
   mov si, OFFSET array    ; mov-immediate is shorter than LEA.  Don't use LEA for a plain absolute address (no registers), except for x86-64 RIP-relative
;;;  The interesting part
   mov  al, 0x7f           ; outside the loop
back:                      ; do {
   cmp  al, [si]             ; CF = 0x7F <(unsigned)[SI].  i.e. MSB set in [si]
   adc  dx, 0                ; count negative values
;;;  then the rest of the loop
   inc  si
   cmp  si, OFFSET array+8   ; LOOP isn't fast on most modern CPUs, and the array length is hard-coded anyway.  (Or put a label at the end of the array and compare against that.)
   jne  back                ; }while(p != endp) 
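Here's a C model of that loop, to sanity-check the CF reasoning, including the compare-against-zero cases mentioned earlier. cmp_cf and count_negative_bytes are hypothetical names; only CF is modelled, as the unsigned borrow of a - b.

```c
#include <stddef.h>
#include <stdint.h>

/* CF after `cmp a, b` on bytes is the borrow from the unsigned subtraction a - b. */
static unsigned cmp_cf(uint8_t a, uint8_t b) {
    return a < b;
}

/* Models the asm loop. Like its do{}while, it assumes len >= 1. */
static uint16_t count_negative_bytes(const uint8_t *array, size_t len) {
    uint16_t dx = 0;                         /* xor dx, dx */
    const uint8_t al = 0x7F;                 /* mov al, 0x7f */
    const uint8_t *si = array;               /* mov si, OFFSET array */
    const uint8_t *endp = array + len;       /* OFFSET array+8 when len == 8 */
    do {
        unsigned cf = cmp_cf(al, *si);       /* cmp al, [si]: CF = 0x7F < byte */
        dx = (uint16_t)(dx + 0 + cf);        /* adc dx, 0 */
        si++;                                /* inc si */
    } while (si != endp);                    /* cmp si, ... / jne back */
    return dx;
}
```

For example, the bytes {1, 0x80, 0xFF, 0x7F, 0, 0x90, 5, 0xC3} give a count of 4.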
You don't need clc in this or your version.  CF isn't "sticky"; anything that updates its value sets it to 0 or 1 regardless of the old value.  And it's not an input for cmp.
We can't set CF=1 for bl < 0 (aka bl >= 0x80U) with cmp bl, constant, unfortunately.  It only works the way you're doing it, setting another register to compare against.  (cmp reg, 123 exists, cmp 123,reg doesn't; most 2-operand instructions modify their destination and wouldn't make sense with an immediate destination, so it would be a special case to have yet another opcode for cmp in the other direction.)
But you can do cmp bl, 0x80 to clear CF when bl < 0x80, i.e. when its sign bit isn't set.
   cmp  byte ptr [si], 0x80        ; CF = [si] < (unsigned)0x80, i.e. non-negative
   sbb  dx, -1                     ; count when CF=0, negative values
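The same idea in C, as a hypothetical model (count_negative_sbb is a made-up name) where CF comes from the unsigned compare against 0x80 and sbb dx, -1 becomes dx + 1 - CF.

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t count_negative_sbb(const uint8_t *array, size_t len) {
    uint16_t dx = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned cf = array[i] < 0x80;       /* cmp byte ptr [si], 0x80: CF = non-negative */
        dx = (uint16_t)(dx + 1 - cf);        /* sbb dx, -1: counts when CF == 0 */
    }
    return dx;
}
```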
Loading the value into a register with mov bl, [si] can be helpful for debugging, making it show up in your debugger's window of registers instead of having to examine memory.  But that's not necessary; cmp works with reg or memory operands (or an immediate), saving an instruction.
As a further code-size optimization inside the loop, scasb is equivalent to cmp al, es:[di] / inc di (except that the inc part doesn't set FLAGS).  And it's actually dec di if DF is set, so you'd want cld somewhere in your program before a loop using "string" instructions, to make sure they go in the forward direction.
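A hypothetical C model of scasb's semantics, to show the compare-then-step behaviour and the role of DF. cpu_t, scasb and count_negative_scasb are made-up names; only CF, DI and DF are modelled, and ES:[DI] is treated as an index into an array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t al;
    size_t di;       /* index into the ES-segment array */
    bool df;         /* direction flag: cld clears it, std sets it */
    unsigned cf;
} cpu_t;

/* scasb: CF as for cmp al, es:[di], then step DI by +1 (DF=0) or -1 (DF=1)
   without touching FLAGS. */
static void scasb(cpu_t *cpu, const uint8_t *es) {
    cpu->cf = cpu->al < es[cpu->di];
    if (cpu->df)
        cpu->di--;
    else
        cpu->di++;
}

/* The counting loop rewritten around scasb (AL = 0x7F, after cld). */
static uint16_t count_negative_scasb(const uint8_t *es, size_t len) {
    cpu_t cpu = { .al = 0x7F, .di = 0, .df = false, .cf = 0 };
    uint16_t dx = 0;
    while (cpu.di != len) {
        scasb(&cpu, es);
        dx = (uint16_t)(dx + cpu.cf);        /* adc dx, 0 */
    }
    return dx;
}
```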
Using scasb means you need to use AL for that.  Without scasb, you could count into AL inside the loop, where it could be the exit status for your DOS call.  (Perhaps that's why you were trying to use AL=0, if you wanted to exit(0) instead of returning a value.)
scasb isn't particularly fast on modern CPUs, but it is on a real 8086; so is the loop instruction, because both have compact code size.  loop is a special-case optimization for dec cx / jnz that, unlike dec, doesn't affect FLAGS.
Or, with 386 instructions, use bt word ptr [si], 7 to Bit Test that bit, putting the result in CF where you can adc dx, 0.  bt mem, reg is slow on modern CPUs (about 10 uops) because the register bit-offset can index outside the word selected by the addressing mode.  So it would be less efficient to put bt word ptr [array], cx in a loop with cx initially = 7 and incremented with add cx, 8 inside the loop.  But that would work.
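A C sketch of bt's bit-string addressing with a register bit offset, assuming a non-negative offset (the hardware treats the register as signed, which this model ignores). bt_mem_reg and count_negative_bt are hypothetical names. Because x86 is little-endian, bit n of the string is bit n % 8 of byte n / 8, regardless of the word the addressing mode names.

```c
#include <stddef.h>
#include <stdint.h>

/* bt mem, reg: CF = bit n of the bit string starting at base. */
static unsigned bt_mem_reg(const uint8_t *base, unsigned n) {
    return (base[n / 8] >> (n % 8)) & 1u;
}

/* cx starts at 7 and steps by 8, so each bt reads the sign bit of the
   next byte; adc dx, 0 adds the resulting CF. */
static uint16_t count_negative_bt(const uint8_t *array, size_t len) {
    uint16_t dx = 0;
    for (unsigned cx = 7; cx < 8 * len; cx += 8)
        dx = (uint16_t)(dx + bt_mem_reg(array, cx));
    return dx;
}
```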
bt is not too bad with bt mem, imm: only 2 uops on most modern Intel and 1 on some AMD (https://uops.info/).  It's a single uop for bt reg, imm or bt reg, reg, like cmp, if you want to load first.  (It can't macro-fuse with a branch into a single uop, so if you're branching instead of using adc, a cmp/jcc such as cmp byte ptr [si], 0 / jl would be more efficient as well as more readable.)  On AMD, bts/btr/btc, which also modify the bit, are slower than bt even for reg,reg, decoding to extra uops.
SSE2 + popcnt to check 4, 8, or 16 bytes at once
The extra-fun way, since you have exactly 8 bytes, uses SSE2 and popcnt.  (Yes, this can work in 16-bit real mode, unlike AVX.  In a bootloader, and maybe under DOS, you'd have to manually enable the control-register bits that make SSE instructions not fault.  Of course, it only works on CPUs with popcnt, like Nehalem and later from 2008 or so; otherwise use pcmpgtb / psadbw / movq for just SSE2, or SSE1 using MMX registers.)
  movq      xmm0, qword ptr [array]  ; load 8 bytes (zero-extending to a 16-byte XMM reg)
  pmovmskb  eax, xmm0                ; pack the sign bit of each byte into an integer reg (destination must be 32-bit)
  popcnt    ax, ax                   ; count set bits = sign bits of the bytes
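Roughly the same thing in C with SSE2 intrinsics, as a sketch. It assumes GCC or Clang for __builtin_popcount (which becomes a popcnt instruction only when the target enables it, e.g. with -mpopcnt), and falls back to a plain loop when __SSE2__ isn't defined. count_negative8 is a hypothetical name.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Count bytes with their sign bit set in an 8-byte array. */
static int count_negative8(const uint8_t array[8]) {
    unsigned mask;
#if defined(__SSE2__)
    __m128i v = _mm_loadl_epi64((const __m128i *)array);  /* movq: load 8 bytes, zero the upper 8 */
    mask = (unsigned)_mm_movemask_epi8(v);                 /* pmovmskb: one bit per byte's MSB */
#else
    /* Portable fallback: build the same 8-bit mask one byte at a time. */
    mask = 0;
    for (int i = 0; i < 8; i++)
        mask |= (unsigned)(array[i] >> 7) << i;
#endif
    return __builtin_popcount(mask);                       /* popcnt */
}
```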
The same sequence also works easily for 4- or 16-byte arrays.  For other compile-time-constant sizes, do 2 overlapping loads and shift out the mask bits for the overlapping bytes.
For other element sizes, there's movmskps (dword) and movmskpd (qword).
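For sizes between 8 and 16 bytes, the two-overlapping-loads idea might look like this in C. It's a sketch: movemask8 is a hypothetical portable stand-in for movq + pmovmskb, the size is a runtime parameter here rather than a compile-time constant, and __builtin_popcount assumes GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for movq + pmovmskb: bit i is the sign bit of p[i], for 8 bytes. */
static unsigned movemask8(const uint8_t *p) {
    unsigned mask = 0;
    for (int i = 0; i < 8; i++)
        mask |= (unsigned)(p[i] >> 7) << i;
    return mask;
}

/* Count negative bytes for 8 <= n <= 16 using two 8-byte loads:
   one at the start, one ending exactly at the end of the array. */
static int count_negative_8_to_16(const uint8_t *array, size_t n) {
    unsigned lo = movemask8(array);            /* bytes 0 .. 7 */
    unsigned hi = movemask8(array + n - 8);    /* bytes n-8 .. n-1 */
    size_t overlap = 16 - n;                   /* bytes n-8 .. 7 are already in lo */
    hi >>= overlap;                            /* shift out the overlapping bytes' bits */
    return __builtin_popcount(lo) + __builtin_popcount(hi);
}
```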
With a larger array, you'd want to start accumulating counts in vector regs: pcmpgtb to compare for 0 > x, then psubb xmm1, xmm0 to do total -= (0 or -1), for up to 255 iterations of 16 bytes before the byte counters could overflow.  Then accumulate with psadbw against zero.  This is the same problem as How to count character occurrences using SIMD, but with pcmpeqb replaced by pcmpgtb.
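A sketch of that accumulation loop with SSE2 intrinsics in C, assuming an x86 compiler that provides <emmintrin.h> (without __SSE2__ it just runs the scalar loop). count_negative_bytes_sse2 is a hypothetical name. The 255-iteration limit keeps each byte counter from wrapping before psadbw (_mm_sad_epu8) folds them into two 64-bit sums.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Count bytes with their sign bit set (negative as int8_t). */
static size_t count_negative_bytes_sse2(const uint8_t *p, size_t n) {
    size_t total = 0, i = 0;
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    while (n - i >= 16) {
        __m128i acc = zero;                          /* 16 byte-sized counters */
        for (int iter = 0; iter < 255 && n - i >= 16; iter++, i += 16) {
            __m128i v = _mm_loadu_si128((const __m128i *)(p + i));
            __m128i neg = _mm_cmpgt_epi8(zero, v);   /* pcmpgtb: 0 > x gives -1, else 0 */
            acc = _mm_sub_epi8(acc, neg);            /* psubb: acc -= (0 or -1) */
        }
        /* psadbw against zero: each 8-byte half summed into a 64-bit lane. */
        __m128i sums = _mm_sad_epu8(acc, zero);
        total += (size_t)_mm_cvtsi128_si32(sums) + (size_t)_mm_extract_epi16(sums, 4);
    }
#endif
    for (; i < n; i++)                               /* scalar tail */
        total += p[i] >> 7;
    return total;
}
```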