It's generally not worth using SSE4.1 ptest xmm0,xmm0 on a pcmpeqb result, especially not if you're branching.
pmovmskb is 1 uop, and cmp or test can macro-fuse with jnz into another single uop on both Intel and AMD CPUs. That's a total of 2 uops to branch on a pcmpeqb result with pmovmskb + test/jcc.
But ptest is 2 uops, and its 2nd uop can't macro-fuse with a following branch. Total of 3 uops to branch on a vector with ptest + jcc.
It's break-even when you can use ptest directly, without needing a pcmp, e.g. testing any / all bits in the whole vector (or with a mask, some bits). And it's actually a win if you use it for cmov or setcc instead of a branch. It's also a win for code-size, even though it's the same number of uops.
You can amortize the checking over multiple vectors, e.g. por some vectors together and then check that all of the bytes are zero. Or pminub some vectors together and then check for any zeros. (glibc string functions like strlen and strchr use this trick to check a whole cache-line of vectors in parallel, before sorting out which vector the hit came from after leaving the loop.)
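As a rough sketch of the pminub idea using SSE2 intrinsics (the helper name `block64_has_zero` and the 64-byte block size are assumptions for illustration, not taken from glibc's actual code): the unsigned minimum of four vectors has a zero byte wherever any of the four inputs had one, so a single pcmpeqb + pmovmskb covers the whole block.

```c
#include <emmintrin.h>  /* SSE2: _mm_min_epu8, _mm_cmpeq_epi8, _mm_movemask_epi8 */
#include <stdint.h>

/* Hypothetical helper: returns 1 if any of the 64 bytes at p is zero, else 0.
 * Uses unaligned loads so it works on any pointer; a real strlen loop
 * would typically use aligned loads on a cache-line-aligned pointer. */
static int block64_has_zero(const uint8_t *p)
{
    __m128i v0 = _mm_loadu_si128((const __m128i *)(p + 0));
    __m128i v1 = _mm_loadu_si128((const __m128i *)(p + 16));
    __m128i v2 = _mm_loadu_si128((const __m128i *)(p + 32));
    __m128i v3 = _mm_loadu_si128((const __m128i *)(p + 48));

    /* pminub: a byte of the result is zero iff that byte is zero
     * in at least one of the four inputs. */
    __m128i m = _mm_min_epu8(_mm_min_epu8(v0, v1), _mm_min_epu8(v2, v3));

    /* One pcmpeqb + pmovmskb for all four vectors. */
    __m128i is_zero = _mm_cmpeq_epi8(m, _mm_setzero_si128());
    return _mm_movemask_epi8(is_zero) != 0;
}
```

Only when this returns non-zero do you need to go back and check the individual vectors to find the exact position.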
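You can combine pcmpeq results instead of raw inputs, e.g. for memchr. In that case you can use pand instead of pminub to get a zero in an element where any input has a zero (or por to get a non-zero element where any input has a match, since pcmpeqb produces all-ones for equal bytes). Some CPUs run pand / por on more ports than pminub, so there's less competition for vector ALU ports.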
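A minimal sketch of combining compare results with pand, again with SSE2 intrinsics. The helper `block64_equal` is hypothetical, and this uses a memcmp-style "any mismatch?" check because that's the case where the interesting element is a zero in a pcmpeqb result; for a search where the hit is an all-ones element, `_mm_or_si128` would take the place of `_mm_and_si128` and the final check would be for a non-zero mask.

```c
#include <emmintrin.h>  /* SSE2: _mm_cmpeq_epi8, _mm_and_si128, _mm_movemask_epi8 */
#include <stdint.h>

/* Hypothetical helper: returns 1 if the 64 bytes at a equal the 64 bytes at b. */
static int block64_equal(const uint8_t *a, const uint8_t *b)
{
    /* Each compare result byte is 0xFF where equal, 0x00 where different. */
    __m128i e0 = _mm_cmpeq_epi8(_mm_loadu_si128((const __m128i *)(a + 0)),
                                _mm_loadu_si128((const __m128i *)(b + 0)));
    __m128i e1 = _mm_cmpeq_epi8(_mm_loadu_si128((const __m128i *)(a + 16)),
                                _mm_loadu_si128((const __m128i *)(b + 16)));
    __m128i e2 = _mm_cmpeq_epi8(_mm_loadu_si128((const __m128i *)(a + 32)),
                                _mm_loadu_si128((const __m128i *)(b + 32)));
    __m128i e3 = _mm_cmpeq_epi8(_mm_loadu_si128((const __m128i *)(a + 48)),
                                _mm_loadu_si128((const __m128i *)(b + 48)));

    /* pand: a result byte is zero if any of the four compares had a zero there. */
    __m128i all = _mm_and_si128(_mm_and_si128(e0, e1), _mm_and_si128(e2, e3));

    /* pmovmskb sets 16 mask bits; all set means no mismatch anywhere. */
    return _mm_movemask_epi8(all) == 0xFFFF;
}
```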
Also note that pmovmskb zero-extends into EAX; you can test eax,eax instead of wasting a prefix byte to only test AX.