I'm a bit confused about these two instructions. First, let's set aside the special case of a scanned value of 0 (where bsr leaves the destination undefined and lzcnt returns the operand size in bits); that difference is clear and not part of my question.
Let's take the 32-bit binary value 0001 1111 1111 1111 1111 1111 1111 1111 (0x1FFFFFFF).
According to Intel's spec, the result of lzcnt is 3.
According to Intel's spec, the result of bsr is 28.
lzcnt counts the leading zero bits starting from the msb, while bsr returns the index of the highest set bit, measured from bit 0 (the lsb).
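To make this concrete, here's a minimal C sketch of what I mean (GNU inline asm is my own choice here, and it assumes an x86 CPU that actually supports LZCNT):

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t x = 0x1FFFFFFFu; /* 0001 1111 1111 1111 1111 1111 1111 1111 */
    unsigned lz, idx;

    /* count of leading zero bits (requires LZCNT support) */
    __asm__ ("lzcnt %1, %0" : "=r"(lz)  : "r"(x));
    /* index of the highest set bit, counted from bit 0 (the lsb) */
    __asm__ ("bsr %1, %0"   : "=r"(idx) : "r"(x));

    printf("lzcnt = %u\n", lz);  /* 3  */
    printf("bsr   = %u\n", idx); /* 28 */
    return 0;
}
```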
How can both instructions be the same, and how can lzcnt be emulated via bsr when no BMI support is available on the CPU? Or does bsr count bit 0 from the msb instead? The "Operation" pseudocode in Intel's spec differs for the two as well: one counts or indexes from the left, the other from the right.
Maybe someone can shed some light on this; I don't have a CPU without BMI/lzcnt to test whether the bsr fallback produces the same result (the special case of scanning the value 0 never happens in my code).
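If I understand it correctly, the fallback would have to convert the bit index into a count, something like the following sketch (my own assumption, not taken from Intel's docs; lzcnt_via_bsr is a name I made up, and it only handles nonzero inputs):

```c
#include <stdint.h>

/* Sketch of the fallback I'd like to verify: for nonzero 32-bit x,
   lzcnt(x) should equal 31 - bsr(x). */
static inline unsigned lzcnt_via_bsr(uint32_t x)
{
    unsigned idx;
    /* bsr: index of the highest set bit; x must be nonzero,
       otherwise the destination is undefined */
    __asm__ ("bsr %1, %0" : "=r"(idx) : "r"(x));
    return 31u - idx; /* msb index -> leading-zero count */
}
```

Is 31 - bsr(x) actually the equivalence such a fallback relies on?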