From my understanding,
(.)(?<!\1)
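should never match, because the character immediately behind the lookbehind is always the one just captured by group 1. In fact, PHP's preg_replace refuses to compile this pattern, and so does Ruby's gsub. Python's re module seems to have a different opinion, though: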
import re
test = 'xAAAAAyBBBBz'
print(re.sub(r'(.)(?<!\1)', r'(\g<0>)', test))
Result:
(x)AAAA(A)(y)BBB(B)(z)
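For comparison, here is a minimal sketch of my own (not taken from any documentation). The output above is exactly what a negative lookahead with the same backreference produces, i.e. "match a character only if the next one differs". That would be consistent with re testing \1 at the current position instead of behind it, but that reading is only an assumption on my part:

```python
import re

test = 'xAAAAAyBBBBz'

# Negative lookahead: match a character only if the NEXT character
# is not a repeat of the one just captured in group 1.
lookahead_result = re.sub(r'(.)(?!\1)', r'(\g<0>)', test)
print(lookahead_result)
## (x)AAAA(A)(y)BBB(B)(z)
```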
Can anyone provide a reasonable explanation for this behavior?
Update
This behavior appears to be a limitation in the re module. The alternative regex module seems to handle groups in assertions correctly:
import regex
test = 'xAAAAAyBBBBz'
print(regex.sub(r'(.)(?<!\1)', r'(\g<0>)', test))
## xAAAAAyBBBBz
print(regex.sub(r'(.)(.)(?<!\1)', r'(\g<0>)', test))
## (xA)AAA(Ay)BBB(Bz)
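For anyone limited to the standard re module, the second pattern can probably be restated without a lookbehind. This is only a sketch, assuming the intended meaning is "two characters where the second differs from the first"; the variable name pairs_result is mine:

```python
import re

test = 'xAAAAAyBBBBz'

# (.) captures one character, (?!\1) requires the next character to differ,
# and the second (.) then consumes that differing character.
pairs_result = re.sub(r'(.)(?!\1)(.)', r'(\g<0>)', test)
print(pairs_result)
## (xA)AAA(Ay)BBB(Bz)
```

Since the backreference is checked in a lookahead here, this version does not depend on how groups inside lookbehinds are handled.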
Note that, unlike PCRE, regex also allows variable-width lookbehinds:
print(regex.sub(r'(.)(?<![A-Z]+)', r'(\g<0>)', test))
## (x)AAAAA(y)BBBB(z)
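For contrast, a small sketch of what the standard re module does with the same idea. It rejects the variable-width lookbehind at compile time, while a fixed-width variant gives the same result for this particular pattern, because a run of [A-Z]+ can only end at the current position if the last character is itself uppercase. The variable names are mine:

```python
import re

test = 'xAAAAAyBBBBz'

# The standard re module refuses a variable-width lookbehind at compile time.
try:
    re.compile(r'(.)(?<![A-Z]+)')
    variable_width_rejected = False
except re.error:
    variable_width_rejected = True

# Fixed-width variant: only the single character behind the position is checked.
fixed_result = re.sub(r'(.)(?<![A-Z])', r'(\g<0>)', test)
print(variable_width_rejected, fixed_result)
## True (x)AAAAA(y)BBBB(z)
```

The fixed-width rewrite only works here because one character is enough to decide the assertion; it is not a general substitute for variable-width lookbehinds.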
PEP 411 lists regex as a candidate for provisional inclusion in the standard library, so it may eventually ship with Python.