Here is the Python 2.5 code, which replaces the word fox with a link, <a href="/fox">fox</a>, and avoids the replacement inside an existing link:
import re
content="""
<div>
    <p>The quick brown <a href='http://en.wikipedia.org/wiki/Fox'>fox</a> jumped over the lazy Dog</p>
    <p>The <a href='http://en.wikipedia.org/wiki/Dog'>dog</a>, who was, in reality, not so lazy, gave chase to the fox.</p>
    <p>See "Dog chase Fox" image for reference:</p>
    <img src='dog_chasing_fox.jpg' title='Dog chasing fox'/>
</div>
"""
p=re.compile(r'(?!((<.*?)|(<a.*?)))(fox)(?!(([^<>]*?)>)|([^>]*?</a>))',re.IGNORECASE|re.MULTILINE)
print p.findall(content)
for match in p.finditer(content):
  print match.groups()
output=p.sub(r'<a href="/fox">\3</a>',content)
print output
The output is:
[('', '', '', 'fox', '', '.', ''), ('', '', '', 'Fox', '', '', '')]
('', '', None, 'fox', '', '.', '')
('', '', None, 'Fox', None, None, None)
Traceback (most recent call last):
  File "C:/example.py", line 18, in <module>
    output=p.sub(r'<a href="fox">\3</a>',content)
  File "C:\Python25\lib\re.py", line 274, in filter
    return sre_parse.expand_template(template, match)
  File "C:\Python25\lib\sre_parse.py", line 793, in expand_template
    raise error, "unmatched group"
error: unmatched group
- I am not sure why the backreference \3 won't work.
- (?!((<.*?)|(<a.*?)))(fox)(?!(([^<>]*?)>)|([^>]*?</a>)) works (see http://regexr.com?317bn), which surprises me. The first negative lookahead, (?!((<.*?)|(<a.*?))), puzzles me: in my opinion, it is not supposed to work. Take the first match it finds, the fox in "gave chase to the fox.</p>". There is an <a href='http://en.wikipedia.org/wiki/Dog'>dog</a> that matches ((<.*?)|(<a.*?)), so as a negative lookahead it should return FALSE. I am not sure whether I am expressing myself clearly.
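Both points above can be probed with a small check. This is a minimal sketch, not a definitive diagnosis. It relies on Python's re numbering capturing groups by their opening parenthesis, including groups inside a lookahead, and on a lookahead being tested only at the current position. The shortened sample string and the names pattern, sample, match and linked are illustrative and not part of the original program.

```python
import re

# Same pattern and flags as in the question.
pattern = re.compile(
    r'(?!((<.*?)|(<a.*?)))(fox)(?!(([^<>]*?)>)|([^>]*?</a>))',
    re.IGNORECASE | re.MULTILINE)

# Groups are counted by opening parenthesis, lookaheads included:
# groups 1-3 sit inside the first lookahead, (fox) is group 4,
# and groups 5-7 sit inside the second lookahead.
print(pattern.groups)  # 7

sample = "<p>gave chase to the fox.</p>"  # shortened, illustrative input
match = pattern.search(sample)
print(match.group(4))  # fox
print(match.group(3))  # None: group 3 never takes part in a match

# Referring to group 4 instead of group 3 in the replacement template.
linked = pattern.sub(r'<a href="/fox">\4</a>', sample)
print(linked)

# A lookahead only inspects the text starting at the current position,
# so an earlier <a ...> tag elsewhere in the string does not affect it.
print(re.search(r'(?!<)fox', "<a>dog</a> and a fox") is not None)  # True
```

If this reading is right, \3 fails because group 3 sits inside the negative lookahead and is never set when the overall match succeeds, which is what Python 2.5 reports as "unmatched group". The word itself is captured by group 4.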
Thanks a lot!
(Note: I hate using BeautifulSoup. I enjoy writing my own regular expressions. I know many people here will say that regular expressions are not for HTML processing, blah blah. But this is a small program, so I prefer regular expressions over BeautifulSoup.)
 
     
    