I am parsing some HTML, and when I read the data with doc = urllib2.urlopen(url).read() I sometimes get characters like é. How can I find these characters and replace them with their non-accented equivalents?
The variable doc is a byte string. I have tried to convert it to a unicode string like this:
doc = doc.encode('utf-8')
doc = strip_accents(doc)
doc = doc.decode('utf-8')
where strip_accents is:
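import unicodedata

def strip_accents(s):
    # Decompose each character (NFD), then drop the combining marks (category 'Mn').
    return ''.join(c for c in unicodedata.normalize('NFD', s) if unicodedata.category(c) != 'Mn')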
(taken from the question "What is the best way to remove accents in a Python unicode string?").
But when I try to encode doc, I get this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa9 in position 161: ordinal not in range(128)
How can I change the accented characters to their non-accented equivalents?
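For reference, here is a sketch of the order of operations that the attempt above seems to be aiming for, offered as a possible reading rather than a confirmed fix. In Python 2, calling encode on a byte string first decodes it implicitly with the ascii codec, which is where a UnicodeDecodeError like the one above comes from. The sketch decodes the bytes first, strips the accents on the unicode string, and only then encodes again. It assumes the page is UTF-8 (the byte 0xa9 is consistent with the second byte of é in UTF-8, but that is a guess), and strip_accents_from_bytes is a hypothetical helper name, not something from the linked question.

```python
import unicodedata


def strip_accents(s):
    # s must be a unicode string (unicode in Python 2, str in Python 3).
    # NFD splits "é" into "e" plus a combining accent; category 'Mn' marks are dropped.
    return u''.join(c for c in unicodedata.normalize('NFD', s)
                    if unicodedata.category(c) != 'Mn')


def strip_accents_from_bytes(raw, encoding='utf-8'):
    # Hypothetical helper: the 'utf-8' default is an assumption about the page.
    text = raw.decode(encoding)      # bytes -> unicode (decode first)
    stripped = strip_accents(text)   # work on the unicode string
    return stripped.encode(encoding) # unicode -> bytes (encode last)


# Example input: the UTF-8 bytes for "café"
raw = b'caf\xc3\xa9'
result = strip_accents_from_bytes(raw)
```

If the page turns out to use a different encoding (for example one declared in the HTTP headers or in a meta tag), that encoding would need to be passed instead of the assumed default.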
 
    