I have a simple problem that is driving me crazy, and it seems to be due to how Python handles Unicode characters.
I have a LaTeX table stored on my disk (very similar to http://www.jwe.cc/downloads/table.tex), and I want to apply some regexes to it so that hyphens - (\u2212) are replaced by en-dashes – (alt 0150 or \u2013).
I am using the following function, which performs two different regex substitutions:
import re
import glob
def mychanger(fileName):
  with open(fileName,'r') as file:
    str = file.read()
    str = str.decode("utf-8")
    str = re.sub(r"((?:^|[^{])\d+)\u2212(\d+[^}])","\\1\u2013\\2", str).encode("utf-8")
    str = re.sub(r"(^|[^0-9])\u2212(\d+)","\\1\u2013\\2", str).encode("utf-8")
  with open(fileName,'wb') as file:
    file.write(str)
myfile = glob.glob("C://*.tex")
for file in myfile: mychanger(file)  
Unfortunately, this does not change anything.
It does work, though, if I use a non-Unicode character like $ instead of \u2013, which suggests that the regex itself is correct.
I am lost here. I also tried using re.sub(ur"((?:^|[^{])\d+)\u2212(\d+[^}])","\\1\u2013\\2", str).encode("utf-8"), but it still does not change anything.
What is wrong here? Thanks!
 
    