I've been parsing some docx files (UTF-8 encoded XML) with special characters (Czech alphabet). When I try to output to stdout, everything goes smoothly, but I'm unable to output data to the file,
Traceback (most recent call last):
File "./test.py", line 360, in
ofile.write(u'\t\t\t\t\t\n')
UnicodeEncodeError: 'ascii' codec can't encode character u'\xed' in position 37: ordinal not in range(128)
Although I explicitly cast the word variable to unicode type (type(word) returned unicode), I tried to encode it with .encode('utf-8) I'm still stuck with this error.
Here is a sample of the code as it looks now:
for word in word_list:
    word = unicode(word)
    #...
    ofile.write(u'\t\t\t\t\t<feat att="writtenForm" val="'+word+u'"/>\n')
    #...
I also tried the following:
for word in word_list:
    word = word.encode('utf-8')
    #...
    ofile.write(u'\t\t\t\t\t<feat att="writtenForm" val="'+word+u'"/>\n')
    #...
Even the combination of these two:
word = unicode(word)
word = word.encode('utf-8')
I was kind of desperate so I even tried to encode the word variable inside the ofile.write()
ofile.write(u'\t\t\t\t\t<feat att="writtenForm" val="'+word.encode('utf-8')+u'"/>\n')
I would appreciate any hints of what I'm doing wrong.
 
     
     
     
     
     
    