We have some text containing German umlauts represented using e.g. 'a' + COMBINING DIAERESIS (U+0308, UTF-8 bytes $cc $88).
Any idea how to convert such text properly to UTF-8, with each umlaut as a single precomposed character?
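First, if it's not already a unicode object, decode it. Second, call unicodedata.normalize() with the 'NFC' form, which composes each base letter and combining mark into a single precomposed character where one exists. Third, encode the result as UTF-8.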
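A minimal sketch of those three steps, assuming Python 3 and assuming the input arrives as UTF-8 bytes. The function name to_composed_utf8 and its encoding parameter are made up for illustration; only bytes.decode, unicodedata.normalize and str.encode come from the standard library.

```python
import unicodedata


def to_composed_utf8(raw: bytes, encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: decode, compose with NFC, re-encode as UTF-8."""
    # Step 1: decode the bytes into a str (assumes `encoding` matches the input).
    text = raw.decode(encoding)
    # Step 2: NFC merges 'a' + U+0308 into the single code point U+00E4.
    composed = unicodedata.normalize("NFC", text)
    # Step 3: encode the composed text as UTF-8.
    return composed.encode("utf-8")


# 'a' followed by COMBINING DIAERESIS, i.e. the bytes 61 CC 88.
decomposed = b"a\xcc\x88"
result = to_composed_utf8(decomposed)
print(result)  # b'\xc3\xa4'
```

If the text is already a str, skip the decode step and pass it straight to unicodedata.normalize. Sequences with no precomposed equivalent are left as base letter plus combining mark.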