Is there any way to preprocess text files and skip these characters?
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa1 in position 1395: invalid start byte
 
    
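Try this, where data is the raw bytes read from the file (naming it str would shadow the built-in). errors='ignore' drops every byte that is not valid UTF-8 instead of raising:

    data.decode('utf-8', errors='ignore')

In Python 3 you can do the same while opening the file: open(path, encoding='utf-8', errors='ignore').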
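A minimal Python 3 sketch of the preprocessing step, assuming the file is mostly UTF-8 with a few stray bytes. The helper name read_text_skipping_bad_bytes and the sample bytes are illustrative only. It also shows errors='replace', which marks each bad byte with U+FFFD instead of silently dropping it:

```python
import os
import tempfile


def read_text_skipping_bad_bytes(path):
    """Hypothetical helper: read a file as UTF-8, dropping bytes that are not valid UTF-8."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        return f.read()


# Sample data: valid UTF-8 for "café", plus a stray 0xa1 byte,
# which is not a valid UTF-8 start byte (the error from the question).
raw = b"caf\xc3\xa9 \xa1hola"

cleaned = raw.decode("utf-8", errors="ignore")   # the 0xa1 byte is dropped
marked = raw.decode("utf-8", errors="replace")   # the 0xa1 byte becomes U+FFFD

# Write the sample to a temporary file and read it back through the helper.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(raw)
try:
    from_file = read_text_skipping_bad_bytes(tmp.name)
finally:
    os.remove(tmp.name)
```

'ignore' is simplest for preprocessing, but it hides data loss; 'replace' keeps a visible marker where each undecodable byte was.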
 
    
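I think your text file contains some characters that are not encoded as UTF-8 (byte 0xa1, for example, is '¡' in ISO-8859-1), so the 'utf-8' codec can't decode it. Try reading the file as 'ISO-8859-1' instead of 'utf-8', like this:

    import io

    # Decode the file as ISO-8859-1 (Latin-1) instead of UTF-8
    with io.open('your_file.txt', encoding='ISO-8859-1') as f:
        text = f.read()
    # put your code here

ISO-8859-1 maps every byte to a character, so this never raises, but the characters will only be correct if the file really is Latin-1.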
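A hedged Python 3 sketch that combines both answers: try UTF-8 first and fall back to ISO-8859-1 only if decoding fails. The function name read_text_with_fallback and its fallback parameter are hypothetical. The exception's start attribute gives the offset of the bad byte, the same number reported as "position 1395" in the question:

```python
import os
import tempfile


def read_text_with_fallback(path, fallback="ISO-8859-1"):
    """Hypothetical helper: decode a file as UTF-8, else with a fallback encoding.

    ISO-8859-1 maps every byte 0x00-0xFF to a character, so the fallback
    never raises, although it may give the wrong characters if the file
    is really in some other encoding (e.g. cp1252).
    """
    with open(path, "rb") as f:
        data = f.read()
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the byte offset of the first undecodable byte
        print(f"not UTF-8 (bad byte at offset {exc.start}); using {fallback}")
        return data.decode(fallback)


raw = b"\xa1hola!"  # 0xa1 is the inverted exclamation mark in ISO-8859-1

with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(raw)
try:
    text = read_text_with_fallback(tmp.name)
finally:
    os.remove(tmp.name)
```

Trying UTF-8 first keeps genuinely UTF-8 files intact; the fallback only kicks in for files that fail to decode.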
