I'm currently trying to combine the text from many small files into one big file using Java. This big file is then used in a Python module to extract phrases from it. During this process, I get an error indicating invalid UTF-8 text. Some research brought me to this error in Java, but it didn't solve my problem.
Strangely, when I type the sentence into an online UTF-8 converter like this one, it also reports an error. The string I used is "Brawlers Were Back On Ice and Canvas".
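For reference, here is a minimal sketch of how the sample string can be checked locally instead of relying on an online tool. The variable names are only illustrative. Every character in the string is plain ASCII, and ASCII is a subset of UTF-8, so encoding and decoding it should round-trip without any error:

```python
# Sanity check: the sample string contains only ASCII characters,
# so its UTF-8 encoding is byte-for-byte the same as its ASCII encoding.
text = u"Brawlers Were Back On Ice and Canvas"

encoded = text.encode("utf-8")

# Every byte below 0x80 means the data is pure ASCII.
is_ascii = all(b < 0x80 for b in bytearray(encoded))

# Decoding the bytes again should give back the original string.
round_trip = encoded.decode("utf-8")
```

If `is_ascii` is true and `round_trip` equals `text`, the string itself is not the source of the invalid UTF-8 error.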
Can anyone explain to me why this happens?
Thanks in advance!
EDIT/UPDATE: It looks like this online tool might have a bug. I'm still stuck on the problem of using the file in Python, so I'll show the code that creates it:
    // try-with-resources closes (and therefore flushes) the writer
    try (Writer writer = new BufferedWriter(new OutputStreamWriter(
            new FileOutputStream("samplefile"), "utf-8"))) {
        writer.write(someText);
    }
But this produces errors in Python like:

    UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 0: unexpected end of data
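To illustrate what this message means, here is a small sketch that reproduces it. It is only an assumption that something similar happens with my file. The byte 0xC3 is the lead byte of a two-byte UTF-8 sequence, so decoding it without its continuation byte fails with "unexpected end of data". The character "é" is just an example I picked; the incremental decoder from the standard `codecs` module is shown as one way to handle input that arrives in pieces:

```python
import codecs

# "é" (U+00E9) is encoded in UTF-8 as the two bytes 0xC3 0xA9.
full = u"\u00e9".encode("utf-8")
truncated = full[:1]  # only the lead byte 0xC3 remains

try:
    truncated.decode("utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    # The lead byte announces a second byte that never arrives.
    error_message = str(exc)

# An incremental decoder buffers an incomplete sequence instead of failing,
# and emits the character once the remaining byte is supplied.
decoder = codecs.getincrementaldecoder("utf-8")()
first = decoder.decode(full[:1])
second = decoder.decode(full[1:], final=True)
```

Here `first` is empty because the decoder is still waiting for the continuation byte, and `second` holds the complete character.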
Second edit: The Python code to process the data:
    dr = DirRunner(self.dir)
    for item in dr:
        # open for reading with line buffering (buffering=1)
        with open(item, "r", 1) as f:
            for line in f:
                yield line
DirRunner just returns a list of all the files and folders in one directory.
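As a possible alternative to the plain `open` call, here is a sketch that reads each file with an explicit encoding through `io.open`, so that every line is decoded as a whole and no multi-byte sequence is decoded in isolation. The helper name `iter_utf8_lines` and the temporary sample file are hypothetical and only there to make the example self-contained; it also assumes that all files really contain UTF-8:

```python
import io
import os
import tempfile


def iter_utf8_lines(paths):
    """Yield decoded (unicode) lines from each file, assuming UTF-8 content."""
    for path in paths:
        # io.open decodes while reading, so each yielded line is text, not bytes.
        with io.open(path, "r", encoding="utf-8") as handle:
            for line in handle:
                yield line


# Demonstration with a temporary file that contains a non-ASCII character.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "samplefile")
with io.open(sample_path, "w", encoding="utf-8") as out:
    out.write(u"caf\u00e9\nBrawlers Were Back On Ice and Canvas\n")

lines = list(iter_utf8_lines([sample_path]))
```

With this approach a genuinely malformed file would still raise `UnicodeDecodeError`, but the error would point at the file's contents rather than at a string that was split in the middle of a character.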
Each line is then processed in this function:
    def any2utf8(input):
        """
        convert a string or object into utf8 encoding
        source: http://stackoverflow.com/questions/13101653/python-convert-complex-dictionary-of-strings-from-unicode-to-ascii
        usage:
            str = "abc"
            str_replace = any2utf8(str)
        """
        if isinstance(input, dict):
            return {any2utf8(key): any2utf8(value) for key, value in input.iteritems()}
        elif isinstance(input, list):
            return [any2utf8(element) for element in input]
        elif isinstance(input, unicode):
            return input.encode('utf-8')
        else:
            return input
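The function above is Python 2 code (`unicode`, `iteritems`). For comparison, this is a rough sketch of what the same idea might look like on Python 3, where `str` is already Unicode text. The name `any_to_utf8` is hypothetical and not part of my module:

```python
def any_to_utf8(value):
    """Recursively encode text inside dicts and lists to UTF-8 bytes."""
    if isinstance(value, dict):
        return {any_to_utf8(k): any_to_utf8(v) for k, v in value.items()}
    if isinstance(value, list):
        return [any_to_utf8(element) for element in value]
    if isinstance(value, str):
        # In Python 3, str is Unicode text; encode() produces bytes.
        return value.encode("utf-8")
    # Anything else (numbers, bytes, None, ...) is returned unchanged.
    return value


converted = any_to_utf8({"name": ["caf\u00e9", 3]})
```

Note that neither version decodes anything: byte strings pass through untouched, so this function only encodes text that has already been decoded.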
