I am trying to parse an MHT document using Jsoup (version 1.7.3). The goal is to open two files and merge them (joining head and body) into one complete file. But first, I have a problem parsing the MHT file itself: the parsed result is missing a significant amount of information, and the file can't be opened after parsing. What I did is the following:
- Create an MHT file using Word (containing one image and some text)
- Parse it with Jsoup and convert the result to a String
- Write the String to a new file
- Open the new file: it is broken
I used the following code:
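import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.junit.Test; // assuming JUnit 4 for the @Test annotation

private static final String USED_CHARSET = "windows-1252";
private static final String PATH = "C:\\Test\\";
private static final Charset CHARSET = Charset.forName(USED_CHARSET);
@Test
public void test() throws IOException {
    // Parse the whole .mht file as if it were plain HTML
    Document doc = Jsoup.parse(new File(PATH, "sourceMht.mht"),
            USED_CHARSET);
    // Serialize the parsed document and write it back out
    writeDoc(new File(PATH, "parsedMht.mht"), doc.html());
}
// Writes the given string to the file using the same charset as the input
private void writeDoc(File file, String html) throws IOException {
    Writer out = new BufferedWriter(new OutputStreamWriter(
            new FileOutputStream(file), CHARSET));
    try {
        out.write(html);
    } finally {
        out.flush();
        out.close();
    }
}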
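One thing that may be worth checking alongside the code above is whether the source file is plain HTML at all. The sketch below assumes that Word saves .mht files as a MIME multipart message (an HTML part plus separately encoded image parts). It only reads the top-level header block and reports the declared boundary. The names `MhtProbe` and `findBoundary` are hypothetical helpers made up for this illustration and are not part of Jsoup; the sample headers in `main` are invented test data, not taken from my real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jsoup: checks whether a file looks like a MIME multipart container.
class MhtProbe {

    // Matches boundary="..." or boundary=... in a Content-Type header.
    // Simplification: a quoted boundary containing whitespace is not handled.
    private static final Pattern BOUNDARY =
            Pattern.compile("boundary=\"?([^\";\\s]+)\"?", Pattern.CASE_INSENSITIVE);

    /**
     * Reads the header block (everything before the first empty line) and
     * returns the multipart boundary, or null if no boundary is declared.
     */
    public static String findBoundary(Reader source) throws IOException {
        BufferedReader in = new BufferedReader(source);
        StringBuilder headers = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null && !line.isEmpty()) {
            // Join folded header lines into a single string for matching
            headers.append(line).append(' ');
        }
        Matcher m = BOUNDARY.matcher(headers);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) throws IOException {
        // Invented sample input in the assumed MIME layout
        String sample = "MIME-Version: 1.0\r\n"
                + "Content-Type: multipart/related; boundary=\"----=_NextPart_TEST\"\r\n"
                + "\r\n"
                + "------=_NextPart_TEST\r\n"
                + "Content-Type: text/html; charset=\"windows-1252\"\r\n";
        System.out.println(findBoundary(new StringReader(sample)));
    }
}
```

If `findBoundary` returns a non-null value for sourceMht.mht (for example when called with a `FileReader` on that file), the file is probably a multipart container, and feeding it to `Jsoup.parse` as a whole would treat the MIME headers and encoded parts as HTML text. That could explain the missing information, but I have not confirmed it.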
Thanks for your help.