I'm writing a crawler that retrieves some Facebook posts and serializes them as XML.
My problem is the following: I've found that some messages containing special characters (such as \b, the backspace character) are serialized as a numeric character reference (&#8; in the case of \b) when I write them to my XML.
If I try to read this XML back with the Java DOM parser (with that character reference in it), I get an error because the parser cannot handle this character.
How can I solve it?
Data examples: http://pastebin.com/3xEK5QbV
The error given by the parser when I load the resulting XML is below (the message is in Spanish and is cut off; "La referencia de caracteres" means "The character reference"):
[Fatal Error] out.xml:7:59: La referencia de caracteres "&#
org.xml.sax.SAXParseException; systemId: file:/Z:/Programas/Workspace%20Eclipse/workspace/Test/out.xml; lineNumber: 7; columnNumber: 59; La referencia de caracteres "&#
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
    at Test.loadBadXML(Test.java:43)
    at Test.(Test.java:32)
    at Test.main(Test.java:139)
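To make the problem easier to reproduce, here is a minimal, self-contained sketch (it is not my real crawler code). The class name BackspaceRepro and the methods serialize and parses are made up for this example; everything else is the standard JAXP API that my snippets also use. It writes a text node containing \b with the default Transformer and then tries to parse the result again. The exact serialized text may depend on the JDK's built-in serializer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical reproduction class; the names are illustrative only.
class BackspaceRepro {

    // Builds <post_message>message</post_message> and serializes it to a String.
    static String serialize(String message) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.newDocument();
            Element el = doc.createElement("post_message");
            el.appendChild(doc.createTextNode(message));
            doc.appendChild(el);

            StringWriter out = new StringWriter();
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true if the DOM parser accepts the XML, false on a SAX parse error.
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = serialize("before\bafter");
        System.out.println(xml);
        System.out.println("parses again: " + parses(xml));
    }
}
```

Running main prints the serialized document followed by whether it could be parsed again, which is the round trip that fails for me.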
As for the source code, there are three relevant pieces:
First: obtaining the "malformed" data (with \b) from the JSON returned by Facebook:
// post is the object that holds the "post"
// URL_BASE_GRAPH and TOKEN are String constants needed to build the URL for the Facebook Graph API
// idPost is the ID of the post that I'm retrieving
String urlStr = URL_BASE_GRAPH + idPost + "?access_token=" + TOKEN;
URL url = new URL(urlStr);
ObjectMapper om = new ObjectMapper();
JsonNode root = om.readValue(url.openStream(), JsonNode.class);
...    
JsonNode message = root.get("message");
if (message != null) {
    post.setMessage(message.asText());
}
Second: writing this data as XML:
// outFile is the file to be written
File file = new File(outFile);
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
// root element
Document doc = docBuilder.newDocument();
Element rootElement = doc.createElement("groups");
doc.appendChild(rootElement);
....
if (post.getMessage() != null) {
    Element messagePost = doc.createElement("post_message");
    // I've also tried this: messagePost.appendChild(doc.createTextNode(StringEscapeUtils.escapeXml(post.getMessage())));
    messagePost.appendChild(doc.createTextNode(post.getMessage()));
    postEl.appendChild(messagePost);
}
....
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
DOMSource source = new DOMSource(doc);
StreamResult result = new StreamResult(file);
transformer.transform(source, result);
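One workaround I'm considering, but have not verified against my real data, is to drop every character that XML 1.0 does not allow before creating the text node. The sketch below assumes the Char production from the XML 1.0 specification (tab, line feed, carriage return, U+0020 to U+D7FF, U+E000 to U+FFFD, and U+10000 to U+10FFFF); the class XmlTextSanitizer and the method stripInvalidXmlChars are hypothetical names, not part of any library.

```java
// Hypothetical helper; the names are illustrative only.
class XmlTextSanitizer {

    // True if the code point is allowed by the XML 1.0 "Char" production.
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns a copy of text without the code points that XML 1.0 forbids.
    static String stripInvalidXmlChars(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isValidXml10(cp)) {
                sb.appendCodePoint(cp);
            }
            // Advance by one or two chars, depending on whether cp is supplementary.
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripInvalidXmlChars("before\bafter"));
    }
}
```

With this helper, the write step would call doc.createTextNode(XmlTextSanitizer.stripInvalidXmlChars(post.getMessage())). The obvious downside is that the stripped characters are lost, so the XML no longer matches the original message exactly.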
Third: loading the XML (malformed, with the character reference in it) back from the file:
File fXmlFile = new File(f);
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
doc.getDocumentElement().normalize();
....
Node pstNode = postNode.item(j);
if (pstNode.getNodeType() == Node.ELEMENT_NODE) {
    Element pstElement = (Element) pstNode;
    String pstMessage = null;
    if (pstElement.getElementsByTagName("post_message").item(0) != null)
        pstMessage = pstElement.getElementsByTagName("post_message").item(0).getTextContent();
Any thoughts?
Thanks!
 
     
    