I'm using Jsoup's parseBodyFragment() and parse() methods to work with blocks of code made up of script, noscript, and style tags. The goal isn't to clean them - just to select(), analyze, and output them. The select() portion works really well.
However, Jsoup automatically escapes the ampersands in the URL query parameters of src attributes. So, when the input is this:
<noscript>
<img height="1" width="1" style="display:none;" alt="" src="https://something.orother.com/i/cnt?txn_id=123&p_id=123"/>
</noscript>
I get this back from Jsoup's outerHtml() method:
<noscript>
<img height="1" width="1" style="display:none;" alt="" src="https://something.orother.com/i/cnt?txn_id=123&amp;p_id=123"/>
</noscript>
The issue is that the plain ampersand (&) in the URL query string is being escaped and output as &amp;. Is there a way to disable this?
I'm looking for a way to get the html of the selected element without modification. Thanks!
Update (2/23/2016): Clarified the problem. I also found an issue on the GitHub repo describing it: https://github.com/jhy/jsoup/issues/372. It looks like this might not be possible.
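In the meantime, one stopgap I can think of is to post-process the string that outerHtml() returns. This is only a rough sketch in plain Java, not a Jsoup feature: the class and method names (SrcAmpersandRestorer, restoreAmpersands) are made up for illustration, and it assumes the src attributes are double-quoted. It also turns any &amp; that was already escaped in the original input into a bare &, so the result is not guaranteed to match the input exactly.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SrcAmpersandRestorer {
    // Hypothetical helper, not part of Jsoup.
    // Matches a double-quoted src attribute: src="...value..."
    private static final Pattern SRC_ATTR =
            Pattern.compile("(\\bsrc=\")([^\"]*)(\")");

    // Replaces &amp; with & inside src attribute values only;
    // text outside those attributes is left untouched.
    public static String restoreAmpersands(String html) {
        Matcher m = SRC_ATTR.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = m.group(2).replace("&amp;", "&");
            // quoteReplacement keeps '$' and '\' in the URL from being
            // interpreted as regex replacement syntax.
            m.appendReplacement(out,
                    Matcher.quoteReplacement(m.group(1) + value + m.group(3)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the string returned by Jsoup's outerHtml().
        String fromJsoup = "<img alt=\"\" src=\"https://something.orother.com/i/cnt?txn_id=123&amp;p_id=123\">";
        System.out.println(restoreAmpersands(fromJsoup));
    }
}
```

I would call restoreAmpersands(element.outerHtml()) on each selected element. It feels like a hack, though, so I'd still prefer a way to have Jsoup leave the attribute values alone.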