I've come across a problem that seems really weird to me.
I'm scraping a website using Jsoup:
Elements names = doc.select(".Mod.Thm-inherit").select("h3");
for (Element e : names) {
    System.out.println(e.text());
}
My output (fantasy hockey team names, changed for simplicity) is:
Team One ?
Team Two ?
Team Three ?
Team Four ?
Team Five ?
//etc
Now the actual team names don't have the extra space or question mark. Thinking I could just replace it, I tried:
String str = e.text().replaceAll("\\?", "");
System.out.println(str);
However, this still prints the question mark at the end. I suspect that means it's a character that Eclipse/Java can't display rather than a real question mark. (Note: it doesn't display a �, it's really just the plain ?)
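One way to check that suspicion is to print the code points of the string instead of the string itself. The following is only a rough sketch: the `CodePointDump` class and the `describe` helper are made-up names, and the `\uE000` in the sample string is an arbitrary stand-in for whatever the scraped text really ends with. If the console encoding can't represent a character, Java will typically print a plain ? in its place, which would also explain why replacing a literal ? has no effect.

```java
public class CodePointDump {
    // Hypothetical helper: lists every code point of s in U+XXXX form.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for e.text(); the trailing \uE000 is an assumed example value.
        String text = "Team One \uE000";

        // Shows what the string really contains, independent of the console encoding.
        System.out.println(describe(text));

        // "\\?" only matches a real question mark (U+003F),
        // so any other trailing character survives the replacement.
        System.out.println(describe(text.replaceAll("\\?", "")));
    }
}
```

A real question mark would show up as U+003F in this dump; anything else at the end of the line means the ? is only how the console renders an unprintable character.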
When looking at the HTML code, there are no extra characters though:
<script charset="utf-8" type="text/javascript" language="javascript">
<!-- Bunch of HTML -->
<div class="Grid-u-1-2 Pend-xl"><h3 class="My-xl Ta-c Fz-lg"><a href="/hockey/27381/1">Team One</a>
Anyone know why this is happening?
Edit: I was quickly able to solve the issue by just doing a substring and removing the last 2 characters, but I'd still like to know why it's happening.
Edit2: Playing around with it more, I found that if I cast the "question mark" character to int, I get 57399 instead of the 63 a regular ? would give. So it's definitely not a real question mark but some other character. I'm just not sure why it's being added or what that character is supposed to represent.
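For what it's worth, 57399 is 0xE037, which falls in Unicode's Private Use Area (U+E000 to U+F8FF), so the character has no standard meaning. Below is a minimal sketch for confirming that and for removing such characters without relying on a fixed substring length. `PrivateUseStripper` and `stripPrivateUse` are hypothetical names, and the sample string assumes the extra space is an ordinary space, which `trim()` handles; a non-breaking space would need separate treatment.

```java
public class PrivateUseStripper {
    // Hypothetical helper: drops private-use code points, then trims ASCII whitespace.
    static String stripPrivateUse(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints()
         .filter(cp -> Character.getType(cp) != Character.PRIVATE_USE)
         .forEach(sb::appendCodePoint);
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        int cp = 57399; // the value reported by the (int) cast

        // Prints the value in hex so it can be looked up in the Unicode charts.
        System.out.println(String.format("U+%04X", cp));

        // Checks whether Java classifies the code point as private use.
        System.out.println(Character.getType(cp) == Character.PRIVATE_USE);

        // Assumed sample input: a team name followed by a space and the odd character.
        System.out.println("[" + stripPrivateUse("Team One " + (char) cp) + "]");
    }
}
```

Filtering by character category rather than cutting off the last two characters means names without the trailing character are left untouched.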