I am new to scraping. I am trying to scrape data from a site using jsoup. I want to scrape data from tags like <div>, <span>, <p>, etc. Can anybody tell me how to do this?
Viewed 7,844 times
-3
Pshemo
Muhammad Waqas
-
Please tell us what you have tried so far; SO is not the place for getting code magically. – Zhedar May 10 '15 at 17:05
-
http://jsoup.org/cookbook/ – Jeffrey Bosboom May 10 '15 at 17:11
-
I have just made a new project, added the jsoup jar file, and established a connection. I am actually new to this. I want to scrape data residing in different tags, as I have shown above. Please help me. – Muhammad Waqas May 10 '15 at 17:26
1 Answer
2
Check this. A basic example:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Test {
    public static void main(String[] args) throws Exception {
        String url = "https://stackoverflow.com/questions/2835505";
        // Fetch the page and parse it into a DOM tree.
        Document document = Jsoup.connect(url).get();

        // Text of the first <div>; first() returns null if nothing matched.
        String text = document.select("div").first().text();
        System.out.println(text);

        // The href attribute of every <a> element on the page.
        Elements links = document.select("a");
        for (Element link : links) {
            System.out.println(link.attr("href"));
        }
    }
}
This will first print the text of the first div on the page, and then print the URL (the href attribute) of every link (a element) on the page.
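Note that attr("href") returns the attribute exactly as written in the HTML, so relative links stay relative. A minimal sketch of resolving them with absUrl("href") follows; the class name AbsoluteLinks, the helper absoluteLinks, the inline HTML, and the base URL https://example.com/ are made up for illustration, and the HTML is parsed from a string so the example runs without a network connection.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class AbsoluteLinks {
    // Hypothetical helper: returns every link target resolved against baseUri.
    static List<String> absoluteLinks(String html, String baseUri) {
        // The base URI tells jsoup how to resolve relative href values.
        Document document = Jsoup.parse(html, baseUri);
        List<String> urls = new ArrayList<>();
        // "a[href]" skips anchors that have no href attribute at all.
        for (Element link : document.select("a[href]")) {
            urls.add(link.absUrl("href"));
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<a href=\"/questions/1\">relative</a>"
                + "<a href=\"https://jsoup.org/\">absolute</a>";
        for (String url : absoluteLinks(html, "https://example.com/")) {
            System.out.println(url);
        }
    }
}
```

When the document comes from Jsoup.connect(url).get(), the base URI is already set to the fetched page, so absUrl("href") should work there without passing a base URL yourself.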
To get divs with a specific class, do Elements elements = document.select("div.someclass")
To get the div with a specific id, do Elements elements = document.select("div#someid")
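Here is a small sketch of both selectors side by side. The class name SelectByClassAndId, the two helper methods, and the sample HTML are invented for this example; only select(), first(), and text() come from jsoup. The "." form matches a class name and the "#" form matches an id.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class SelectByClassAndId {
    // Collects the text of every <div> that carries the given class name.
    static List<String> textsByClass(Document document, String className) {
        List<String> texts = new ArrayList<>();
        for (Element div : document.select("div." + className)) {
            texts.add(div.text());
        }
        return texts;
    }

    // Returns the text of the <div> with the given id, or null if there is none.
    static String textById(Document document, String id) {
        Element div = document.select("div#" + id).first();
        return div == null ? null : div.text();
    }

    public static void main(String[] args) {
        String html = "<div class=\"note\">first note</div>"
                + "<div id=\"main\">main content</div>"
                + "<div class=\"note extra\">second note</div>";
        Document document = Jsoup.parse(html);
        System.out.println(textsByClass(document, "note"));
        System.out.println(textById(document, "main"));
        System.out.println(textById(document, "missing"));
    }
}
```

A div with several classes (like "note extra" above) still matches div.note, and checking first() for null avoids a NullPointerException when nothing matches.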
If you want to go through all the selected elements, do this:
for (Element e : elements) {
    System.out.println(e.text());
    // you can also do other things.
}
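The same loop works for the other tags mentioned in the question. As a rough sketch, a comma-separated selector such as "div, span, p" matches any of the listed tags in one pass, and tagName() tells you which one each element is. The class name TagTexts, the helper describe, and the sample HTML are assumptions made for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class TagTexts {
    // Hypothetical helper: one "tag: text" entry per matching element.
    static List<String> describe(String html) {
        Document document = Jsoup.parse(html);
        List<String> lines = new ArrayList<>();
        // The comma means "or": any <div>, <span>, or <p> is selected.
        for (Element e : document.select("div, span, p")) {
            lines.add(e.tagName() + ": " + e.text());
        }
        return lines;
    }

    public static void main(String[] args) {
        String html = "<div>a block</div><span>an inline note</span><p>a paragraph</p>";
        for (String line : describe(html)) {
            System.out.println(line);
        }
    }
}
```

Keep in mind that text() includes the text of nested elements, so a div wrapping a span will repeat the span's text; ownText() returns only the text directly inside the element.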
Jonas Czech
-
Thanks, JonasCz. This covers the first div; what about the other divs, and divs with particular class names and ids? – Muhammad Waqas May 10 '15 at 19:38
-
@MuhammadWaqas, if my answer helped you, it would be nice to _accept_ it by clicking the checkmark next to it :-) – Jonas Czech May 11 '15 at 11:26