I'd like to fetch a webpage and save its content as a string. Is there a library to do that? I want to use the string in a program I am building. It's for websites that don't necessarily provide an RSS feed.
        - 
                    [Apache HttpClient](http://hc.apache.org/) – Luiggi Mendoza Jul 03 '13 at 15:59
- 
                    I can't possibly believe you didn't find one. I don't believe you even began to search. [First Google result of 'java fetch webpage'](http://stackoverflow.com/questions/238547/how-do-you-programmatically-download-a-webpage-in-java) – Anti Earth Jul 03 '13 at 16:00
- 
                    @user2516730 you should flag the question as duplicate. – Luiggi Mendoza Jul 03 '13 at 16:02
- 
                    Probably HtmlUnit might help you. – user902691 Jul 03 '13 at 16:05
- 
                    @antiearth Thanks, maybe the keywords I used to search didn't match the problem I was having. – Jul 03 '13 at 16:05
- 
                    @LuiggiMendoza It would be nice if you provide an example. Really. – giannis christofakis Jul 03 '13 at 16:17
- 
                    @yannishristofakis If you follow the links I've provided, there are lots of examples that accomplish the task the OP asked about. Really. – Luiggi Mendoza Jul 03 '13 at 16:37
- 
                    @LuiggiMendoza Ok,thnx. – giannis christofakis Jul 03 '13 at 16:38
3 Answers
3
            I think you need this:
// IOUtils comes from Apache Commons IO (org.apache.commons.io.IOUtils).
URL url = new URL("http://www.google.com/");
URLConnection con = url.openConnection();
// Do NOT use con.getContentEncoding() for the charset: it reports values such as "gzip".
// The charset is part of con.getContentType(), e.g. "text/html; charset=UTF-8",
// and must be parsed out of that value. This snippet simply falls back to UTF-8.
String encoding = "UTF-8";
try (InputStream in = con.getInputStream()) {
    // Read the whole response body and decode it with the chosen charset
    String body = IOUtils.toString(in, encoding);
    System.out.println(body);
}
- 
                    Be careful, `con.getContentType()` should be used instead of `con.getContentEncoding()`, but it returns something like `"text/html; charset=UTF-8"` so this value must be parsed in order to extract the actual encoding (I've added a comment on the code above to reflect this) – xav Aug 21 '16 at 05:25
- 
                    See http://stackoverflow.com/questions/5938007/what-is-the-difference-between-content-type-charset-x-and-content-encoding-x concerning my previous comment (`con.getContentEncoding()` is used for things like "gzip", "compress", ... not encoding) – xav Dec 21 '16 at 16:12
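The parsing step described in the comments above could be sketched roughly as follows. This is only an illustration, not part of the original answer: the class name `CharsetSniffer` and the helpers `charsetFromContentType` and `decode` are hypothetical, and falling back to UTF-8 when no usable charset is found is an assumption carried over from the answer's default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: extracts the charset from a Content-Type header value.
class CharsetSniffer {

    // Returns the charset named in a value such as "text/html; charset=UTF-8".
    // Falls back to UTF-8 (an assumption) if the parameter is missing or unusable.
    static Charset charsetFromContentType(String contentType) {
        if (contentType == null) {
            return StandardCharsets.UTF_8;
        }
        // Parameters follow the media type and are separated by ';'
        for (String param : contentType.split(";")) {
            String p = param.trim();
            // Case-insensitive check for the "charset=" prefix
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Malformed or unsupported charset name: use the default
                    return StandardCharsets.UTF_8;
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Decodes raw response bytes using the charset found in the header value.
    static String decode(byte[] body, String contentType) {
        return new String(body, charsetFromContentType(contentType));
    }

    public static void main(String[] args) {
        System.out.println(charsetFromContentType("text/html; charset=ISO-8859-1"));
        System.out.println(charsetFromContentType("text/html"));
    }
}
```

With the answer's code, the header value would come from `con.getContentType()`, and the resulting charset name could be passed to `IOUtils.toString(in, ...)` in place of the hard-coded default.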
0
            
            
        You can use Apache HttpComponents:
    // Uses Apache HttpClient 4.3+ classes (org.apache.http.*)
    HttpGet httpget = new HttpGet("http://www.google.gr");
    // try-with-resources closes both the client and the response
    try (CloseableHttpClient httpclient = HttpClients.createDefault();
         CloseableHttpResponse response = httpclient.execute(httpget)) {
        HttpEntity entity = response.getEntity();
        if (entity != null) {
            // Uses the charset from the response headers when one is present
            System.out.println(EntityUtils.toString(entity));
        }
    } catch (IOException ex) {
        Logger.getLogger(HttpClient.class.getName()).log(Level.SEVERE, null, ex);
    }
 
    
    
        giannis christofakis
        
- 
                    Hello. Do you know if this is slower or faster than the accepted answer? – dentex May 28 '14 at 18:47
- 
                    @dentex I don't think you gain much in performance. Don't forget that you are accessing a remote resource, so how fast the result arrives is largely out of your hands, if speed is what concerns you. `Apache HttpComponents` gives you much more functionality, such as asynchronous calls. It's up to you. – giannis christofakis May 30 '14 at 13:46
 
     
    