We have already fetched the URLs using the jsoup library and stored them in the database. Now we want to extract data from the pages and store it in the database, but only specific fields rather than the whole page. For example, when we fetch http://www.flipkart.com/shoes/, we need fields such as brands, prices, and reviews. How can we do this in Java? Please help!
            Viewed 509 times
        
    1 Answer
-2
            
            
There are two ways to extract specific fields from the fetched content:
- Apply a regex to the response content and extract the needed fields.
- Use XPath to extract the needed fields (the preferred and recommended way of parsing).
Ex. 1: Regex
- Write a regex pattern for the page you selected.
- Get the response as a String, apply the pattern, and retrieve the data.
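A minimal sketch of the regex steps, using only `java.util.regex`. The `<span class="price">` markup, the class name `RegexPriceExtractor`, and the method `extractPrices` are assumptions made for illustration; the real page will use different markup, and a pattern like this breaks as soon as the attribute order or nesting changes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexPriceExtractor {
    // Assumed (hypothetical) markup: <span class="price">Rs. 1,299</span>
    private static final Pattern PRICE =
            Pattern.compile("<span class=\"price\">\\s*([^<]+?)\\s*</span>");

    // Returns the text of every price span found in the response string.
    static List<String> extractPrices(String html) {
        List<String> prices = new ArrayList<>();
        Matcher matcher = PRICE.matcher(html);
        while (matcher.find()) {
            prices.add(matcher.group(1)); // group 1 is the trimmed span text
        }
        return prices;
    }

    public static void main(String[] args) {
        // Hand-written sample standing in for the fetched response body.
        String html = "<div><span class=\"price\">Rs. 1,299</span>"
                + "<span class=\"price\"> Rs. 2,499 </span></div>";
        System.out.println(extractPrices(html));
    }
}
```

Each extracted string could then be written to the database in place of the full page.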
Ex. 2: XPath
- Work out how to locate each HTML element (or list of elements) uniquely.
- Get the response in HTML/XML form, apply the XPath expression to the retrieved content, and get the data.
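The XPath steps can be sketched with the JDK's built-in `javax.xml.xpath` API. This assumes the content is well-formed XML (XHTML); real pages often are not and would need cleaning up first. The `product`, `brand`, and `price` class names, the class `XPathProductExtractor`, and the method `extract` are hypothetical. Since the pages were fetched with jsoup, its `Document.select` method with CSS selectors may be a more direct route to the same fields.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathProductExtractor {
    // Evaluates an XPath expression against well-formed markup and
    // returns the trimmed text content of every matching node.
    static List<String> extract(String xhtml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent().trim());
            }
            return values;
        } catch (Exception e) {
            // Malformed markup or an invalid expression ends up here.
            throw new IllegalArgumentException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        // Hand-written sample; the structure is assumed, not taken from a real page.
        String page = "<div>"
                + "<div class=\"product\"><span class=\"brand\">Acme</span>"
                + "<span class=\"price\">1299</span></div>"
                + "<div class=\"product\"><span class=\"brand\">Zeta</span>"
                + "<span class=\"price\">2499</span></div>"
                + "</div>";
        System.out.println(extract(page, "//div[@class='product']/span[@class='brand']"));
        System.out.println(extract(page, "//div[@class='product']/span[@class='price']"));
    }
}
```

Using one expression per field keeps brands, prices, and reviews separate, so each can be mapped to its own database column.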
 
    
    
        Vikrant Kashyap
        
 
    
    
        Hakuna Matata
        
                    Regex should not be used to parse HTML. http://stackoverflow.com/a/6751339/1176178 – Zack Aug 02 '16 at 13:08
