Is there any way to convert PDF to HTML? I need text from the file, and when I tried the PDFtoText library I got the text, but unsorted and without any structure I could rely on for parsing. I noticed that some online PDFtoHTML services work great with the file. So, any tips please? Here is the PDF file; I need only one specific row in the right column.
    2 Answers
Try integrating pdftohtml from the Poppler project; it should support table recognition.
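As a rough sketch of how that could be wired up (not a definitive implementation): pdftohtml has an `-xml` mode that emits each text fragment with `top` and `left` coordinates, which makes it possible to keep only the fragments in the right column. The function names and the `min_left` threshold below are hypothetical, and the threshold would have to be tuned by looking at the XML produced for the actual PDF.

```python
import subprocess
import xml.etree.ElementTree as ET


def pdf_to_xml(pdf_path):
    """Run Poppler's pdftohtml and return its XML output as bytes.

    Assumes the pdftohtml binary is installed and on PATH.
    -xml: XML output with coordinates, -i: ignore images,
    -stdout: write to standard output instead of a file.
    """
    result = subprocess.run(
        ["pdftohtml", "-xml", "-i", "-stdout", pdf_path],
        check=True,
        capture_output=True,
    )
    return result.stdout


def right_column_rows(xml_data, min_left):
    """Return text fragments whose left edge is at or beyond min_left.

    min_left is a hypothetical threshold (in pdftohtml's page units)
    separating the left column from the right one. Fragments are
    returned page by page, ordered top to bottom, then left to right.
    """
    root = ET.fromstring(xml_data)
    rows = []
    for page in root.iter("page"):
        fragments = []
        for node in page.iter("text"):
            left = float(node.get("left"))
            top = float(node.get("top"))
            # itertext() also picks up text nested in <b>, <i> or <a>.
            text = "".join(node.itertext()).strip()
            if text and left >= min_left:
                fragments.append((top, left, text))
        fragments.sort()
        rows.extend(text for _, _, text in fragments)
    return rows


# Example usage (file name and threshold are placeholders):
#   rows = right_column_rows(pdf_to_xml("document.pdf"), min_left=300)
#   print(rows)
```

Once the right-column fragments are in reading order, picking the one specific row is a matter of indexing into the list or matching on a label that precedes it.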
        A T
        
- 13,008
 - 21
 - 97
 - 158
 
pdftohtml works fine: it is fast and stable, but the HTML result is ugly at best. I have used it for quite some time on a website that has many job resumes.
It is a good solution for extracting textual content, however.
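If only the textual content matters, the ugly markup can simply be thrown away. The following is a minimal sketch using Python's standard `html.parser`; the class and function names are made up for illustration, and it assumes the HTML produced by pdftohtml has already been read into a string.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.lines = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Collapse runs of whitespace and drop empty chunks.
        text = " ".join(data.split())
        if text and not self._skip_depth:
            self.lines.append(text)


def html_to_text(html):
    """Return the text content of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)
```

Each run of text between tags becomes one output line, which is usually enough for indexing or searching resume-style documents.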
I would also give the Scribd API a try:
http://www.scribd.com/developers/api
or the Google Apps document API. Google does a great job of displaying and converting PDF files.
        Mohit Bumb