I'm not having any luck with PyPDF2 or PDFMiner. Both tools return _______________ for the text boxes, even when they are filled in. Does anyone know how to extract the text from the text box fields?
        Sraw
        
 
    
    
        Brian Skeels
        
- What did you try with PyPDF2/PDFMiner? What did it return? – Jesse May 25 '18 at 01:14
- https://stackoverflow.com/questions/15583535/how-to-extract-text-from-a-pdf-file-in-python, https://stackoverflow.com/questions/34129936/python-extract-text-from-pdfs, https://stackoverflow.com/questions/26494211/extracting-text-from-a-pdf-file-using-pdfminer-in-python – Jesse May 25 '18 at 01:15
1 Answer
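        You need to extract the form fields (the AcroForm entries), not the page text. The filled-in values are stored in the document catalog rather than in the page content, so you need something like this: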
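from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdftypes import resolve1

# Open the PDF in binary mode and keep it open while the fields are read.
with open("c:\\tmp\\test.pdf", "rb") as fp:
    parser = PDFParser(fp)
    doc = PDFDocument(parser)
    # Form data lives under the catalog's "AcroForm" entry; both the
    # dictionary and its "Fields" array may be indirect references.
    fields = resolve1(resolve1(doc.catalog["AcroForm"])["Fields"])
    for i in fields:
        field = resolve1(i)
        # "T" is the field name, "V" its value (None if nothing was entered).
        name, value = field.get("T"), resolve1(field.get("V"))
        print("{0}:{1}".format(name, value))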
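The loop above only looks at top-level fields and prints raw byte strings. If the form nests its fields under "Kids", or stores values as UTF-16, a small helper along these lines might be useful. This is a rough sketch, not part of pdfminer: `decode_pdf_text` and `walk_fields` are hypothetical names, the `resolve` parameter is an assumption so that the helper can be tried on plain dictionaries, and latin-1 is used only as an approximation of PDFDocEncoding.

```python
def decode_pdf_text(value):
    """Turn a raw PDF string (bytes) into str; return other types unchanged."""
    if isinstance(value, bytes):
        # PDF text strings starting with the FE FF byte-order mark are UTF-16BE.
        if value.startswith(b"\xfe\xff"):
            return value[2:].decode("utf-16-be", errors="replace")
        # Assumption: latin-1 as a stand-in for PDFDocEncoding.
        return value.decode("latin-1")
    return value


def walk_fields(fields, resolve=lambda obj: obj, prefix=""):
    """Yield (qualified_name, value) pairs, descending into "Kids".

    `resolve` should dereference indirect objects; the default does nothing,
    which is enough for plain dictionaries.
    """
    for ref in resolve(fields) or []:
        field = resolve(ref)
        partial = decode_pdf_text(resolve(field.get("T")))
        name = ".".join(p for p in (prefix, partial) if p)
        kids = resolve(field.get("Kids")) or []
        # Kids that carry their own "T" are child fields; kids without one
        # are treated here as widgets belonging to this field.
        if any("T" in resolve(kid) for kid in kids):
            yield from walk_fields(kids, resolve, name)
        else:
            yield name, decode_pdf_text(resolve(field.get("V")))


# Plain dictionaries standing in for resolved PDF field objects.
sample = [
    {"T": b"name", "V": b"Brian"},
    {"T": b"address", "Kids": [
        {"T": b"city", "V": b"\xfe\xff\x00O\x00s\x00l\x00o"},
        {"T": b"zip"},
    ]},
]
for field_name, field_value in walk_fields(sample):
    print("{0}:{1}".format(field_name, field_value))
```

With pdfminer you would presumably pass its `resolve1` as `resolve`, for example `walk_fields(resolve1(doc.catalog["AcroForm"])["Fields"], resolve=resolve1)`. The sketch does not handle values inherited from a parent field, and non-string values such as checkbox states are returned as-is.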
 
    
    
        A.Andruhovski
        