I'm trying to extract text from a large number of PDFs using PDFMiner python bindings. The module I wrote works for many PDFs, but I get this somewhat cryptic error for a subset of PDFs:
ipython stack trace:
/usr/lib/python2.7/dist-packages/pdfminer/pdfparser.pyc in set_parser(self, parser)
    331                 break
    332         else:
--> 333             raise PDFSyntaxError('No /Root object! - Is this really a PDF?')
    334         if self.catalog.get('Type') is not LITERAL_CATALOG:
    335             if STRICT:
PDFSyntaxError: No /Root object! - Is this really a PDF?
Of course, I immediately checked to see whether or not these PDFs were corrupted, but they can be read just fine.
Is there any way to read these PDFs despite the absence of a root object? I'm not too sure where to go from here.
Many thanks!
Edit:
I tried using PyPDF in an attempt to get some differential diagnostics. The stack trace is below:
In [50]: pdf = pyPdf.PdfFileReader(file(fail, "rb"))
---------------------------------------------------------------------------
PdfReadError                              Traceback (most recent call last)
/home/louist/Desktop/pdfs/indir/<ipython-input-50-b7171105c81f> in <module>()
----> 1 pdf = pyPdf.PdfFileReader(file(fail, "rb"))
/usr/lib/pymodules/python2.7/pyPdf/pdf.pyc in __init__(self, stream)
    372         self.flattenedPages = None
    373         self.resolvedObjects = {}
--> 374         self.read(stream)
    375         self.stream = stream
    376         self._override_encryption = False
/usr/lib/pymodules/python2.7/pyPdf/pdf.pyc in read(self, stream)
    708             line = self.readNextEndLine(stream)
    709         if line[:5] != "%%EOF":
--> 710             raise utils.PdfReadError, "EOF marker not found"
    711 
    712         # find startxref entry - the location of the xref table
PdfReadError: EOF marker not found
Quonux suggested that perhaps PDFMiner stopped parsing after reaching the first EOF character. This would seem to suggest otherwise, but I'm very much clueless. Any thoughts?
 
     
     
     
     
     
     
    