I need to parse a PDF file to extract just the first few lines of text. I have tried several Python packages for the job, but without any luck.
So far I have tried:
- PDFminer, PDFminer.six and PDFminer3k, which appear to be overly complex for such a simple job, and I was unable to find a simple working example
- slate: installation failed at first; it worked after applying a fix from a thread, but then raised an error when I tried to use it. I may be using the wrong PDFminer variant, but can't figure out which one to use
- tika, which gave various error messages in the terminal and was very slow
- pdftotext, which failed to install
- pdf2text, which failed at "import pdf2text"; when changed to "pdftotext", the import failed with "ImportError: cannot import name 'Extractor'", even though "pip list" shows that "Extractor" is installed
Usually I find that Python packages work amazingly well once installed, but parsing PDF to text appears to be a jungle, as the sheer number of tools also suggests.
Any suggestions on how to do simple parsing of a PDF file to text in Python?
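To make the goal concrete, here is a rough sketch of the post-processing step I have in mind once some package gives me the page text as a plain string. The helper name first_lines and the sample string are made up for illustration; the missing piece is the extraction itself.

```python
from itertools import islice


def first_lines(text, n=5):
    """Return the first n non-empty lines of text, stripped of whitespace.

    Hypothetical helper: 'text' is assumed to be the string produced by
    whatever PDF-to-text package ends up working.
    """
    stripped = (line.strip() for line in text.splitlines())
    non_empty = (line for line in stripped if line)
    return list(islice(non_empty, n))


if __name__ == "__main__":
    # Made-up stand-in for text extracted from the first page of a PDF.
    sample = "\n  Title of the document \n\nAuthor Name\nAbstract\nBody text\n"
    for line in first_lines(sample, n=3):
        print(line)
```

So the answer only needs to cover getting from 'file.pdf' to a reasonably clean string; the rest is simple.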
Update: PyPDF2 example added
An example using PyPDF2:
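import PyPDF2

# Open the PDF in binary mode; the with-block closes the file afterwards
with open('file.pdf', 'rb') as pdf_file:
    # PdfFileReader/getPage/extractText are the older PyPDF2 API names
    pdf_reader = PyPDF2.PdfFileReader(pdf_file)
    page_0 = pdf_reader.getPage(0)
    # Print the text extracted from the first page
    print(page_0.extractText())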
This returns garbage such as:
$%$%&%&$'(' ˜!)"*+#
 
    