I have a Django project that creates PDFs using Java as a background task. Sometimes the process can take a while, so the client uses polling like this:
- The first request starts the build process and returns None.
- Each subsequent request checks to see if the PDF has been built.
  - If it has been, it returns the PDF.
  - If it hasn't, it returns None again and the client schedules another request to check again in n seconds.
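
In Python terms, the polling flow above amounts to roughly the following. This is only a sketch to illustrate the flow, not my actual client code: `poll_for_pdf`, `fetch`, `interval_seconds`, and `max_attempts` are made-up names, and `fetch` stands in for whatever makes the request and returns either the PDF bytes or None.

```python
import time


def poll_for_pdf(fetch, interval_seconds=2.0, max_attempts=30, sleep=time.sleep):
    """Call fetch() repeatedly until it returns something other than None.

    fetch is a hypothetical callable standing in for the HTTP request.
    Returns the first non-None result, or None if every attempt came back empty.
    """
    for _ in range(max_attempts):
        result = fetch()
        if result is not None:
            return result  # the server handed back the finished PDF
        sleep(interval_seconds)  # not ready yet: wait, then ask again
    return None
```

The `sleep` parameter is only there so the wait can be swapped out when testing.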
 
The problem I have is that I don't know how to check if the PDF is finished building. The Java process creates the file in stages. If I just check if the PDF exists, then the PDF that gets returned is often invalid, because it is still being built. So, what I need is an is_pdf(path_to_file) function that returns True if the file is a valid PDF and False otherwise.
I'd like to do this without a library if possible, but will use a library if necessary.
I'm on Linux.
Here is a solution that works using pdfminer, but it seems like overkill to me.
from pdfminer.high_level import extract_text
def is_pdf(path_to_file):
    """Return True if path_to_file is a readable PDF"""
    try:
        extract_text(path_to_file, maxpages=1)
        return True
    except Exception:  # a bare except would also swallow KeyboardInterrupt and SystemExit
        return False
I'm hoping for a solution that doesn't involve installing a large library just to check if a file is a valid PDF.
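
For comparison, the kind of library-free check I have in mind would look roughly like this. It is a minimal sketch, assuming that a finished PDF starts with the `%PDF-` header and has the `%%EOF` marker somewhere in its last kilobyte; the `tail_size` parameter is my own invention. It is only a heuristic and does not prove the file is a valid PDF, and I don't know whether the Java process could leave a half-written file that already passes it.

```python
import os


def is_pdf(path_to_file, tail_size=1024):
    """Heuristic: True if the file starts with %PDF- and has %%EOF near the end.

    This does not parse the PDF; it only looks at the first and last bytes.
    """
    try:
        with open(path_to_file, "rb") as f:
            # Every PDF begins with a header such as %PDF-1.7
            if f.read(5) != b"%PDF-":
                return False
            # The end-of-file marker should appear in the final bytes
            f.seek(0, os.SEEK_END)
            size = f.tell()
            f.seek(max(0, size - tail_size))
            return b"%%EOF" in f.read()
    except OSError:
        # Missing file, permission problem, path is a directory, etc.
        return False
```

With this, a file that has only its header written so far returns False, as does a path that doesn't exist yet.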