I'm looking for a way to export the annotation layer of a PDF and merge it into another PDF. I've tried libraries like poppler and PyPDF2, but nothing has worked so far. Are there any open-source libraries that can do this?
1 Answer
Disclaimer: I am the author of pText, the library used in this example.
pText converts a PDF document to an internal JSON-like representation of nested lists, dictionaries and primitives. That means your question comes down to copying a dictionary from one JSON object to another. Should be pretty easy.
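To make the idea concrete, here is a minimal sketch that uses plain Python dictionaries and lists in place of pText's own types. The document layout and every key other than `Annots` are simplified assumptions for illustration, not pText's actual internal structure.

```python
import copy

# Two toy "documents" modelled as nested dicts/lists. This layout is an
# assumption for illustration; real pText objects are richer than this.
doc_a = {"Pages": [{"Annots": [{"Subtype": "Highlight", "Rect": [10, 10, 50, 20]}]}]}
doc_b = {"Pages": [{"Contents": "page content of b"}]}

# Copy the annotation list of page 0 from a to b.
# deepcopy keeps the two documents independent of each other.
doc_b["Pages"][0]["Annots"] = copy.deepcopy(doc_a["Pages"][0]["Annots"])
```

The rest of page 0 in `doc_b` is untouched; only the `Annots` entry is added.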
You would need to read the first document:
# Import path assumed from the pText package layout
from ptext.pdf.pdf import PDF

doc_in_a = None
with open("input_a.pdf", "rb") as in_file_handle:
    doc_in_a = PDF.loads(in_file_handle)
Then you would need to read the second document:
doc_in_b = None
with open("input_b.pdf", "rb") as in_file_handle:
    doc_in_b = PDF.loads(in_file_handle)
And then copy all annotations from the first page of a to the first page of b:
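# Import path assumed from the pText package layout
from ptext.io.read.types import List, Name

annots = doc_in_a.get_page(0).get_annotations()
# Note: this replaces any annotations page 0 of b already has
doc_in_b.get_page(0)[Name("Annots")] = List()
for a in annots:
    doc_in_b.get_page(0)[Name("Annots")].append(a)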
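The loop above starts from an empty list and only touches page 0. If b already carries annotations you want to keep, or the documents have several pages, a merge along the following lines may be closer to what you need. This is a sketch on plain dicts and lists; `merge_annotations` is a hypothetical helper, not part of pText, and it assumes pages behave like dictionaries with an optional `Annots` list.

```python
def merge_annotations(src_pages, dst_pages):
    """Append each source page's annotations to the matching target page."""
    # zip stops at the shorter list, so extra pages on either side are ignored
    for src_page, dst_page in zip(src_pages, dst_pages):
        src_annots = src_page.get("Annots", [])
        if not src_annots:
            continue
        # setdefault keeps any annotations the target page already has
        dst_page.setdefault("Annots", []).extend(src_annots)
    return dst_pages


pages_a = [{"Annots": ["note-1"]}, {}]
pages_b = [{"Annots": ["existing"]}, {"Annots": ["other"]}]
merged = merge_annotations(pages_a, pages_b)
```

Pages are matched by position, which only makes sense when both PDFs share the same page order.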
Finally, write pdf b:
with open("output.pdf", "wb") as out_file_handle:
    PDF.dumps(out_file_handle, doc_in_b)
You can obtain pText either on GitHub or from PyPI. There are a ton more examples; check them out to find out more about working with images.
        Joris Schellekens
        
- Would this load the whole PDF content in memory? – Bordaigorl Mar 17 '21 at 16:13