How can a scanned document have multiple layers in the resulting PDF?

Question

A colleague and I find ourselves at odds about the details of a transaction. As proof to support his claims, I asked him to send me a copy of an invoice form he received with an order. The colleague says he used a scanner that was part of a large multi-function copier when he scanned the invoice to a PDF document.

Upon receiving the PDF document, I thought a few things about the scan looked unusual. To look a bit closer, I opened the document in my copy of Adobe Photoshop CS5. Immediately upon opening it, I noticed that the document has several layers. There is a background layer for the colorfully watermarked background of the invoice, and another layer holds most of the static text that is common to all of the invoices from this company. Yet another layer holds most of the text that changes per order, and a final layer holds the signature of the shipping manager from the warehouse.

I know some scanners can use OCR (optical character recognition) to embed extra information in a PDF so it can be searched and edited, but I had never seen the information from a scan broken out into multiple layers in the document like that. My question is: In what ways could any scanner separate the contents of a scanned physical document into multiple layers in a PDF file?

score 1 · Answer 1 · answered Jan 19 '17 at 17:27

I tend to lean towards practical solutions. Here, you want to know whether what you received is authentic or not.

So, discreetly find the make and the model of the multi-function device. Then:

Post it here. One of us might know what it can do and what it can't do.
Contact the manufacturer. Start with their website, then maybe an online chat or phone call. They'll tell you what it can do.
If you have social skills, find a shop that sells the device and ask the shopkeeper to show you what it can do.

score 0 · Answer 2 · answered Jan 19 '17 at 17:02

I believe the "layers" you're seeing are called "annotations" in the PDF specifications: http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/pdf_reference_1-7.pdf

It seems the scanner created the PDF with an image of the document and annotated it with text from OCR and a watermark. Having the signature there as a separate annotation seems strange to me.
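If you want to check whether the file contains annotations or true PDF layers (the specification calls layers "optional content groups"), here is a rough sketch that counts a few well-known PDF name tokens in the raw bytes. The function name count_pdf_markers and the MARKERS table are my own invention for illustration, not part of any library. It is only a heuristic: PDF 1.5 and later can store object dictionaries inside compressed object streams, so a count of zero does not prove the feature is absent.

```python
import re

# Byte patterns for PDF name tokens of interest. Whitespace between a key and
# its value is optional in PDF syntax, hence the \s*.
# (Hypothetical helper table, chosen for this illustration.)
MARKERS = {
    "optional_content_groups": rb"/Type\s*/OCG\b",      # one per declared layer
    "optional_content_properties": rb"/OCProperties\b",  # catalog entry for layers
    "annotation_arrays": rb"/Annots\b",                  # pages carrying annotations
    "image_xobjects": rb"/Subtype\s*/Image\b",           # embedded raster images
}


def count_pdf_markers(data: bytes) -> dict:
    """Count occurrences of each marker in the uncompressed bytes of a PDF.

    Only dictionaries stored in plain text are seen; anything inside a
    compressed object stream is invisible to this scan.
    """
    return {
        name: len(re.findall(pattern, data))
        for name, pattern in MARKERS.items()
    }


# Example use (the file name is a placeholder):
# with open("invoice.pdf", "rb") as f:
#     print(count_pdf_markers(f.read()))
```

Several image objects but no optional content groups would suggest the scanner split the page into separate images rather than real layers. For a definitive answer you would need a proper PDF parser that decompresses the object streams.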
