
I would like to scan old text documents, and then destroy some of the originals.

Apart from spot-checking, what can I do to get an acceptably low scan failure rate? I would like to get a failure rate below perhaps 0.25% (after spot checks). I count as failures pages that are missed or are not legible.

This seems a difficult target to achieve. What can I do to reduce the rate of failures in the first place, so that I have less checking to do?

Related question (this question is about "QA" i.e. preventing failures, the linked question is about "QC" i.e. detecting failures): How to verify scanned page count and quality when using sheet feeder?

Croad Langshan

2 Answers

1

To reduce your error rate with very diverse documents (as you described in "What features are important in a scanner + sheet feeder for old personal documents"):

(A) The "simple" answer:

1. Sort your documents into batches with the same document characteristics.
2. For each batch, do test scans with varying scanner driver settings. Repeat until you find a set of driver settings that produces scans with your intended failure rate of "below perhaps 0.25%" within the test sample.
3. Use these driver settings to scan the rest of the batch.
4. Do spot checks to verify that your scan results are within your intended failure rate.
5. If you get a higher failure rate, either go back to step 2 and fine-tune your driver settings with a new test sample, or go back to step 1 and split the batch into separate batches, each with its own scanner driver settings.
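To get a feel for how much spot-checking steps 2 and 4 imply, here is a minimal sketch in Python. It assumes that pages are sampled at random from a large batch and that failures are independent, and it uses the standard zero-failure bound: if the true failure rate were at or above the limit p, the chance of seeing no failures in n sampled pages would be at most (1 - p)^n. The function name `pages_to_spot_check` and the 95% confidence level are my own choices for illustration, not something from your question.

```python
import math


def pages_to_spot_check(max_failure_rate: float, confidence: float = 0.95) -> int:
    """Smallest random sample size n such that finding zero failures in n
    pages supports "failure rate < max_failure_rate" at the given confidence.

    Assumes random sampling from a large batch and independent failures.
    """
    if not 0 < max_failure_rate < 1:
        raise ValueError("max_failure_rate must be between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    alpha = 1 - confidence  # accepted risk of wrongly passing a bad batch
    # Solve (1 - p)^n <= alpha for n and round up to a whole page count.
    return math.ceil(math.log(alpha) / math.log(1 - max_failure_rate))


if __name__ == "__main__":
    n = pages_to_spot_check(0.0025, confidence=0.95)
    print(f"Check {n} randomly chosen pages; any failure means retuning.")
```

For a 0.25% limit at 95% confidence this works out to roughly 1,200 clean pages per batch, which is why small batches may end up being checked almost completely.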

(B) With (A) you should be able to reach your intended failure rate for very simple documents, i.e. plain black one-sided print on white, unfolded, unwrinkled paper of standard quality. If you have many such documents, your batch size can be quite large. But the more attributes a document has (e.g. colored paper, colored print, screen-printed images or graphics, bleed-through on thin paper, low contrast, yellowing, fading on sales slips, damaged paper, …), the more time-consuming your scanning will become at a budget of 500 GBP. To reach your failure rate you will need to keep the variation in document attributes within each batch as low as possible, so your batch size will decrease. Depending on your documents, you might end up checking more or less every other document to stay within your failure rate. If you want OCR for easier document retrieval and you have documents in different languages, this adds another dimension of complexity.
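A small Python sketch shows why batches shrink as attributes multiply. The `Doc` fields and the sample values are hypothetical; in practice you would pick whichever attributes actually force different driver settings.

```python
from collections import defaultdict
from typing import NamedTuple


class Doc(NamedTuple):
    name: str
    paper: str       # hypothetical attribute, e.g. "white", "colored", "thin"
    print_type: str  # hypothetical attribute, e.g. "black", "color", "faded"
    condition: str   # hypothetical attribute, e.g. "flat", "folded", "damaged"


def group_into_batches(docs):
    """Group documents that share every attribute into one batch."""
    batches = defaultdict(list)
    for doc in docs:
        key = (doc.paper, doc.print_type, doc.condition)
        batches[key].append(doc.name)
    return dict(batches)


# Made-up sample data for illustration only.
sample_docs = [
    Doc("letter-1", "white", "black", "flat"),
    Doc("letter-2", "white", "black", "flat"),
    Doc("receipt-1", "thin", "faded", "flat"),
    Doc("flyer-1", "colored", "color", "folded"),
    Doc("letter-3", "white", "black", "folded"),
]

batches = group_into_batches(sample_docs)

if __name__ == "__main__":
    for key, names in batches.items():
        print(key, "->", names)
```

Five documents already fall into four batches here, and only the plain black-on-white, flat letters share settings. Every extra attribute you distinguish can split batches further.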

(C) Buy professional software that claims to be capable of processing whatever you throw at your scanner, with no need to sort documents beforehand. But 1. such software alone would blow your budget, and 2. such software works only with certified scanners, which eat up your entire budget and are still "hungry" for additional software.

0

You might be able to reach your failure rate of below perhaps 0.25% with less time and effort than in my answer above, and within the budget of 500 GBP that you mentioned in your parallel question, as follows:

There are companies that rent out professional scanners, sometimes together with a computer and additional professional scanning and/or post-processing software. Ask such a company what equipment (scanner and software) it can offer within your budget for a day or two, including an introduction to its use and support on standby. The equipment should allow maximum automation in image processing with a minimum of prior sorting into batches of similar document characteristics.

With some luck you might get equipment within your budget that allows you to scan most of your documents in one run, with some additional reruns for special cases, provided you are able to handle such equipment and/or have quick help on standby.

The benefit of this approach: you will see what is possible with scanners and software at a certain price level, and you will be better able to adjust your expectations when you later buy your own document scanner, possibly at a budget you revise upward from 500 GBP after this experience.