9

OCR processing takes time, and using multiple CPU cores would speed it up. Acrobat 10 was not a multithreaded application. How about Acrobat 11? Does it do OCR on multiple CPU cores by default (if available)? If not, are there any workarounds, e.g. scripting, to make Acrobat 11 do OCR on multiple CPU cores? This could be either through Acrobat's built-in scripting language or through external scripts that launch and direct multiple single-threaded instances of Acrobat, each processing part of the job in parallel.

Note: This question is not too localized (not limited to a specific moment in time) because (1) Adobe does not release new major Acrobat versions very often (Acrobat 10 was released two years ago) and (2) Adobe Acrobat is a widely used application.

slhck
  • 235,242
tarcman.
  • 151

3 Answers

6

I have installed the Acrobat 11 (XI) trial in VirtualBox. Acrobat 11 is single-threaded.

I have also made an external script that starts multiple Acrobat instances (one per CPU core), processes the OCR job in parallel, and merges the results. A crucial step is to turn on error logging in the Acrobat preferences, parse all .log files, and reprocess any files that produced errors. Even with that overhead, the script (when using 4 cores) does OCR over two times faster than a default Acrobat 11 run.
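The control flow of such a script might look roughly like the Python sketch below. It is not the script described above, only an illustration of the split / run in parallel / retry idea. Everything named here is an assumption: `ocr_one` is a hypothetical callable you would supply that drives one OCR instance for one file and returns True on success, and the `ERROR <filename>` log format parsed by `find_failed` is invented for the example (Acrobat's real log format would need its own parser). The final merge step is left out.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def find_failed(log_dir):
    """Collect file names reported as failed in any .log file in log_dir.

    ASSUMPTION: each failure is logged as a line 'ERROR <filename>'.
    This format is hypothetical; adapt the parsing to the real logs.
    """
    failed = set()
    for log in Path(log_dir).glob("*.log"):
        for line in log.read_text().splitlines():
            if line.startswith("ERROR "):
                failed.add(line[len("ERROR "):].strip())
    return failed


def run_batch(files, ocr_one, workers):
    """Call ocr_one(f) for every file, at most `workers` at a time.

    Threads are enough here because each worker is expected to spend
    its time waiting on an external OCR process.
    Returns the files whose call raised or returned a falsy value.
    """
    def attempt(f):
        try:
            return bool(ocr_one(f))
        except Exception:
            return False

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(attempt, files))
    return [f for f, ok in zip(files, results) if not ok]


def ocr_all(files, ocr_one, workers=4, max_retries=2):
    """Process all files, re-running failures up to max_retries times.

    Returns the files that still failed after the last retry.
    """
    pending = list(files)
    for _ in range(max_retries + 1):
        if not pending:
            break
        pending = run_batch(pending, ocr_one, workers)
    return pending
```

A worker could, for example, launch one OCR instance, then use `find_failed` on that instance's log directory to decide whether to report success. Setting `workers` to the number of CPU cores matches the one-instance-per-core setup described above.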

slhck
  • 235,242
tarcman.
  • 151
1

To use all cores for OCR you may want to look at PDF-XChange Editor. Its OCR engine appears to use all cores on my system. Once you get to this level of performance, though, it makes sense to use an SSD.

There must be a Windows tweak that will cause it to dedicate more CPU time to a single-threaded application that is not I/O bound. On my system, Acrobat is not being slowed by disk performance, but the most CPU time I get when building an index is about 30%.

Let's face it: Acrobat is a widely used but poorly written application. But, Acrobat Pro has some features you still can't get anywhere else (yet).

Len
  • 11
1

Multithreading needs to be built into an application. The developer has to write code that creates threads and that breaks the task down into subtasks that can be allocated to each thread. If the developers of Acrobat have not done this for their OCR code, there's no way for the user to add the extra logic needed.