How to extract text from an image-based pdf using Cuneiform in terminal

Question

cuneiform -l eng -f text -o outocr.txt input.pdf

When run in a terminal, the above command writes only the text of my PDF's title page to outocr.txt. What should I do to make it recognize all the text in the 120-page PDF? I am using Fedora Linux 25 (x86_64).

score 2 · Answer 1 · answered Aug 15 '17 at 15:37


Cuneiform by itself does not create multi-page documents. The post "How to extract text with OCR from a PDF on Linux?" provides an example script that reads through many individual pages and creates a multi-page PDF.
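For the plain-text output asked for in the question, the page-by-page idea could be sketched roughly as follows. This is not the script from the linked post, only a minimal illustration under some assumptions: pdftoppm (from poppler-utils) is installed, the Cuneiform build can read PNG input (which needs ImageMagick support; otherwise the page images would first have to be converted to BMP), and the function name ocr_pdf is made up for this example.

```shell
#!/bin/sh
# Sketch: OCR every page of an image-based PDF with Cuneiform.
# Assumes pdftoppm (poppler-utils) is installed and that this Cuneiform
# build accepts PNG input. ocr_pdf is a hypothetical helper name.

ocr_pdf() (
    pdf=$1        # input PDF, e.g. input.pdf
    out=$2        # combined text output, e.g. outocr.txt
    tmpdir=$(mktemp -d) || exit 1

    # Render each page to its own image: page-001.png, page-002.png, ...
    pdftoppm -r 300 -png "$pdf" "$tmpdir/page" || { rm -rf "$tmpdir"; exit 1; }

    : > "$out"    # start with an empty output file

    # pdftoppm zero-pads the page numbers, so the glob expands in page order.
    for img in "$tmpdir"/page-*.png; do
        [ -e "$img" ] || continue
        # Same options as the single-page command in the question.
        if cuneiform -l eng -f text -o "$img.txt" "$img"; then
            cat "$img.txt" >> "$out"
        else
            echo "warning: OCR failed for $img" >&2
        fi
    done

    rm -rf "$tmpdir"
)

# Run only when called with arguments: sh ocr_pdf.sh input.pdf outocr.txt
if [ "$#" -ge 2 ]; then
    ocr_pdf "$1" "$2"
fi
```

Saved as, say, ocr_pdf.sh, it would be run as "sh ocr_pdf.sh input.pdf outocr.txt". Pages on which Cuneiform fails (blank pages, for instance) are skipped with a warning rather than aborting the whole run.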


gantner

