Questions tagged [ocr]

Optical character recognition (OCR) is the process of converting images of text to text that can be manipulated by word processors etc.

Optical character recognition (OCR) is the process of converting images of written or printed text into a standard text format.

It is used when scanning paper documents or books to create a searchable text representation.

Similar technologies include:

speech-recognition (speech to text)
handwriting-recognition (tablet input to text)

190 questions

11 answers

How to extract text with OCR from a PDF on Linux?

How do I extract text from a PDF that wasn't built with an index? It's all text, but I can't search or select anything. I'm running Kubuntu, and Okular doesn't have this feature.

ubuntu pdf extract ocr

asked Aug 23 '09 at 22:34

agentofuser

7,677
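For the question above (extracting text from a scanned PDF on Kubuntu), one common route is OCRmyPDF. It is a command-line tool that runs Tesseract over a scanned PDF and writes a copy with an invisible, searchable text layer. The sketch below assumes the `ocrmypdf` CLI and the needed Tesseract language data are installed (on Ubuntu/Kubuntu the package is `ocrmypdf`). The helper name `build_ocr_command` is hypothetical. The `--sidecar` option additionally saves the recognized text to a plain `.txt` file.

```python
import subprocess
import sys
from pathlib import Path


def build_ocr_command(src: Path, dst: Path, language: str = "eng") -> list[str]:
    """Build an OCRmyPDF command that adds a text layer to `src`.

    Assumes the `ocrmypdf` CLI and Tesseract data for `language` are installed.
    """
    return [
        "ocrmypdf",
        "-l", language,                              # Tesseract language code
        "--sidecar", str(dst.with_suffix(".txt")),   # also write recognized text to a .txt file
        str(src),                                    # scanned input PDF
        str(dst),                                    # searchable output PDF
    ]


if __name__ == "__main__" and len(sys.argv) == 3:
    source, target = Path(sys.argv[1]), Path(sys.argv[2])
    # check=True raises CalledProcessError if OCRmyPDF exits with an error
    subprocess.run(build_ocr_command(source, target), check=True)
```

Run it as `python3 ocr_pdf.py scanned.pdf searchable.pdf`, where the script name is only an example. The output PDF can then be searched and selected in Okular.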

4 answers

How to create PDF with scanned pages but selectable text?

Today I received a PDF from our supplier that contained several printed and scanned pages with signatures, etc. I opened it in Acrobat Reader DC and, to my surprise, the text in the evidently scanned images could be selected and copied as text.…

pdf adobe-acrobat adobe-reader ocr

asked Feb 09 '18 at 09:16

Vojtěch Dohnal

3,938

6 answers

Batch-OCR many PDFs

This was discussed a year ago here: Batch OCR for many PDF files (not already OCRed)? Is there any way to batch OCR PDFs that haven't already been OCRed? This is, I think, the current state of things dealing with two issues: Batch OCR…

pdf adobe-acrobat ocr

asked May 14 '12 at 19:46

Joe
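For the batch questions (this one and the older one it links to), here is a minimal sketch that walks a folder and runs OCRmyPDF on every PDF. It again assumes the `ocrmypdf` CLI is installed. The `--skip-text` option makes OCRmyPDF leave pages that already contain text alone, which addresses the "not already OCRed" part. The `_ocr` output suffix and the function name `batch_commands` are hypothetical choices. Note that `rglob("*.pdf")` is case-sensitive on Linux, so files named `*.PDF` are not picked up.

```python
import subprocess
import sys
from pathlib import Path


def batch_commands(folder: Path, suffix: str = "_ocr") -> list[list[str]]:
    """Return one OCRmyPDF command per PDF found (recursively) under `folder`.

    --skip-text leaves pages that already have a text layer untouched.
    Outputs are written next to the originals with a hypothetical `_ocr` suffix.
    """
    commands = []
    for pdf in sorted(folder.rglob("*.pdf")):
        if pdf.stem.endswith(suffix):
            continue  # skip outputs produced by a previous run
        out = pdf.with_name(pdf.stem + suffix + pdf.suffix)
        commands.append(["ocrmypdf", "--skip-text", str(pdf), str(out)])
    return commands


if __name__ == "__main__" and len(sys.argv) == 2:
    for cmd in batch_commands(Path(sys.argv[1])):
        # No check=True: keep going when one file fails, but report it
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"failed ({result.returncode}): {cmd[-2]}", file=sys.stderr)
```

Files are processed one at a time. Because outputs are skipped by name, the script can be rerun over the same folder without OCRing its own results again.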

11 answers

How to remove OCR from a PDF?

I have been searching Google for some time but cannot find an answer to my question. I have unwanted layers of OCR in a document that I recently scanned with Adobe Acrobat. It has not been OCRed properly, and I want to redact some information, but…

pdf adobe-acrobat ocr tiff

asked Oct 11 '14 at 06:32

Sanoo

3 answers

Blurry text in PDF

I have a PDF with blurry text. The text itself is readable but causes a lot of eye strain. This is an example of the text. Is there a way to clear it up?

pdf ocr

asked Dec 28 '20 at 07:01

user1255895

8 answers

How can I convert scanned images as PDF to a searchable PDF file?

I have a PDF of a scanned book. I'm looking for free software that will perform OCR and then let me save the result as a PDF or other document again. Is there one?

software-rec pdf ocr

asked Oct 04 '09 at 04:36

yuval

3 answers

How can I identify fonts from an image?

Many times I come across bitmaps with nothing but text paragraphs, so I was looking for a way to identify the font used, the paragraph alignment, line spacing and color, bold, italics. Would an OCR package allow me to do that? If not, what other…

ocr vector-graphics bitmaps

asked Aug 03 '09 at 19:45

Robin Rodricks

2,532

8 answers

Enable OCR in Greenshot

I run Windows 10 with Microsoft Office Professional Plus 2016 on my computer. It looks like Microsoft's OCR functionality is enabled on my system, since OneNote is able to copy text from an image. But how do I enable this functionality for Greenshot? Currently I…

windows-10 scanning ocr greenshot

asked May 29 '16 at 08:32

vico

2,811

1 answer

How to analyze the space usage within a PDF document?

I have a 7 MB PDF that I made from 65 scanned B/W images. After OCR, the document grew to 32 MB. I have never seen text take up so much space (in theory, 25 MB should hold 25 million letters uncompressed). Saving in plain text I have about…

pdf adobe-acrobat ocr

asked Dec 09 '13 at 17:19

ufotds
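For the question above, one way to see where the extra 25 MB went is to total the sizes of the PDF's stream objects, which hold images, page content, and fonts. This is a crude, standard-library-only sketch. It scans the raw bytes for `N G obj … stream … endstream` and sums the stored (still encoded) stream lengths, split into images, object streams, and everything else. It is a regex heuristic rather than a real PDF parser, and several limits apply. Objects packed inside compressed object streams (PDF 1.5 and later) are not broken out individually. Stray `stream` bytes inside binary data can confuse the scan. The function name `stream_usage` is hypothetical.

```python
import re
import sys
from collections import Counter

# Heuristic: an object header, its dictionary (no "endobj" in between), then "stream" + EOL.
STREAM_RE = re.compile(
    rb"\d+\s+\d+\s+obj\b((?:(?!endobj).){0,4096}?)\bstream\r?\n", re.S
)


def stream_usage(data: bytes) -> Counter:
    """Sum stored stream bytes per rough category in a raw PDF file."""
    usage = Counter()
    pos = 0
    while True:
        m = STREAM_RE.search(data, pos)
        if m is None:
            break
        start = m.end()
        end = data.find(b"endstream", start)
        if end == -1:
            break
        size = end - start
        # Do not count the end-of-line marker that precedes "endstream"
        if data[end - 2:end] == b"\r\n":
            size -= 2
        elif data[end - 1:end] in (b"\n", b"\r"):
            size -= 1
        header = m.group(1)  # the object's dictionary text
        if re.search(rb"/Subtype\s*/Image", header):
            kind = "image"
        elif re.search(rb"/Type\s*/ObjStm", header):
            kind = "object stream"
        else:
            kind = "other (page content, fonts, ...)"
        usage[kind] += max(size, 0)
        pos = end + len(b"endstream")
    return usage


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], "rb") as fh:
        totals = stream_usage(fh.read())
    for kind, size in totals.most_common():
        print(f"{kind:35} {size / 1e6:8.2f} MB")
```

If "image" dominates after OCR, the growth most likely comes from the images being re-encoded rather than from the text layer itself.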

8 answers

Practical OCR solution for converting a large book to a digital format?

I was over by my grandparent's place this past weekend. My grandmother pulled out this giant (~1400 page) book of her family history going back to 1630 or so. Giant nerd that I am, I thought it would be slick to have all the information stored in a…

ocr

asked Sep 15 '09 at 13:08

user11219

6 answers

Extract OCR text from Evernote

Evernote does OCR on the images you save to it. Is there a way to get the full text equivalent for an image in Evernote, or is the OCR only for searching?

ocr evernote

asked Jun 09 '10 at 17:28

Leigh Riffel

1,896

5 answers

PDF has an extra blank in all words after running through Ghostscript

This PDF was produced by ABBYY FineReader 10: http://ebooks.zeitr.org/from_abbyy.pdf You can copy & paste the first sentence and get this (very good) text result: Der »Bund Deutscher Gymnastik-Schulleiter« wurde am 20. November 1955 anläßlich einer…

pdf ocr ghostscript

asked May 16 '11 at 13:35

Erwin Jurschitza

3 answers

Good free OCR with GUI for correcting mistakes? (for Windows)

I've used SimpleOCR, which has a nice GUI for correcting mistakes. Unfortunately it makes a lot of mistakes! (and suffers other bugs and limitations) On the other hand Tesseract is more accurate but has no GUI at all. My question is, is there a free…

gui ocr

asked May 15 '10 at 23:34

Hugh Allen

10,120

4 answers

Batch OCR for many PDF files (not already OCRed)?

I use Google Desktop Search (I am on Vista), and not all the PDF files in my archive folder are recognized. That is expected, since "PDF files that contain scanned images" are not indexed ( http://desktop.google.com/support/bin/answer.py?hl=en&answer=90651…

pdf ocr desktop-search

asked Feb 11 '10 at 19:30

Erb

3 answers

Can Acrobat 11 be made to do OCR using multiple CPU cores?

OCR processing takes time, and using multiple CPU cores would speed it up. Acrobat 10 was not a multithreaded application. How about Acrobat 11? Does version 11 use multiple CPU cores for OCR by default (if available)? If not, are there any…

adobe-acrobat ocr multi-threaded cpu-cores

asked Oct 26 '12 at 23:38

tarcman.
