Questions tagged [ocr]

Optical character recognition (OCR) is the process of converting images of written or printed text into machine-encoded text that can be searched and edited in word processors and other software.

It is used when scanning paper documents or books to create a searchable text representation.
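As a rough illustration of how a searchable PDF is usually assembled, the scanned page image stays as the visible content, and the recognized text is drawn on top of it in PDF text rendering mode 3 (`3 Tr`, neither filled nor stroked). Viewers can then search and select the text even though only the image is seen. The Python sketch below builds just the page content stream. The resource names `/Im0` and `/F1`, the `Word` class, and the function names are hypothetical, and it assumes an OCR engine has already supplied word positions in PDF points.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word; (x, y) is its baseline start in PDF points (origin bottom-left)."""
    text: str
    x: float
    y: float
    size: float


def escape_pdf_string(s: str) -> str:
    # Backslashes and parentheses must be escaped inside a PDF literal string.
    return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def searchable_page_stream(width: float, height: float, words: list[Word]) -> str:
    """Return a content stream: paint the scan, then overlay invisible OCR text.

    Assumes the page resources define an image XObject /Im0 and a font /F1
    (hypothetical names for this sketch).
    """
    ops = [
        "q",
        f"{width:g} 0 0 {height:g} 0 0 cm",  # scale the unit-square image to fill the page
        "/Im0 Do",                             # draw the scanned image
        "Q",
        "BT",
        "3 Tr",                                # rendering mode 3: text is present but invisible
    ]
    for w in words:
        ops.append(f"/F1 {w.size:g} Tf")                 # font and size for this word
        ops.append(f"1 0 0 1 {w.x:g} {w.y:g} Tm")        # move to the word's position
        ops.append(f"({escape_pdf_string(w.text)}) Tj")  # show the (invisible) text
    ops.append("ET")
    return "\n".join(ops)


if __name__ == "__main__":
    print(searchable_page_stream(612, 792, [Word("Invoice", 72, 720, 12)]))
```

Real OCR tools typically also stretch each word horizontally so that a text selection lines up with the word in the image; this sketch leaves that out.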

Similar technologies include

190 questions
50 votes, 11 answers

How to extract text with OCR from a PDF on Linux?

How do I extract text from a PDF that wasn't built with an index? It's all text, but I can't search or select anything. I'm running Kubuntu, and Okular doesn't have this feature.
agentofuser
37 votes, 4 answers

How to create PDF with scanned pages but selectable text?

Today I received a PDF from our supplier that contained several printed and scanned pages with signatures etc. I opened it in Acrobat Reader DC and, to my surprise, the text in the evidently scanned images could be selected and copied as text.…
24 votes, 6 answers

Batch-OCR many PDFs

This was discussed a year ago here: Batch OCR for many PDF files (not already OCRed)? Is there any way to batch OCR PDFs that haven't already been OCRed? This is, I think, the current state of things, dealing with two issues: Batch OCR…
Joe
23 votes, 11 answers

How to remove OCR from a PDF?

I have been searching Google for some time but cannot find an answer to my question. I have unwanted layers of OCR in a document that I recently scanned with Adobe Acrobat. It has not been OCRed properly, and I want to redact some information, but…
Sanoo
23 votes, 3 answers

Blurry text in PDF

I have a PDF with blurry text. The text itself is readable but causes a lot of eye strain. This is an example of the text. Is there a way to clear it up?
user1255895
19 votes, 8 answers

How can I convert scanned images as PDF to a searchable PDF file?

I have a PDF of a scanned book. I'm looking for free software that will perform OCR and then let me save the result as a PDF or document again. Is there one?
yuval
16 votes, 3 answers

How can I identify fonts from an image?

Many times I come across bitmaps with nothing but text paragraphs, so I was looking for a way to identify the font used, the paragraph alignment, line spacing and color, bold, italics. Would an OCR package allow me to do that? If not, what other…
15 votes, 8 answers

Enable OCR in Greenshot

I run Windows 10 with Microsoft Office Professional Plus 2016 on my computer. It looks like Microsoft's OCR functionality is enabled on my system, since OneNote can copy text from an image. But how do I enable this functionality for Greenshot? Currently I…
vico
14 votes, 1 answer

How to analyze the space usage within a PDF document?

I have a 7 MB PDF that I made from 65 scanned black-and-white images. After OCR, the document grew to 32 MB. I have never seen text take up so much space (in theory, 25 MB should hold 25 million letters uncompressed). Saving as plain text, I have about…
ufotds
13 votes, 8 answers

Practical OCR solution for converting a large book to a digital format?

I was over at my grandparents' place this past weekend. My grandmother pulled out this giant (~1400 page) book of her family history going back to 1630 or so. Giant nerd that I am, I thought it would be slick to have all the information stored in a…
user11219
13 votes, 6 answers

Extract OCR text from Evernote

Evernote does OCR on the images you save to it. Is there a way to get the full text equivalent for an image in Evernote, or is the OCR only for searching?
Leigh Riffel
11 votes, 5 answers

PDF has an extra blank in all words after running through Ghostscript

This PDF was produced by ABBYY FineReader 10: http://ebooks.zeitr.org/from_abbyy.pdf You can copy & paste the first sentence and get this (very good) text result: Der »Bund Deutscher Gymnastik-Schulleiter« wurde am 20. November 1955 anläßlich einer…
Erwin Jurschitza
10 votes, 3 answers

Good free OCR with GUI for correcting mistakes? (for Windows)

I've used SimpleOCR, which has a nice GUI for correcting mistakes. Unfortunately, it makes a lot of mistakes (and suffers from other bugs and limitations). Tesseract, on the other hand, is more accurate but has no GUI at all. My question is, is there a free…
Hugh Allen
10 votes, 4 answers

Batch OCR for many PDF files (not already OCRed)?

I use Google Desktop Search (I am on Vista), and not all the PDF files in my archive folder are recognized. This is expected, since "PDF files that contain scanned images" are not indexed (http://desktop.google.com/support/bin/answer.py?hl=en&answer=90651…
Erb
9 votes, 3 answers

Can Acrobat 11 be made to do OCR using multiple CPU cores?

OCR processing takes time, and using multiple CPU cores would speed it up. Acrobat 10 was not a multithreaded application. What about Acrobat 11? Does it use multiple CPU cores for OCR by default (if available)? If not, are there any…
tarcman.