40

I have a PDF that contains a scanned image of a document. I want to save the contents of this PDF as an image so that I can run it through an OCR program that only accepts .jpg, .png, and .gif files.

How do I save/convert this PDF to one of those image formats?

EDIT: One way I've found is to click on each page, copy it to the clipboard, paste it into Paint.net, and then save. However, this is cumbersome, as it appears you can only select one page at a time in Acrobat Reader.

Guy
  • 4,257

13 Answers

22

Please pay close attention to pooryorick's answer, in which he points out how sleske's answer is actually a much better answer for this particular problem.


Use Ghostscript. This command works for me:

gs -dBATCH -dNOPAUSE -sDEVICE=png16m -dGraphicsAlphaBits=4 -dTextAlphaBits=4 -r150 -sOutputFile=output%d.png input.pdf

There are multiple png pseudo-devices that differ in color depth: pngmono, pnggray, png16, png256, png16m, and pngalpha. Choose whichever suits you best.

You can also use jpeg, but unless disk space is an issue, you want the highest quality you can manage for OCR, and that's not jpeg.

Ghostscript no longer supports gif, but with png256 available I can't imagine why you'd need it.
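If you do this often, the command above can be wrapped in a small shell function so the resolution and device are easy to change. This is only a sketch: the function name pdf_to_png and its defaults (300 dpi, pnggray) are my own assumptions, not anything Ghostscript defines. OCR programs often do better on scans rendered at around 300 dpi, but check what yours expects.

```shell
# Render every page of a PDF to PNG files with Ghostscript.
# Usage: pdf_to_png input.pdf [dpi] [device]
# The function name and the defaults are illustrative assumptions.
pdf_to_png() {
    local pdf=$1
    local dpi=${2:-300}          # assumed default resolution
    local device=${3:-pnggray}   # any of the png devices listed above
    local base=${pdf%.pdf}       # strip a trailing .pdf to build the output name

    # %03d in the output name is replaced by the page number (001, 002, ...).
    gs -dBATCH -dNOPAUSE -dSAFER \
       -sDEVICE="$device" \
       -dGraphicsAlphaBits=4 -dTextAlphaBits=4 \
       -r"$dpi" \
       -sOutputFile="${base}-%03d.png" \
       "$pdf"
}
```

For example, pdf_to_png scan.pdf 200 png16m renders scan.pdf at 200 dpi in 24-bit color.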

wfaulk
  • 6,307
20

Install ImageMagick. Open a cmd window or terminal:

convert myfile.pdf myfile.jpg

The output will be one jpg file for each page in your pdf: myfile-0.jpg, myfile-1.jpg, etc.
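For OCR you will probably want more pixels than the default gives you, since ImageMagick rasterizes PDF pages at a fairly low density (72 dpi, as far as I know) unless told otherwise. A minimal sketch, assuming 300 dpi and PNG output are acceptable; the function name pdf_to_png_im is made up for illustration. On ImageMagick 7 the command is called magick rather than convert.

```shell
# Rasterize a PDF with ImageMagick at a chosen density.
# Usage: pdf_to_png_im input.pdf [dpi]
# The function name and the 300 dpi default are illustrative assumptions.
pdf_to_png_im() {
    local pdf=$1
    local dpi=${2:-300}
    local base=${pdf%.pdf}

    # -density must come before the input file so that it applies
    # while the PDF is being rasterized.
    # %d in the output name is replaced by the page index, starting at 0.
    convert -density "$dpi" "$pdf" "${base}-%d.png"
}
```

Calling pdf_to_png_im myfile.pdf should then produce myfile-0.png, myfile-1.png, and so on.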

DaveParillo
  • 14,761
16

pdfimages can extract embedded images from a PDF. It will not convert a whole PDF page to an image. It's included in Xpdf tools or Poppler utils.

This is useful if the PDF contains text and images, and you want only the images. Also, it will extract the images in their original format, so no loss of quality is involved (unlike programs which render the whole page and then convert it to e.g. JPEG).


List all images from mydocument.pdf:

pdfimages -list mydocument.pdf

Extract all images from mydocument.pdf to individual files named mydocument-image-0000.jpg, mydocument-image-0001.jpg and so on:

pdfimages -j mydocument.pdf mydocument-image

Option -j makes it write embedded JPEG-compressed images as JPEG files, not as PBM/PGM/PPM files (which are uncompressed and huge). Note that images may still be written as PBM/PGM/PPM files, if that's how they were stored in the PDF input file.

If you're using Poppler, I recommend replacing -j with -all, which writes JPEG, JPEG2000, JBIG2, and CCITT images in their native format. CMYK files are written as TIFF files. All other images are written as PNG files.
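As a rough sketch of the Poppler variant, the function below extracts everything into its own directory so the output files don't clutter the folder holding the PDF. The function name extract_pdf_images and the default directory name are assumptions of mine. Keep in mind that -all can produce formats your OCR program may not accept (JPEG2000, JBIG2, CCITT, TIFF), so some files may still need converting afterwards.

```shell
# Extract embedded images from a PDF in their native formats (Poppler only).
# Usage: extract_pdf_images input.pdf [output-dir]
# The function name and default directory are illustrative assumptions.
extract_pdf_images() {
    local pdf=$1
    local outdir=${2:-extracted}
    local base
    base=$(basename "$pdf" .pdf)   # e.g. mydocument.pdf -> mydocument

    mkdir -p "$outdir"

    # The last argument is the image root: pdfimages appends a counter
    # and a file extension to it for every image it writes.
    pdfimages -all "$pdf" "$outdir/$base"
}
```

For example, extract_pdf_images mydocument.pdf scans writes the images into the scans directory with names starting with mydocument.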

sleske
  • 23,525
11

Except for the answer mentioning pdfimages, none of the other answers mention that their solutions actually transcode the embedded images. That is, they do not simply extract the original image but re-encode it, possibly degrading it in the process. This is true of Ghostscript, ImageMagick, Adobe Reader, PDFill, PDF-XChange Viewer, OS X Preview, and most other PDF software. Only pdfimages extracts the original image.

10

You can do this using Adobe Reader:

  1. Click the image. It will be highlighted.
  2. Copy (Ctrl-C) and paste it into Paint.
  3. Save as any file type you like.
Hemant
  • 1,548
5

PDFill PDF Tools is probably the easiest way to convert your PDFs to images on Windows. It lets you export all the pages in the PDF to separate images in one shot. It also has a lot of other features available for free that other PDF viewers only offer in their commercial or "Pro" versions.

Use the "Convert PDF to Images" button (button #10) in the screenshot below.

[Screenshot: PDFill PDF Tools]

If you need to concatenate the images into one very tall image so you only have to feed one file to your OCR program, you can use IrfanView.
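If you would rather stay on the command line, ImageMagick (mentioned in another answer) can do the same stacking instead of IrfanView. A minimal sketch; the function name stack_pages is made up for illustration.

```shell
# Stack several page images vertically into one tall image with ImageMagick.
# Usage: stack_pages output.png page1.png page2.png ...
# The function name is an illustrative assumption.
stack_pages() {
    local out=$1
    shift    # the remaining arguments are the input images, in order

    # -append joins the inputs top to bottom
    # (+append would place them side by side instead).
    convert "$@" -append "$out"
}
```

For example, stack_pages combined.png page-0.png page-1.png page-2.png. If you pass a glob such as page-*.png, note that the shell sorts names as text, so page-10.png comes before page-2.png; zero-padded page numbers avoid that.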

Gareth
  • 19,080
rob
  • 14,388
2

Acrobat Professional (non-free) does this:

Advanced->Document Processing->Export all images...

ufotds
  • 721
1

Since you didn't include an OS tag, I'll include an OS X answer:

PDFs open in Preview.app by default, which lets you use File -> Save As to export to any of these formats:

  • GIF
  • ICNS
  • JPEG
  • JPEG-2000
  • BMP
  • OpenEXR
  • Photoshop
  • PNG
  • TGA
  • TIFF
Lake
  • 469
0

This answer focuses on getting the original image resolution and on extracting the image with a PDF editor (Foxit).

Solution

Getting the original image resolution

When you use a PDF editor (Foxit):

  • If you want to know the original image resolution:
    -> open the PDF editor > go to preflight > single check / fixups > find image with ppi > analyze > result > image info
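If you don't have Foxit, Poppler's pdfimages -list (from another answer here) reports similar information. The sketch below pulls out the page number, pixel dimensions, and ppi of each embedded image. It assumes a reasonably recent Poppler whose listing has two header lines and puts x-ppi and y-ppi in columns 13 and 14; Xpdf's pdfimages prints a different listing, so check the header on your system. The function name list_image_ppi is made up for illustration.

```shell
# Print page, width, height, x-ppi and y-ppi for each embedded image.
# Usage: list_image_ppi input.pdf
# Assumes Poppler's column layout; the function name is illustrative.
list_image_ppi() {
    # NR > 2 skips the column header and the dashed separator line.
    pdfimages -list "$1" | awk 'NR > 2 { print $1, $4, $5, $13, $14 }'
}
```

If the ppi reported here matches the resolution you render or export at, the image should not need resampling.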


Method 1: export images

  • -> export all images > choose Resolution (you can enter a number manually, e.g. 200)

    [Screenshot]

    • export all images: exports only the embedded images (pick this)
      export to image: exports the whole PDF page as an image (the screenshot above is wrong)

    • If you choose a higher ppi than the original, the output will have that higher ppi, but the editor interpolates to "smooth" the pixel edges, so the result is not the original resolution.

    • (If you use Ghostscript instead, specify the resolution as -r200.)

Method 2: edit and save to image (recommended)

  • -> Edit panel > edit object > right click > edit object > Image panel > copy (/ save) as bmp

    • (This gives the original resolution; I tested it and viewed the result. The output is BMP.)

    • This is better because when the ppi is a decimal value, Method 1 may not export the image at its original resolution.

    • Use "copy" instead of "save as bmp", then paste into an image editor such as GIMP, ShareX, or Windows Paint, and save.

      • With ShareX, InsideClipboard (NirSoft), Ditto, or CopyQ, you can inspect the clipboard contents, which show the data is in BMP format (this appears to mean, and should mean, the original image with no resolution lost).
    • If you use "save as bmp", the resulting BMP file can come out green; I don't know why.

  • Note:

    • export all images > Auto detect image resolution does not pick the original resolution; I don't know why.
    • right click > save image has the same problem.
  • Note:

    • Preflight shows that the image is a JPEG, so logically it should be easy to extract it as a JPEG.
      However, GPT says this is not possible due to "encoding and compression methods"; I don't know why.

Misc

  • The following claim is not true:

    When you use Ghostscript and want only the region of the image, use -dUseArtBox instead of -dUseCropBox.

    (even when part of it sits outside of the page)

Reference

Types of Boxes (ArtBox, CropBox, etc) in a PDF
(check GPT yourself, I may not be able to post here.)

preflight
Solved: How to verify the resolution of a PDF - Adobe Community - 9226360
https://community.adobe.com/t5/acrobat-discussions/how-to-verify-the-resolution-of-a-pdf/td-p/9226360

Nor.Z
  • 143
0

PDF-XChange Viewer (free) can also export to image files: File → Export → Export to image.

Beyond that, I think it's the best free PDF viewer for Windows, and it has some nice markup capabilities. I have a license for Adobe Acrobat and I still prefer this unless I'm doing extensive editing, which is rare.

wfaulk
  • 6,307
-1

If the file is less than 5MB and you aren't worried about privacy/confidentiality, there is a handy online service at http://www.go2convert.com/ that can do a lot of graphic conversions (including pdf to jpeg).

sgmoore
  • 6,599
-2

If the image exceeds the size of your screen, you can use FastStone Capture (the "Capture Scrolling Window" feature) and save the image as a JPEG.


Gareth
  • 19,080
-2

You can check out this article.

It lists out 6 different ways to convert the pdf into images.

Convert PDF to JPG (The Web Way)

PDF to JPG Converters for The Desktop

noob
  • 1,395