
I have a PDF file containing maps of the building I work in, here:

http://www.libsys.und.edu/dev/FloorPlans_All.pdf

The original source files have been lost, and I've been asked to extract the map images, preferably without the text and icons that have been overlaid on top of them. This has proven annoyingly difficult.

So far, I have tried the following GUI programs:

  • Adobe Reader: lets me select text, but not the background images
  • FoxIt PDF Viewer: lets me select text, but not the background images
  • XPDF on Ubuntu 10.10: lets me select text, but not the background images

And also the following command-line programs:

  • pdfimages: extracts the icons indicating bathrooms just fine, but not the background images
  • pdftohtml: same as pdfimages, plus it makes a poorly marked up HTML document
  • pdfextract: same as pdfimages
  • convert: successfully saved images, but with the text burned into them

I've even tried opening the PDF manually in a text editor and extracting the stream objects by pasting them into a new file and saving it with a .jpg, .png, or .bmp extension (each in turn). Considering how little I know about the internal structure of PDF files, it's no surprise that this didn't work.
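That attempt most likely fails because stream data is often compressed. Many streams carry `/Filter /FlateDecode` (zlib), only `DCTDecode` streams contain bytes that already form a JPEG file, and uncompressed image data is bare pixel samples with no file header, so renaming it to .png or .bmp cannot produce a valid image. The Python sketch below uses only the standard library to pull every stream out of a PDF's raw bytes and inflate the zlib-compressed ones. The regex and the `extract_streams` name are illustrative assumptions; a real tool should use a proper PDF parser that honours `/Length`, predictors, and the other filters.

```python
import re
import zlib

# Bytes between a "stream" keyword (plus its end-of-line marker) and the next
# "endstream". The lookbehind stops "endstream" itself from opening a match.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)


def extract_streams(pdf_bytes: bytes) -> list[bytes]:
    """Return the payload of every stream, inflating FlateDecode data.

    Sketch only: it ignores /Length, /DecodeParms (predictors) and every
    filter except FlateDecode, and it cannot read encrypted PDFs or objects
    packed inside compressed object streams.
    """
    payloads = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        inflater = zlib.decompressobj()
        try:
            data = inflater.decompress(raw)
        except zlib.error:
            # Not zlib data (e.g. a DCTDecode/JPEG stream): keep the raw bytes.
            payloads.append(raw)
            continue
        # eof is True only when a complete zlib stream was decoded; bytes after
        # it (such as the EOL before "endstream") go to unused_data.
        payloads.append(data if inflater.eof else raw)
    return payloads


if __name__ == "__main__":
    # A tiny hand-made stream object standing in for a real PDF file.
    content = b"BT /F1 12 Tf (Room 101) Tj ET"
    packed = zlib.compress(content)
    sample = (
        b"1 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(packed)
        + packed
        + b"\nendstream\nendobj\n"
    )
    print(extract_streams(sample))
```

The decompressed payload here is a page content stream: text drawing operators between `BT` and `ET`, not image data, which is why pasting such streams into an image file goes nowhere.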

So ... is there any way I can retrieve the map images from this thing without also getting the text and icons?

6 Answers


You can download the XPDF library from http://www.foolabs.com/xpdf/download.html for Linux and Windows. Then run pdfimages -j input.pdf output and you should get output-000.jpg, output-001.jpg, etc. Also, check out http://linuxcommand.org/man_pages/pdfimages1.html for more usage options.
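If you need to repeat this for several files or page ranges, a small Python wrapper can build and run the same command. This is a sketch assuming `pdfimages` is on your PATH; the helper names `pdfimages_command` and `extract_images` are hypothetical. It only uses the `-j`, `-f` (first page) and `-l` (last page) options, and like `pdfimages` itself it only finds embedded raster images, which in the question turned out to be the bathroom icons rather than the maps.

```python
import subprocess
from pathlib import Path


def pdfimages_command(pdf_path, image_root, first_page=None, last_page=None):
    """Build a pdfimages argument list.

    -j saves DCT-encoded (JPEG) images as .jpg; -f and -l limit the page range.
    """
    cmd = ["pdfimages", "-j"]
    if first_page is not None:
        cmd += ["-f", str(first_page)]
    if last_page is not None:
        cmd += ["-l", str(last_page)]
    return cmd + [pdf_path, image_root]


def extract_images(pdf_path, image_root):
    """Run pdfimages and return the files it wrote (root-000.jpg, ...)."""
    subprocess.run(pdfimages_command(pdf_path, image_root), check=True)
    root = Path(image_root)
    return sorted(root.parent.glob(root.name + "-*"))


if __name__ == "__main__":
    for image in extract_images("input.pdf", "output"):
        print(image)
```

Passing an argument list to `subprocess.run` avoids any shell quoting issues with file names that contain spaces.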

mybluevan

Ok, after messing around with this for 5 minutes, my analysis is that PDF is even weirder than I originally thought, and that's saying something.

Not sure what your budget is, but with Acrobat Pro Extended 9, you can use:

A. Tools, Advanced Editing, Touchup Text Tool

-Select All
-Right click, Properties
-Text tab
-Select a standard font (e.g. Arial), close
-Hit Delete

B. Tools, Advanced editing, Touchup Object Tool

-Select the object (you can get most, but not all, of them; e.g. the student computer icons can't be selected), then delete

Here's what Page 1 looked like after a quick cleanup: http://dl.dropbox.com/u/7434256/p1test.pdf

Craig H

Take the PDF which was made by Craig H and optimize it a bit by running it through Ghostscript. On Windows the command line is:

gswin32c.exe ^
   -o p1test-gs-optimized.pdf ^
   -sDEVICE=pdfwrite ^
   -dPDFSETTINGS=/prepress ^
    p1test.pdf

On Linux/Unix/Mac OS X do:

gs \
   -o p1test-gs-optimized.pdf \
   -sDEVICE=pdfwrite \
   -dPDFSETTINGS=/prepress \
    p1test.pdf

This brings the file size down from about 3,000 kB to about 60 kB without losing content. Importing it into Inkscape (or InDesign, Illustrator, ...) should then be much faster.
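The same Ghostscript call can be scripted so that one piece of code works on both platforms instead of maintaining the `^` and `\` variants separately. This is a minimal sketch assuming Ghostscript is installed and on the PATH; the function names are hypothetical, and it picks `gswin32c` on Windows exactly as the answer does (64-bit Windows installs name the console executable `gswin64c`).

```python
import platform
import subprocess


def ghostscript_command(src, dst, windows=None):
    """Build the pdfwrite command, choosing gs or gswin32c by OS."""
    if windows is None:
        windows = platform.system() == "Windows"
    # 64-bit Windows builds ship gswin64c instead; adjust if needed.
    exe = "gswin32c" if windows else "gs"
    return [
        exe,
        "-o", dst,                  # output file; also implies -dBATCH -dNOPAUSE
        "-sDEVICE=pdfwrite",        # write the result as a new PDF
        "-dPDFSETTINGS=/prepress",  # quality preset used in the answer
        src,
    ]


def optimize_pdf(src, dst):
    """Run Ghostscript and raise CalledProcessError if it fails."""
    subprocess.run(ghostscript_command(src, dst), check=True)


if __name__ == "__main__":
    optimize_pdf("p1test.pdf", "p1test-gs-optimized.pdf")
```

Because the arguments are passed as a list, no line-continuation characters are needed on either platform.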

Kurt Pfeifle

You could try Photoshop. It reads PDFs, and it's possible the file originated in Photoshop and still has its layers, but that's a very long shot.

aart12

In a Linux environment I have used pdfmod to extract all the images in one go. See https://wiki.gnome.org/Apps/PdfMod or, for Ubuntu users, https://apps.ubuntu.com/cat/applications/pdfmod/

To download and install it in Ubuntu, it is sufficient to type sudo apt-get install pdfmod.

  • Start the pdfmod GUI (type pdfmod in the Dash or in a terminal)
  • Open the PDF document
  • Select all the pages (or any that you want to extract the images from)
  • The Edit menu offers an option to export all the images that can be extracted from the selected pages (export n images, where n is the number found). You can also reach this command by right-clicking the selection to open its context menu.
  • Once you go ahead with this, a new window will open up where you select the location to save the images into.

Hope this helps.


Open the document on your screen and zoom in on the picture until it is as large as possible while still fully visible. Press Alt+Print Screen (or the equivalent on your operating system) to take a screenshot of the program window. Then open Paint or your favorite image editor (Photoshop, GIMP, etc.), paste in the picture, and crop out anything you don't want.

Will Gunn