92

I would like to extract page ranges from a PDF document into a new PDF document using the command line on Linux. Note that pdftk fails on my input:

$ pdftk input.pdf cat 1 verbose output output.pdf
Error: Failed to open PDF file: 
   input.pdf
Errors encountered.  No output created.
Done.  Input errors, so no output created.

From here:

You (should) know that Pdftk is nothing more than a very old version of iText (a Java-PDF library) compiled with GCJ and extended with some command line functionality.

The keywords in the above statement are "VERY OLD".

$ java -classpath /path/to/Multivalent20091027.jar tool.pdf.Split -page 1 input.pdf
Exception in thread "main" java.lang.NoClassDefFoundError: tool/pdf/Split
Caused by: java.lang.ClassNotFoundException: tool.pdf.Split
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: tool.pdf.Split.  Program will exit.

It turns out that Multivalent is a bit tricky: it is hosted on SourceForge, and one page says that

Practical Thought generously provides these tools for free use on the command line

However, here it says:

The browser is open source. The document tools are a free bonus and not open source.

This finally explains the comment on "Gluing (Imposition) PDF documents" on Stack Overflow:

All releases of Multivalent linked from the official sourceforge site are missing the tools package.

(edit: there seems to be an old Multivalent version with the tools included, see the SO link; but as it looks somewhat like abandonware, I'd rather not use it)

Finally, I'd like to avoid tools that are essentially front ends for LaTeX like pdfjam.

Are there any options for such a PDF splitting command line tool under Linux?

sdaau
  • 6,008

6 Answers

124

I find pdfseparate very convenient to split ranges into individual pages. You can extract all pages into files named output-page1.pdf, output-page2.pdf, like this,

pdfseparate input.pdf output-page%d.pdf

And you can extract pages 1 - 5 of input.pdf by using the first-page and last-page flags, -f 1 -l 5,

pdfseparate -f 1 -l 5 input.pdf output-page%d.pdf

If you want to recombine them into page ranges, for example pages 1-3 in one document and pages 4-5 in another, you can use the companion program, pdfunite, as follows:

pdfunite output-page1.pdf output-page2.pdf output-page3.pdf final-pages1-3.pdf
pdfunite output-page4.pdf output-page5.pdf final-pages4-5.pdf

I believe these tools are part of Poppler and may already be installed on your system.
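If you want to script the two steps above, a minimal Python sketch could build the same command lines. The helper names `separate_cmd` and `unite_cmd` are hypothetical (they are not part of Poppler); only the `-f`/`-l` flags and the `%d` pattern come from the commands shown above.

```python
import shlex


def separate_cmd(src, first, last, pattern="output-page%d.pdf"):
    """Build the pdfseparate call for pages first..last (1-based, inclusive)."""
    return ["pdfseparate", "-f", str(first), "-l", str(last), src, pattern]


def unite_cmd(first, last, dest, pattern="output-page%d.pdf"):
    """Build the pdfunite call that joins the single-page files first..last."""
    pages = [pattern % n for n in range(first, last + 1)]
    return ["pdfunite", *pages, dest]


if __name__ == "__main__":
    commands = [
        separate_cmd("input.pdf", 1, 5),
        unite_cmd(1, 3, "final-pages1-3.pdf"),
        unite_cmd(4, 5, "final-pages4-5.pdf"),
    ]
    # Print the commands for inspection instead of running them.
    for cmd in commands:
        print(shlex.join(cmd))
```

To actually run a command, pass the list to `subprocess.run(cmd, check=True)`; this assumes `pdfseparate` and `pdfunite` are on your `PATH`.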

Evan Carroll
  • 9,518
JoshOrndorff
  • 1,341
36

pdftk 2.02 worked for me on Debian, and I think it should work for you too.

pdftk input.pdf cat 2-4 output out1.pdf

For the general case where you have to split a single PDF into multiple files, I could not find a way with pdftk alone, so I use a Bash script.
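The Bash script itself is not shown here. As a rough stand-in (not the script mentioned above), the following Python sketch generates one `pdftk ... cat ... output ...` call per page range. The function name `pdftk_split_cmds` and the `out<N>.pdf` naming scheme are assumptions made for illustration.

```python
def pdftk_split_cmds(src, ranges, prefix="out"):
    """Return one pdftk command per (first, last) page range (1-based, inclusive)."""
    cmds = []
    for i, (first, last) in enumerate(ranges, start=1):
        # pdftk accepts either a single page ("7") or a range ("2-4") after "cat".
        page_range = str(first) if first == last else f"{first}-{last}"
        cmds.append(["pdftk", src, "cat", page_range, "output", f"{prefix}{i}.pdf"])
    return cmds


if __name__ == "__main__":
    for cmd in pdftk_split_cmds("input.pdf", [(2, 4), (7, 7)]):
        print(" ".join(cmd))
```

Each list can be handed to `subprocess.run(cmd, check=True)` once you are happy with the printed commands.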

16

You can use the pdfjam tool with the syntax

pdfjam <input-file> <page-ranges> -o <output-file>

and an example of page ranges would be

3,67-70,80

Source: https://tex.stackexchange.com/questions/79623 by Vincent Nivoliers

Flow
  • 1,556
5

I'll put this as an answer, so as not to clog the question: here is a related link on unix.se:

... and the accepted answer uses a Python script with PyPDF (but that answer implements a split of one page into two - and that script thus needs to be modified for page ranges, for it to work as asked in OP).

 

EDIT: I just found Stapler (see the thread "Stapler - A python utility for manipulating PDF docs based on pypdf" on the Arch Linux Forums), which is apparently "A small utility making use of the pypdf library to provide a (somewhat) lighter alternative to pdftk". Note, however, that the mailing list mentions some problems with it.

sdaau
  • 6,008
0

You can use pdfly:

  • Installation: pip install pdfly and more
  • Usage: pdfly cat in.pdf 2:4 -o out.pdf and more

Some details:

 Page ranges refer to the previously-named file. A file not followed by a page
 range means all the pages of the file.
 PAGE RANGES are like Python slices.
 Remember, page indices start with zero.
 Page range expression examples:
:     all pages.
-1    last page.
22    just the 23rd page.
:-1   all but the last page.
0:3   the first three pages.
-2    second-to-last page.
:3    the first three pages.
-2:   last two pages.
5:    from the sixth page onward.
-3:-1 third & second to last.

The third, "stride" or "step" number is also recognized.

::2       0 2 4 ... to the end.
3:0:-1    3 2 1 but not 0.
1:10:2    1 3 5 7 9
2::-1     2 1 0.
::-1      all pages in reverse order.
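Because these expressions follow Python slice rules, you can preview which zero-based page indices an expression selects before running pdfly. The sketch below is not pdfly's own parser; `parse_page_range` and `select_pages` are hypothetical helpers that simply reuse Python's built-in `slice`.

```python
def parse_page_range(expr):
    """Turn an expression such as '2:4', '-1' or '::2' into a Python slice."""
    parts = expr.split(":")
    if len(parts) > 3:
        raise ValueError(f"too many ':' in page range: {expr!r}")
    nums = [int(p) if p.strip() else None for p in parts]
    if len(nums) == 1:
        index = nums[0]
        if index is None:
            raise ValueError("empty page range")
        # A bare index selects exactly one page; -1 needs an open end.
        return slice(index, index + 1 if index != -1 else None)
    return slice(*nums)


def select_pages(expr, page_count):
    """List the zero-based page indices that expr picks from a document."""
    return list(range(page_count))[parse_page_range(expr)]


if __name__ == "__main__":
    for expr in ("2:4", "-2:", "::2", "3:0:-1"):
        print(expr, select_pages(expr, 10))
```

Under these rules, `2:4` in the usage line above selects indices 2 and 3, that is, the third and fourth pages of the file.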

Martin Thoma
  • 3,604
0

Use pdfunite with shell brace expansion, either as a single range or as several ranges:

First, extract each page with pdfseparate:

$ pdfseparate InputFilename.pdf SeparateFilenames-%d.pdf

This is going to create and sequentially number your individual PDF page files.

    -------------

Then unite/merge using pdfunite:

$ pdfunite SeparateFilenames-{6..332}.pdf Combined.pdf

    -------------

The pdfunite command above combines files 6 through 332, so you don't have to list each file separately (6 and 332 are arbitrary numbers).

pdfunite accepts multiple filename arguments. So, if you want to drop a page from the middle of, say, 10 page files, you could use the ranges {1..4} and {6..10} (which skips page 5) like so:

$ pdfunite file{1..4}.pdf file{6..10}.pdf Combined.pdf

In this manner, you can unite any combination of files that you want.
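If the pages to leave out are computed rather than typed by hand, the same argument list can be generated programmatically. This is only a sketch: `unite_without` is a hypothetical helper, and it assumes the single-page files are named `file1.pdf`, `file2.pdf`, and so on, as in the example above.

```python
def unite_without(prefix, page_count, skip, dest):
    """Build a pdfunite call over pages 1..page_count, leaving out those in skip."""
    files = [
        f"{prefix}{n}.pdf"
        for n in range(1, page_count + 1)  # numeric order, like {1..10}
        if n not in skip
    ]
    return ["pdfunite", *files, dest]


if __name__ == "__main__":
    # Same result as: pdfunite file{1..4}.pdf file{6..10}.pdf Combined.pdf
    print(" ".join(unite_without("file", 10, {5}, "Combined.pdf")))
```

Like brace expansion, this keeps the files in numeric order, whereas a glob such as `file*.pdf` would sort `file10.pdf` before `file2.pdf`.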