
I'm using Linux and would like software (or a script or method) that merges several PDFs into a single output PDF containing bookmarks. Each bookmark should be named after the filename of one of the merged PDFs and point to the page where that file begins.

Adobe Acrobat offers something similar, but it is non-free and not available for Linux.

yanpas

3 Answers

UPDATE: I wasn't satisfied with the result, so I wrote a version with a nice GUI:

https://github.com/Yanpas/PdfMerger


I learned Python and wrote this program (a modification of an existing one) in one hour:

#! /usr/bin/env python
# Original author Nicholas Kim, modified by Yan Pashkovsky
# New license - GPL v3
# Uses the legacy PyPDF2 API (PdfFileReader, PdfFileWriter, addBookmark).
import sys
import time
from PyPDF2 import utils, PdfFileReader, PdfFileWriter


def get_cmdline_arguments():
    """Retrieve command line arguments."""
    from optparse import OptionParser

    usage_string = "%prog [-o output_name] file1, file2 [, ...]"

    parser = OptionParser(usage_string)
    parser.add_option(
        "-o", "--output",
        dest="output_filename",
        default=time.strftime("output_%Y%m%d_%H%M%S"),
        help="specify output filename (exclude .pdf extension); default is current date/time stamp"
    )

    options, args = parser.parse_args()
    if len(args) < 2:
        parser.print_help()
        sys.exit(1)
    return options, args


def main():
    options, filenames = get_cmdline_arguments()
    output_pdf_name = options.output_filename + ".pdf"
    files_to_merge = []

    # get PDF files
    for f in filenames:
        try:
            next_pdf_file = PdfFileReader(open(f, "rb"))
        except utils.PdfReadError:
            sys.stderr.write("%s is not a valid PDF file.\n" % f)
            sys.exit(1)
        except IOError:
            sys.stderr.write("%s could not be found.\n" % f)
            sys.exit(1)
        else:
            files_to_merge.append(next_pdf_file)

    # merge page by page, adding one bookmark per input file
    output_pdf_stream = PdfFileWriter()
    page_index = 0  # index of the next page in the merged output
    for filename, pdf in zip(filenames, files_to_merge):
        for i in range(pdf.numPages):
            output_pdf_stream.addPage(pdf.getPage(i))
            if i == 0:
                # bookmark the first page of each input file
                output_pdf_stream.addBookmark(str(filename), page_index)
            page_index += 1

    # create output pdf file
    with open(output_pdf_name, "wb") as output_pdf_file:
        output_pdf_stream.write(output_pdf_file)

    print("%s successfully created." % output_pdf_name)


if __name__ == "__main__":
    main()

This program requires PyPDF2, which you can install with sudo pip install pypdf2 (you need pip installed first). Save the script as pdfmerger.py, make it executable, then open a terminal and run ./pdfmerger.py *.pdf
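The only real bookkeeping in the script is the running page counter that decides where each bookmark points. As a minimal sketch, that logic can be isolated and checked without PyPDF2. The function name bookmark_offsets and the sample file names are hypothetical and are not part of the original script.

```python
def bookmark_offsets(page_counts):
    """Return (filename, zero-based start page) for each input file.

    page_counts is a list of (filename, number_of_pages) pairs in merge order.
    """
    offsets = []
    next_page = 0  # index of the next page to be written to the merged PDF
    for filename, num_pages in page_counts:
        if num_pages > 0:
            # a file with no pages gets no bookmark, mirroring the script
            offsets.append((filename, next_page))
        next_page += num_pages
    return offsets


if __name__ == "__main__":
    # hypothetical input files and page counts, for illustration only
    sample = [("a.pdf", 3), ("b.pdf", 1), ("c.pdf", 5)]
    for name, start in bookmark_offsets(sample):
        print("%s starts at page index %d" % (name, start))
```

The indices are zero-based because that is what the script passes as the page argument when it adds a bookmark.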

Mateen Ulhaq
yanpas

This Bash script gives each PDF in the current directory a single bookmark that points to its first page and is titled with the PDF's filename, and then concatenates them all. It can handle non-ASCII filenames.

#!/usr/bin/bash

cattedPDFname="${1:?Concatenated PDF filename}"

# make each PDF contain a single bookmark to first page
# (note: this overwrites each input PDF in place)
tempPDF=$(mktemp)
for i in *.pdf
do
    bookmarkTitle=$(basename "$i" .pdf)
    bookmarkInfo="BookmarkBegin\nBookmarkTitle: $bookmarkTitle\nBookmarkLevel: 1\nBookmarkPageNumber: 1"
    pdftk "$i" update_info_utf8 <(echo -en "$bookmarkInfo") output "$tempPDF" verbose
    mv "$tempPDF" "$i"
done

# concatenate the PDFs
pdftk *.pdf cat output "$cattedPDFname" verbose
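The script hands pdftk a four-line bookmark record for each file. As a rough sketch, the same record can be built in Python, which may be handy if you want to generate it from another program. The function name pdftk_bookmark_info is hypothetical, and the code only builds the text; it does not call pdftk.

```python
import os


def pdftk_bookmark_info(pdf_path, page_number=1, level=1):
    """Build the bookmark record text, titled after the PDF's filename."""
    # strip the directory and the extension, like `basename "$i" .pdf`
    title = os.path.splitext(os.path.basename(pdf_path))[0]
    return "\n".join([
        "BookmarkBegin",
        "BookmarkTitle: %s" % title,
        "BookmarkLevel: %d" % level,
        "BookmarkPageNumber: %d" % page_number,
    ])


if __name__ == "__main__":
    # hypothetical file name, for illustration only
    print(pdftk_bookmark_info("reports/My Doc.pdf"))
```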

Yuji
Geremia

Adapting a good answer [1] from tex.stackexchange.com, you can create an itemize list that references the files you include below it (similar to a table of contents). LaTeX will keep the page numbers up to date.

A few more words on LaTeX

  • A line like this includes the PDF file MyDoc1.pdf, located in the same directory as the LaTeX file, under the reference name "doc01":

    \modifiedincludepdf{-}{doc01}{MyDoc1.pdf}
    
  • A command such as \pageref{doc02.3} prints, as a link, the page number of the third page of the document whose reference key is "doc02". LaTeX keeps it up to date.

  • A \begin{itemize} ... \end{itemize} block creates a bulleted list.

The LaTeX file
Below is the modified template, which works with pdflatex:

\documentclass{article}
\usepackage{hyperref}
\usepackage{pdfpages}
\usepackage[russian,english]{babel}

\newcounter{includepdfpage}
\newcounter{currentpagecounter}
\newcommand{\addlabelstoallincludedpages}[1]{
   \refstepcounter{includepdfpage}
   \stepcounter{currentpagecounter}
   \label{#1.\thecurrentpagecounter}}
\newcommand{\modifiedincludepdf}[3]{
    \setcounter{currentpagecounter}{0}
    \includepdf[pages=#1,pagecommand=\addlabelstoallincludedpages{#2}]{#3}}

\begin{document}

You can refer to the beginning or to a specific page: \\
see page \pageref{doc01.1} till \pageref{doc02.3}.\\

\begin{itemize}
  \item Contribution from Groupmate 1, page \pageref{doc01.1}
  \item Contribution from Groupmate 2, page \pageref{doc02.1}
\end{itemize}

\modifiedincludepdf{-}{doc01}{MyDoc1.pdf}
\modifiedincludepdf{-}{doc02}{MyDoc2.pdf}

\end{document}
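With many input files, writing the list items and the include lines by hand gets tedious. The following is a small sketch that generates both from a list of files, assuming the \modifiedincludepdf command defined in the template. The function name latex_merge_body and the sample entries are hypothetical.

```python
def latex_merge_body(pdf_files):
    """Return the itemize list and include lines for the template.

    pdf_files is a list of (label, filename, description) tuples.
    Descriptions are inserted as-is; LaTeX special characters are not escaped.
    """
    lines = [r"\begin{itemize}"]
    for label, _filename, description in pdf_files:
        # link each list entry to the first page of the included PDF
        lines.append(r"  \item %s \pageref{%s.1}" % (description, label))
    lines.append(r"\end{itemize}")
    lines.append("")
    for label, filename, _description in pdf_files:
        lines.append(r"\modifiedincludepdf{-}{%s}{%s}" % (label, filename))
    return "\n".join(lines)


if __name__ == "__main__":
    # hypothetical labels, files and descriptions, for illustration only
    print(latex_merge_body([
        ("doc01", "MyDoc1.pdf", "Contribution from Groupmate 1"),
        ("doc02", "MyDoc2.pdf", "Contribution from Groupmate 2"),
    ]))
```

Paste the output between \begin{document} and \end{document} in the template.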

Note

To simply merge or split PDF documents or pages, you can use tools such as pdftk; other questions [3] about it are a good source of ideas.

References

Hastur