63

I have a pdf document I created through non-Acrobat means (printing to pdf, then merging a bunch of pdfs), but I'd like to manually change the page numbers (i.e. the first several pages are simply title pages, the page that is labeled "page 1" is really the 7th sheet of the pdf). What's the simplest (and ideally, free) way to do this?

To be clear, I am not trying to change the numbers on the pages themselves, but the page numbers in the "metadata" that the pdf stores (the pages themselves are already numbered correctly; I just want "go to page 1" to go to the page labeled 1, which could be sheet 7).

For what it's worth, I'm on Windows, though I have access to Macs as well.

Arjan
  • 31,511
YGA
  • 2,035

12 Answers

60

What you want is indeed called page labels, and they can be added directly in the PDF's source code. Change the file extension from pdf to txt and open the file in a text editor (this can be slow for large files, so be patient). The information about page labels is stored in an object called the document catalog, which looks something like this:

3 0 obj
<< /Type /Catalog
   /Pages 1 0 R
>>
endobj

It may contain more entries, but this is the basic structure. There is only one catalog, so in a large file you can search for the object that contains /Catalog. Now you can make your desired changes by inserting the /PageLabels entry:

3 0 obj
<< /Type /Catalog
   /Pages 1 0 R
   /PageLabels << /Nums [ 0 << /P (cover) >>
                          % labels 1st page with the string "cover"
                          1 << /S /r >>
                          % numbers pages 2-6 in small roman numerals
                          6 << /S /D >>
                          % numbers pages 7-x in decimal arabic numerals
                        ]
               >>
>>
endobj

There are 3 lines starting with numbers, called page indices. Page 1 has the index 0, page 2 the index 1 and so forth. They always describe ranges, so the line with 1 <<...>> applies to all pages from index 1 to 5 and the line with 6 <<...>> applies to all pages from 6 up to the last page. A label for 0 <<...>> must always be defined.

You can find more information about page labels and PDF source code in section 12.4.2 of the PDF 1.7 standard.
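To make the range rule concrete, here is a minimal Python sketch of how a viewer might turn a page index into a label. It is only an illustration under my own assumptions: the function names and the dict layout are made up for this example, and it handles just the /D, /R and /r styles plus the /P prefix and the /St start number defined in the same section of the standard.

```python
from bisect import bisect_right


def to_roman(n: int) -> str:
    """Return n (>= 1) as an uppercase Roman numeral."""
    table = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
             (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
             (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in table:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def page_label(index: int, nums: dict) -> str:
    """Label of the page at 0-based `index`.

    `nums` maps the first index of each range to a dict with the
    optional keys "S" (style), "P" (prefix) and "St" (start, default 1).
    It must contain an entry for index 0.
    """
    starts = sorted(nums)
    start = starts[bisect_right(starts, index) - 1]  # range holding index
    entry = nums[start]
    number = entry.get("St", 1) + (index - start)    # restart in each range
    style = entry.get("S")
    if style == "D":
        text = str(number)
    elif style == "R":
        text = to_roman(number)
    elif style == "r":
        text = to_roman(number).lower()
    else:
        text = ""                                    # no /S: prefix only
    return entry.get("P", "") + text


# The /Nums array from the example above, written as a Python dict.
nums = {0: {"P": "cover"}, 1: {"S": "r"}, 6: {"S": "D"}}
labels = [page_label(i, nums) for i in range(8)]
print(labels)  # ['cover', 'i', 'ii', 'iii', 'iv', 'v', '1', '2']
```

Note how the seventh sheet (index 6) gets the label "1", which is what the question asks for.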

23

NOTE 1: The accepted answer is still mostly correct, but has some gaps: many PDF files are not directly editable as text, and even when they are, such editing can damage the PDF and make it unreadable. One solution that works on both Unix and Microsoft Windows is qpdf, which can translate PDF files into "QDF", a text-editable form that is still a valid PDF file. The qpdf package comes with fix-qdf, which recalculates offsets after a QDF file has been edited to correct any damage.

NOTE 2: Uncomfortable with text editors? Try using a GUI editor such as jpdftweak first. Sometimes the GUI pdf editors work, in which case, yay, you're done. However, when they fail, as has often been the case for me, you can try this more robust alternative. Either way, please do not down vote my answer for being less than elegant.


HOW TO Edit PDF Page Numbers Using Qpdf

Summary:

  1. qpdf --qdf foo.pdf foo.qdf
  2. edit foo.qdf

     0 << >>           % No label on first pages
     6 << /S /D >>     % Start numbering from 7th page.
    
  3. fix-qdf foo.qdf >bar.qdf
  4. test bar.qdf
  5. qpdf bar.qdf bar.pdf

Detailed steps

Step 1.

Convert the document to the easily editable QDF format. Run qpdf from the command line like so:

qpdf --qdf foo.pdf foo.qdf

Note: If you do not have qpdf installed already, Microsoft Windows executables can be downloaded from https://github.com/qpdf/qpdf/releases. On Unix systems such as Ubuntu and Debian GNU/Linux, you can install it by typing apt install qpdf.

Step 2.

Edit the QDF document using a text editor such as notepad++, emacs, or gedit. Search for the word /Catalog and note the <<angle brackets>> it is inside. Nearby, you'll find the current /PageLabels (if any).

We'll be adding each section that should be differently numbered to the /PageLabels. The format is start-page << style >>. Note that white-space does not matter and that the first page of the document is 0. Unless otherwise specified, a new section always starts out numbering pages from 1.

Examples

Here is a full example of what PageLabels may look like, with comments added:

/Type /Catalog
/PageLabels <<
  /Nums [
    0           % From the first page of the document,
      <<
        /S /r   % ...use the lowercase roman numeral style.
      >>
    6           % From seventh page onward,
      <<
        /S /D   % ...use ordinary digits (arabic numerals)
      >>
  ]
>>

If the file has no PageLabels, add them after /Type /Catalog. For example, one might change,

1 0 obj
<<
  …
  /Type /Catalog
>>
endobj

into,

1 0 obj
<<
  … 
  /Type /Catalog
  /PageLabels
      << /Nums [
    0 << >>                 % No label for cover
    1 << /S /r >>           % i, ii for index
    3 << /S /D /St 15 >>    % 15, 16, 17, ... for article
    31 << /S /D /P (A-) >>  % A-1, A-2, A-3... for appendix
       ]
  >>
>>
endobj

OPTIONAL: STARTING FROM A DIFFERENT NUMBER WITH /St

Each section restarts numbering at 1 unless you tell it otherwise using /St. Notice how in the above example, the fourth page starts at 15.

OPTIONAL: USING A DIFFERENT STYLE WITH /S

The /S key takes a value that lets you pick the numbering style:

  • /D digits (1, 2, 3...)
  • /R uppercase Roman (I, II, III...)
  • /r lowercase Roman (i, ii, iii...)
  • /A uppercase letters (A, B, C, ..., X, Y, Z, then AA, BB, CC, ..., ZZ, then AAA, ...)
  • /a lowercase letters (a, b, c, ..., x, y, z, then aa, bb, cc, ..., zz, then aaa, ...)

If /S is omitted, the pages in that section get no number (only the prefix, if any). For example:

0 << >>         % No label for cover

OPTIONAL: ADDING A PREFIX TO EACH PAGE WITH /P

You can show any string of text before the page number by specifying a word in parentheses after /P:

  31
  <<
    /S /D
    /P (A-)     % label appendix pages A-1, A-2, A-3
  >>

Specifying a prefix without a style (/S) gives you pages that have only the word, without any number. This can be useful, for example, if you'd like a cover page to simply have the label "Cover".

     0 << /P (Cover) >>        % No number, just "Cover"
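If you have many sections, typing the entries by hand gets error-prone. Below is a rough Python sketch that prints a /PageLabels entry ready to paste into the catalog. The function name and the tuple layout (first index, style, start number, prefix) are my own invention for this example; only the /S, /St and /P keys come from the format described above.

```python
def format_page_labels(ranges):
    """Return a /PageLabels entry as text.

    `ranges` is a list of (first_index, style, start, prefix) tuples.
    first_index is 0-based; style, start and prefix may be None,
    in which case the corresponding key is left out.
    """
    lines = ["/PageLabels", "<< /Nums ["]
    for first_index, style, start, prefix in sorted(ranges, key=lambda r: r[0]):
        parts = []
        if style is not None:
            parts.append(f"/S /{style}")
        if start is not None:
            parts.append(f"/St {start}")
        if prefix is not None:
            # Escape characters that are special inside a PDF literal string.
            escaped = (prefix.replace("\\", "\\\\")
                             .replace("(", "\\(")
                             .replace(")", "\\)"))
            parts.append(f"/P ({escaped})")
        body = " ".join(parts)
        lines.append(f"  {first_index} << {body} >>" if body
                     else f"  {first_index} << >>")
    lines.append("] >>")
    return "\n".join(lines)


fragment = format_page_labels([
    (0, None, None, None),   # cover: no label
    (1, "r", None, None),    # i, ii for index
    (3, "D", 15, None),      # 15, 16, 17, ... for article
    (31, "D", None, "A-"),   # A-1, A-2, A-3, ... for appendix
])
print(fragment)
```

The printed text goes inside the catalog's angle brackets, next to /Type /Catalog, exactly as in the hand-written example.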

Step 3.

Run fix-qdf to make your edits valid PDF and put the output in bar.qdf.

fix-qdf foo.qdf > bar.qdf

Step 4.

Open bar.qdf in your PDF viewing program and check that it is numbered correctly.

Step 5.

Convert the QDF file back into a normal PDF, like so:

qpdf bar.qdf bar.pdf

Ta da. You're done. You now have a document with correctly labeled page numbers in bar.pdf.
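If you do this often, the command-line steps can be wrapped in a small Python helper. This is only a sketch: the function names are made up, the qpdf option is written in its double-dash spelling (--qdf), and it assumes qpdf and fix-qdf are on your PATH. The manual edit of Step 2 still happens between the two calls.

```python
import subprocess


def to_qdf_command(pdf, qdf):
    """Step 1: command that writes an editable QDF copy of `pdf`."""
    return ["qpdf", "--qdf", pdf, qdf]


def fix_command(edited_qdf):
    """Step 3: command that prints the repaired QDF on standard output."""
    return ["fix-qdf", edited_qdf]


def to_pdf_command(fixed_qdf, pdf):
    """Step 5: command that converts the repaired QDF to a normal PDF."""
    return ["qpdf", fixed_qdf, pdf]


def make_editable(pdf, qdf):
    """Run Step 1; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(to_qdf_command(pdf, qdf), check=True)


def finish(edited_qdf, fixed_qdf, out_pdf):
    """Run Steps 3 and 5 after the QDF file has been edited by hand."""
    with open(fixed_qdf, "wb") as handle:
        # fix-qdf writes to stdout, so capture it in the output file.
        subprocess.run(fix_command(edited_qdf), stdout=handle, check=True)
    subprocess.run(to_pdf_command(fixed_qdf, out_pdf), check=True)


# Typical use:
#   make_editable("foo.pdf", "foo.qdf")
#   ... edit foo.qdf in a text editor ...
#   finish("foo.qdf", "bar.qdf", "bar.pdf")
```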

hackerb9
  • 1,117
8

There is a small Python script that can do the job: https://github.com/lovasoa/pagelabels-py

In your case call something like:

./addpagelabels.py --delete file.pdf
./addpagelabels.py --startpage 1 --type 'roman lowercase' file.pdf
./addpagelabels.py --startpage 7 --type arabic file.pdf
DG'
  • 859
8

The Java variant of pdftk has support for editing page labels starting from version 3.1.0.

To use it, first create a file with the labels, let's say it's called metadata.txt:

PageLabelBegin
PageLabelNewIndex: 1
PageLabelStart: 1
PageLabelPrefix: Cover
PageLabelNumStyle: NoNumber
PageLabelBegin
PageLabelNewIndex: 2
PageLabelStart: 1
PageLabelPrefix: Back Cover
PageLabelNumStyle: NoNumber
PageLabelBegin
PageLabelNewIndex: 3
PageLabelStart: 1
PageLabelNumStyle: LowercaseRomanNumerals
PageLabelBegin
PageLabelNewIndex: 27
PageLabelStart: 1
PageLabelNumStyle: DecimalArabicNumerals
  • PageLabelNewIndex is the page from which the numbering style applies, counting from one.
  • PageLabelStart is the starting number. For example, if you specify 5 here, the pages will be numbered 5, 6, 7, ...
  • PageLabelNumStyle can be DecimalArabicNumerals, UppercaseRomanNumerals, LowercaseRomanNumerals, UppercaseLetters, LowercaseLetters or NoNumber.

After you've finished editing, apply the metadata to your PDF file:

pdftk book.pdf update_info metadata.txt output book-with-metadata.pdf
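For a document with many sections, the metadata file can also be generated. The following Python sketch builds the text from a list of sections; the function name and the tuple layout are assumptions made for this example, while the field names are the ones shown above.

```python
def pdftk_page_labels(sections):
    """Build update_info text for page labels.

    `sections` is a list of (new_index, start, num_style, prefix) tuples.
    new_index counts pages from one; prefix may be None to leave it out.
    """
    lines = []
    for new_index, start, num_style, prefix in sections:
        lines.append("PageLabelBegin")
        lines.append(f"PageLabelNewIndex: {new_index}")
        lines.append(f"PageLabelStart: {start}")
        if prefix is not None:
            lines.append(f"PageLabelPrefix: {prefix}")
        lines.append(f"PageLabelNumStyle: {num_style}")
    return "\n".join(lines) + "\n"


# The asker's case: roman numerals on the six title pages,
# then "1" on the 7th sheet.
metadata = pdftk_page_labels([
    (1, 1, "LowercaseRomanNumerals", None),
    (7, 1, "DecimalArabicNumerals", None),
])
print(metadata, end="")
```

Save the printed text as metadata.txt and pass it to update_info as shown above.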
Pkkm
  • 181
6

If I understand you correctly, here is how it should work:

gs \
  -o modified-pagelabels-50pages.pdf \
  -sDEVICE=pdfwrite \
  -c "[ /Page 1 /Label (i)     /PAGELABEL pdfmark" \
  -c "[ /Page 2 /Label (ii)    /PAGELABEL pdfmark" \
  -c "[ /Page 3 /Label (III)   /PAGELABEL pdfmark" \
  -c "[ /Page 4 /Label (four)  /PAGELABEL pdfmark" \
  -c "[ /Page 5 /Label (v)     /PAGELABEL pdfmark" \
  -c "[ /Page 6 /Label (|||||) /PAGELABEL pdfmark" \
  -f 50pages.pdf

However, I seem to remember that this didn't work reliably or fully the last time I tried it (about 2 years ago).

UPDATE: My memory wasn't failing me. I now tried again and filed a bug report for Ghostscript (bug 691889) concerning this. Follow the link to the bug report to see the details.

Kurt Pfeifle
  • 13,079
6

jPdf Tweak is an Open Source graphical utility that lets you edit page labels in PDF files. The documentation page provides step-by-step instructions.

1

This answer is a corollary to the text editor method I posted previously. It was requested on another forum that I add an example using the Python API to qpdf instead of its command line interface.

PikePDF: qpdf for Python

Changing page numbers programmatically

The following is a working program for a moderately complex document that changes numbering four times.

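from pikepdf import Pdf, Name, Dictionary, NumberTree

pdf = Pdf.open("input.pdf")

# Create an empty /PageLabels number tree if the document has none.
if "/PageLabels" not in pdf.Root:
    pdf.Root.PageLabels = NumberTree.new(pdf).obj

# Wrap the catalog entry so ranges can be set by 0-based page index.
pagelabels = NumberTree(pdf.Root.PageLabels)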

First page is a cover, so it should have no page number.

pagelabels[0] = Dictionary()

Second and third page use lowercase Roman numerals

pagelabels[1] = Dictionary(S=Name.r) # i, ii for index

Pages 4 through 31 use digits starting at "15".

pagelabels[3] = Dictionary(S=Name.D, St=15) # 15, 16, 17, ... for article

Pages 32 onwards have a prefix of "A-" before the number.

pagelabels[31] = Dictionary(S=Name.D, P='A-') # A-1, A-2, A-3 for appendix

pdf.save('output.pdf')

Page labels will now be:

<blank>, i, ii, 15, 16, 17, ..., 41, 42, A-1, A-2, A-3

Usage notes

  • PageLabels are what most people call "page numbers".

  • Adding a PageLabel to a page changes the numbering scheme for all following pages until the next PageLabel.

  • The three Dictionary options for a PageLabel are:

    1. S: Numbering Style (pikepdf.Name, defaults to none)
    2. St: Starting number (integer, defaults to 1)
    3. P: Prefix before each number (string, defaults to none)
  • The possible values for S (the style) are:

    • Name.D: Digits (1, 2, 3, ..)
    • Name.R: Uppercase Roman (I, II, III...)
    • Name.r: lowercase roman (i, ii, iii...)
    • Name.A: Uppercase letters (A, B, C, ..., X, Y, Z, then AA, BB, CC, ..., ZZ, then AAA, ...)
    • Name.a: lowercase letters (a, b, c, ..., x, y, z, then aa, bb, cc, ..., zz, then aaa, ...)
  • If S is omitted from the Dictionary(), pages will be labelled with only the prefix (P), if any. If P is also omitted, then PDF readers will show a blank for the "page number".

  • If for some reason you wanted to have a word as a page number, you can do that by setting a prefix (P) without a numbering style (S).

    pagelabels[34] = Dictionary(P='Hello')        # "Hello" is the page number
    
  • If no PageLabels exist, then the first page is numbered "1".

  • The first page of a PDF is indexed at [0], even though its page label is "1". This trips everyone up.

  • The official documentation is here:

    https://pikepdf.readthedocs.io/en/latest/api/models.html#pikepdf.NumberTree


    This method is not for everyone, but I post it in the hope that it will be useful to the "Super Users" who know Python.
    I welcome comments on how my answers can be improved.
hackerb9
  • 1,117
0

I found direct editing of the file (as uncompressed by pdftk) not to work if there are already '/titles' set in the '/outlines' region. The direct-editing technique described in a post above is demonstrated on Youtube: https://www.youtube.com/watch?v=zoH1Z_hSpak

But the 'update' feature of pdftk may be more intuitive (and more reliable when '/titles' already exist in the '/outlines' region of the PDF file) via editing the 'doc_data.txt' file used here: https://www.pdflabs.com/blog/export-and-import-pdf-bookmarks/

Bob
  • 9
0

I'm extending the excellent answer from @hackerb9 with an example that will reset the page numbering of the whole pdf file to 1.

This can be useful in case of weird or broken page numbering when combining multiple different pdfs.

To reset the page numbering of the entire PDF so that the first page is numbered 1, do the following:

  1. qpdf --qdf foo.pdf foo.qdf
  2. open foo.qdf with a text editor and replace the first object with this
    %% Original object ID: 1 0
    1 0 obj
    <<
      /Type /Catalog
      /Pages 2 0 R
    >>
    endobj
    
  3. fix-qdf foo.qdf >ok.qdf
  4. test ok.qdf
  5. qpdf ok.qdf ok.pdf
Gruber
  • 501
0

Old question from 2011 and best answer valid from 2005 to 2018 has been deleted.

One of the best PDF metadata editors in its day was BeCyPDFMetaEdit, which could alter "Page Labels" via the command line interface or the GUI.

Its unique ability to Fix/Repair broken PDFs, Edit MetaData and roll back historic additions set it apart from many simpler applications.

It became abandonware in its later years but still works perfectly well (with limitations), on Windows only.

The web archive is at https://web.archive.org/web/20180929111456/http://www.becyhome.de/becypdfmetaedit/description_eng.htm

Based on a powerful PDF assembler, it was designed for PDFs up to, but not including, the more recent 1.7 variants with XMP tracking data, so it has some ability to remove XMP but cannot fully edit the XML.

To answer the OP's question: adding labels was simple in the GUI, as two line entries. Here the first 6 pages are set as I-VI and page 7 onwards starts at 1. Because it uses "incremental mode", it does not alter the source data and so does not corrupt any PDF, such as newer versions with XMP.

Files with XMP are not the majority but when encountered, that extra Modern MetaData may need special handling by other means

1.3 Metadata (XMP)

Since PDF version 1.4, metadata can be stored in a new XML-based format named XMP ("Extensible Metadata Platform"). With regard to backward compatibility, newer PDF documents contain the metadata both in XMP and also in classical form. However, the application is currently not capable of processing metadata in XMP format. This can lead to the effect that a PDF viewer shows the original field values after the metadata has been edited. Even if the PDF viewer shows the new metadata, the XMP metadata still contains the original field values which could be extracted using a hex or text editor.

To address these problems, the application allows at least to remove the XMP metadata. The metadata will then only be stored in the classical format.

Attention: XMP-based metadata cannot only be specified for the entire PDF document but also for parts of it. The application only deletes the document-specific metadata, metadata attached to other document content stays as is. Therefore, additional tools like a hex editor are required if all XMP metadata shall be removed.

NOTE

Windows Security treats such a powerful app as undesirable, since it can remove PDF protection etc., so it needs to be run in administrator mode with other compatibility switches (perfectly safe, but "not for the faint-hearted").

Later edit: it actually plays nicely in Windows 10/11 when launched through a wrapper .bat file that bypasses UAC, so there is no need to find a registry key or other workaround. Use something like this:

cmd /min /C "set __COMPAT_LAYER=RUNASINVOKER && start "" %~pd0\BeCyPDFMetaEdit.com %*"

Once done it is easy to answer the OP Question with the simple Command Line

BeCyPDFMetaEdit "C:\Users\lez\Downloads\Apps\PDF\BeCyPDFMetaEdit\metadata.pdf" -d2 -T "ReLiable Demo" -pl 1r -pl 7D

You can even do that while watching the result, if the viewer is not locking the file.

Partial result, correctly showing that pages 0-5 (PDF page indices are zero-based) are now /r (lowercase roman) and pages 6 onwards are /D (decimal):

1690 0 obj
<<
/Nums [ 0 <<
/P ()
/S /r
>> 6 <<
/P ()
/S /D
>> ]
>>
endobj
1621 0 obj
<<
/Dests 256 0 R
/MarkInfo <<
/Marked true
/Type /MarkInfo
>>
/PageLabels 1690 0 R
/Pages 255 0 R
/StructTreeRoot 257 0 R
/Type /Catalog
K J
  • 1,248
0

The method given by Dane H. does work with Acrobat Reader (or, to be precise, the current version of Adobe Reader). One minor point to note: the field at the top will only accept 8 characters so you can't enter something like 'subject index' into it if such a label has been used. But you can instead use menu item View > Page Navigation > Go to..., or the key equivalent.

Another tip: the PDF specification always assigns page numbers consecutively, so in a document produced by scanning pairs of pages, the printed numbers and the page labels get out of step (unless you laboriously number each page individually). But with little effort you can set up your document so that the convention 'go to page n gets you to pages 2n and 2n+1' applies.
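One way to read that convention, sketched in Python: each scanned sheet shows printed pages 2n (left) and 2n+1 (right) and carries the label n. The function names are made up for illustration, and the /Nums entry it prints uses the /S /D /St syntax from the other answers in this thread.

```python
def spread_label(printed_page: int) -> int:
    """Label of the scanned sheet that shows `printed_page`.

    Assumes each sheet shows printed pages 2n and 2n+1 and is labelled n.
    """
    return printed_page // 2


def nums_entry(sheet_index: int, left_page: int) -> str:
    """/Nums entry for the range that starts at 0-based `sheet_index`,
    whose first sheet shows the even printed page `left_page` on the left."""
    if left_page < 2 or left_page % 2:
        raise ValueError("left_page must be an even number >= 2")
    return f"{sheet_index} << /S /D /St {left_page // 2} >>"


print(nums_entry(4, 2))    # 4 << /S /D /St 1 >>
print(spread_label(37))    # 18
```

So to reach printed page 37 you would go to page 18, which shows pages 36 and 37.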

-1

Dane's answer is the best. The format has changed a little since then, so this might be helpful:

%PDF-1.6

29241 0 obj

<</Metadata 1685 0 R/Outlines 29461 0 R/PageLabels<</Nums[0<</S/D>>3<</S/D/St 6>>4<</S/D/St 10>>5<</S/D/St 12>>15<</S/D/St 70>>16<</S/D/St 72>>17<</S/D/St 80>>18<</S/D/St 82>>19<</S/D/St 90>>23<</S/D/St 96>>25<</S/D/St 99>>29<</S/D/St 110>>31<</S/D/St 130>>32<</S/D/St 133>>35<</S/D/St 137>>36<</S/D/St 140>>37<</S/D/St 145>>39<</S/D/St 150>>40<</S/D/St 152>>42<</S/D/St 155>>43<</S/D/St 160>>46<</S/D/St 165>>47<</S/D/St 167>>48<</S/D/St 170>>49<</S/D/St 180>>50<</S/D/St 190>>52<</S/D/St 300>>53<</S/D/St 305>>54<</S/D/St 319>>56<</S/D/St 380>>57<</S/D/St 390>>58<</S/D/St 500>>67<</S/D/St 515>>68<</S/D/St 525>>70<</S/D/St 550>>71<</S/D/St 553>>72<</S/D/St 560>>73<</S/D/St 600>>76<</S/D/St 620>>78<</S/D/St 650>>82<</S/D/St 670>>85<</S/D/St 700>>95<</S/D/St 714>>117<</S/D/St 900>>162<</S/D/St 1000>>178<</S/D/St 1200>>209<</S/D/St 1500>>263<</S/D/St 1555>>270<</S/D/St 1563>>389<</S/D/St 1681>>522<</S/D/St 1813>>]>> /PageMode/UseOutlines/Pages 29177 0 R/Type/Catalog>>

endobj
daniel
  • 819