How to change internal page numbers in the meta data of a PDF?

Question

I have a pdf document I created through non-Acrobat means (printing to pdf, then merging a bunch of pdfs), but I'd like to manually change the page numbers (i.e. the first several pages are simply title pages, the page that is labeled "page 1" is really the 7th sheet of the pdf). What's the simplest (and ideally, free) way to do this?

To be clear, I am not trying to change the numbers on the pages themselves, but the page numbers in the "metadata" that the pdf stores (the pages themselves are already numbered correctly; I just want "go to page 1" to go to the page labeled 1, which could be sheet 7).

For what it's worth, I'm on Windows, though I have access to Macs as well.

score 60 · Answer 1 · edited Sep 15 '23 at 22:51

What you want is indeed called page labels and can easily be added directly in the PDF's source code. Rename the file extension from pdf to txt and open the file in a text editor (this can be slow for large files, so be patient). The information about page labels is stored in a node called the document catalog, which looks something like this:

3 0 obj
<< /Type /Catalog
   /Pages 1 0 R
>>
endobj

It may contain more confusing stuff, but this is the basic structure. There is only one catalog, so in a large file you can search for the node that contains /Catalog. Now you can make your desired changes by inserting the /PageLabels entry:

3 0 obj
<< /Type /Catalog
   /Pages 1 0 R
   /PageLabels << /Nums [ 0 << /P (cover) >>
                          % labels 1st page with the string "cover"
                          1 << /S /r >>
                          % numbers pages 2-6 in small roman numerals
                          6 << /S /D >>
                          % numbers pages 7-x in decimal arabic numerals
                        ]
               >>
>>
endobj

There are 3 lines starting with numbers, called page indices. Page 1 has the index 0, page 2 the index 1 and so forth. They always describe ranges, so the line with 1 <<...>> applies to all pages from index 1 to 5 and the line with 6 <<...>> applies to all pages from 6 up to the last page. A label for 0 <<...>> must always be defined.
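As a rough illustration of how those ranges resolve, here is a minimal Python sketch. The function name `find_range` is made up for this example; it only shows the lookup logic (which range a zero-based page index falls into), not anything a PDF reader exposes.

```python
from bisect import bisect_right

def find_range(starts, page_index):
    """Return (range_start, offset_in_range) for a zero-based page index.

    `starts` is the sorted list of page indices used in /Nums, e.g. [0, 1, 6].
    """
    if not starts or starts[0] != 0:
        raise ValueError("the first range must start at index 0")
    if page_index < 0:
        raise ValueError("page_index must be zero or greater")
    # The applicable range is the last one whose start is <= page_index.
    pos = bisect_right(starts, page_index) - 1
    start = starts[pos]
    return start, page_index - start

starts = [0, 1, 6]               # the three indices from the example above
print(find_range(starts, 0))     # cover page
print(find_range(starts, 5))     # last roman-numbered page
print(find_range(starts, 6))     # first decimal-numbered page
```

The offset within the range is what the numbering style is applied to, so offset 0 in the range starting at index 6 becomes page "1".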

You can find more information about page labels and PDF source code in section 12.4.2 of the PDF 1.7 standard.

hackerb9 · Answer 2 · 2019-05-15T01:30:25.640

NOTE 1: The accepted answer is still mostly correct, but has some gaps: many PDF files are not directly editable as text, and even when they are, such editing can sometimes damage the PDF, making it unreadable. One solution that works on both Unix and Microsoft Windows is qpdf, which can translate PDF files into "QDF", a text-editable form that is still a valid PDF file. The qpdf package comes with fix-qdf, which recalculates offsets after a QDF file has been edited, correcting any damage.

NOTE 2: Uncomfortable with text editors? Try using a GUI editor such as jpdftweak first. Sometimes the GUI pdf editors work, in which case, yay, you're done. However, when they fail, as has often been the case for me, you can try this more robust alternative. Either way, please do not down vote my answer for being less than elegant.

HOW TO Edit PDF Page Numbers Using Qpdf

Summary:

qpdf --qdf foo.pdf foo.qdf

edit foo.qdf

 0 << >>           % No label on first pages
 6 << /S /D >>     % Start numbering from 7th page.

fix-qdf foo.qdf >bar.qdf
test bar.qdf
qpdf bar.qdf bar.pdf

Detailed steps

Step 1.

Convert the document to the easily editable QDF format. Run qpdf from the command line like so:

qpdf --qdf foo.pdf foo.qdf

Note: If you do not have qpdf installed already, Microsoft Windows executables can be downloaded from https://github.com/qpdf/qpdf/releases. On Unix systems such as Ubuntu and Debian GNU/Linux, you can install it by typing apt install qpdf.

Step 2.

Edit the QDF document using a text editor such as notepad++, emacs, or gedit. Search for the word /Catalog and note the <<angle brackets>> it is inside. Nearby, you'll find the current /PageLabels (if any).

We'll be adding each section that should be differently numbered to the /PageLabels. The format is start-page << style >>. Note that white-space does not matter and that the first page of the document is 0. Unless otherwise specified, a new section always starts out numbering pages from 1.

Examples

Here is a full example of what PageLabels may look like, with comments added:

/Type /Catalog
/PageLabels <<
  /Nums [
    0           % From the first page of the document,
      <<
        /S /r   % ...use the lowercase roman numeral style.
      >>
    6           % From seventh page onward,
      <<
        /S /D   % ...use ordinary digits (arabic numerals)
      >>
  ]
>>

If the file has no PageLabels, add them after /Type /Catalog. For example, one might change,

1 0 obj
<<
  …
  /Type /Catalog
>>
endobj

into,

1 0 obj
<<
  … 
  /Type /Catalog
  /PageLabels
      << /Nums [
    0 << >>                 % No label for cover
    1 << /S /r >>           % i, ii for index
    3 << /S /D /St 15 >>    % 15, 16, 17, ... for article
    31 << /S /D /P (A-) >>  % A-1, A-2, A-3... for appendix
       ]
  >>
>>
endobj
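If you have many sections, typing the entry by hand gets error-prone. The following is a small sketch that renders a /PageLabels entry as text for pasting into the QDF file. The helper names (`pdf_literal`, `build_page_labels`) and the tuple layout are my own invention for illustration, and it assumes plain ASCII prefixes.

```python
VALID_STYLES = {"D", "R", "r", "A", "a"}

def pdf_literal(text):
    """Wrap ASCII text as a PDF literal string, escaping backslashes and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"

def build_page_labels(ranges):
    """Render a /PageLabels entry as text.

    ranges: iterable of (start_index, style, first_number, prefix) tuples,
    where start_index is zero-based, style is one of VALID_STYLES or None,
    and prefix is a str or None.
    """
    ranges = sorted(ranges, key=lambda r: r[0])
    if not ranges or ranges[0][0] != 0:
        raise ValueError("a range starting at index 0 is required")
    lines = ["/PageLabels", "<< /Nums ["]
    for start, style, first, prefix in ranges:
        parts = []
        if style is not None:
            if style not in VALID_STYLES:
                raise ValueError(f"unknown style: {style!r}")
            parts.append(f"/S /{style}")
        if first != 1:                      # /St defaults to 1, so omit it then
            parts.append(f"/St {first}")
        if prefix:
            parts.append(f"/P {pdf_literal(prefix)}")
        inner = " ".join(parts)
        lines.append(f"    {start} << {inner} >>" if inner else f"    {start} << >>")
    lines.append("] >>")
    return "\n".join(lines)

# The same four sections as the example above.
print(build_page_labels([
    (0, None, 1, None),     # no label for cover
    (1, "r", 1, None),      # i, ii for index
    (3, "D", 15, None),     # 15, 16, 17, ... for article
    (31, "D", 1, "A-"),     # A-1, A-2, A-3 ... for appendix
]))
```

The output still has to be pasted inside the catalog's angle brackets by hand, and fix-qdf still has to be run afterwards.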

OPTIONAL: STARTING FROM A DIFFERENT NUMBER WITH /St

Each section restarts numbering at 1 unless you tell it otherwise using /St. Notice how in the above example, the fourth page starts at 15.

OPTIONAL: USING A DIFFERENT STYLE WITH /S

The /S key takes a value that lets you pick the numbering style:

/D digits (1, 2, 3...)
/R uppercase Roman (I, II, III...)
/r lowercase Roman (i, ii, iii...)
/A uppercase alphabetical (A, B, C, ..., X, Y, Z, AA, BB, CC, ...)
/a lowercase alphabetical (a, b, c, ..., x, y, z, aa, bb, cc, ...)

If one omits the /S key, then that section of pages will have no numbering. For example:

0 << >>         % No label for cover

OPTIONAL: ADDING A PREFIX TO EACH PAGE WITH /P

You can show any string of text before the page number by specifying a word in parentheses after /P:

  31
  <<
    /S /D
    /P (A-)     % label appendix pages A-1, A-2, A-3
  >>

Specifying a prefix without a style (/S) will give you pages that have only the word without any number. This can be useful, for example, if you'd like a cover page to simply have the label "Cover".

     0 << /P (Cover) >>        % No number, just "Cover"
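To see how /S, /St and /P combine, here is a sketch of how a viewer might compute the label for one page within a section. This is my own illustration, not code from qpdf or any viewer. The letter styles follow the PDF specification's description (A to Z for the first 26 pages, then AA to ZZ for the next 26, and so on).

```python
def to_roman(n):
    """Convert a positive integer to an uppercase Roman numeral."""
    pairs = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
             (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
             (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in pairs:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)

def to_letters(n):
    """1 -> A, 26 -> Z, 27 -> AA, 28 -> BB (the letter is repeated)."""
    repeat, letter = divmod(n - 1, 26)
    return chr(ord("A") + letter) * (repeat + 1)

def page_label(offset, style=None, start=1, prefix=""):
    """Label for the page `offset` pages into a section (offset 0 = first page).

    style mirrors /S, start mirrors /St, prefix mirrors /P.
    """
    number = start + offset
    if style is None:
        numeral = ""                    # no /S: only the prefix is shown
    elif style == "D":
        numeral = str(number)
    elif style == "R":
        numeral = to_roman(number)
    elif style == "r":
        numeral = to_roman(number).lower()
    elif style == "A":
        numeral = to_letters(number)
    elif style == "a":
        numeral = to_letters(number).lower()
    else:
        raise ValueError(f"unknown style: {style!r}")
    return prefix + numeral

print(page_label(1, "r"))                 # second page of a roman section
print(page_label(2, "D", start=15))       # third page of a section with /St 15
print(page_label(0, "D", prefix="A-"))    # first appendix page
print(page_label(0, prefix="Cover"))      # prefix only
```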

Step 3.

Run fix-qdf to make your edits valid PDF and put the output in bar.qdf.

fix-qdf foo.qdf > bar.qdf

Step 4.

Open bar.qdf in your PDF viewing program and check that it is numbered correctly.

Step 5.

Convert the QDF file back into a normal PDF, like so:

qpdf bar.qdf bar.pdf

Ta da. You're done. You now have a document with correctly labeled page numbers in bar.pdf.
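If you do this often, steps 1, 3 and 5 can be wrapped in a small Python script. This is only a sketch: it assumes qpdf and fix-qdf are on your PATH, the function names are made up, and the `run` parameter is a hypothetical hook I added so the commands can be checked without actually running them. Step 2 (the edit) and step 4 (checking in a viewer) still happen by hand in between.

```python
import subprocess

def to_qdf(pdf_path, qdf_path, run=subprocess.run):
    """Step 1: convert a PDF to the text-editable QDF form."""
    run(["qpdf", "--qdf", str(pdf_path), str(qdf_path)], check=True)

def from_qdf(edited_qdf, fixed_qdf, out_pdf, run=subprocess.run):
    """Steps 3 and 5: repair offsets with fix-qdf, then write a normal PDF."""
    # fix-qdf writes the repaired file to standard output.
    with open(fixed_qdf, "wb") as fixed:
        run(["fix-qdf", str(edited_qdf)], stdout=fixed, check=True)
    # Note: check=True treats any non-zero exit status as an error,
    # which may include runs where qpdf only reported warnings.
    run(["qpdf", str(fixed_qdf), str(out_pdf)], check=True)

# Typical use, with the file names from the steps above:
#   to_qdf("foo.pdf", "foo.qdf")
#   ... edit foo.qdf in a text editor ...
#   from_qdf("foo.qdf", "bar.qdf", "bar.pdf")
```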

score 8 · Answer 3 · answered Jan 13 '19 at 21:17

There is a little Python script that can do the job: https://github.com/lovasoa/pagelabels-py

In your case call something like:

./addpagelabels.py --delete file.pdf
./addpagelabels.py --startpage 1 --type 'roman lowercase' file.pdf
./addpagelabels.py --startpage 7 --type arabic file.pdf

Pkkm · Answer 4 · 2021-05-28T11:43:38.977

The Java variant of pdftk has support for editing page labels starting from version 3.1.0.

To use it, first create a file with the labels, let's say it's called metadata.txt:

PageLabelBegin
PageLabelNewIndex: 1
PageLabelStart: 1
PageLabelPrefix: Cover
PageLabelNumStyle: NoNumber
PageLabelBegin
PageLabelNewIndex: 2
PageLabelStart: 1
PageLabelPrefix: Back Cover
PageLabelNumStyle: NoNumber
PageLabelBegin
PageLabelNewIndex: 3
PageLabelStart: 1
PageLabelNumStyle: LowercaseRomanNumerals
PageLabelBegin
PageLabelNewIndex: 27
PageLabelStart: 1
PageLabelNumStyle: DecimalArabicNumerals

PageLabelNewIndex is the page from which the numbering style applies, counting from one.
PageLabelStart is the starting number. For example, if you specify 5 here, the pages will be numbered 5, 6, 7, ...
PageLabelNumStyle can be DecimalArabicNumerals, UppercaseRomanNumerals, LowercaseRomanNumerals, UppercaseLetters, LowercaseLetters or NoNumber.

After you've finished editing, apply the metadata to your PDF file:

pdftk book.pdf update_info metadata.txt output book-with-metadata.pdf
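For the situation in the question (six front-matter pages, then "page 1" on the 7th sheet), the metadata file is short enough to write by hand, but here is a sketch of generating it in Python. The function name and tuple layout are hypothetical. The field names and style values are the ones listed above.

```python
PDFTK_STYLES = {
    "DecimalArabicNumerals", "UppercaseRomanNumerals", "LowercaseRomanNumerals",
    "UppercaseLetters", "LowercaseLetters", "NoNumber",
}

def pdftk_page_labels(ranges):
    """Build the text of a pdftk metadata file.

    ranges: list of (new_index, num_style, start, prefix) tuples, where
    new_index counts pages from 1 and prefix is a str or None.
    """
    lines = []
    for new_index, num_style, start, prefix in ranges:
        if num_style not in PDFTK_STYLES:
            raise ValueError(f"unknown style: {num_style!r}")
        lines.append("PageLabelBegin")
        lines.append(f"PageLabelNewIndex: {new_index}")
        lines.append(f"PageLabelStart: {start}")
        if prefix:
            lines.append(f"PageLabelPrefix: {prefix}")
        lines.append(f"PageLabelNumStyle: {num_style}")
    return "\n".join(lines) + "\n"

# Sheets 1-6 get i, ii, ...; sheet 7 onwards gets 1, 2, 3, ...
text = pdftk_page_labels([
    (1, "LowercaseRomanNumerals", 1, None),
    (7, "DecimalArabicNumerals", 1, None),
])
print(text)
```

Save the printed text as metadata.txt and apply it with the update_info command shown above.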

Kurt Pfeifle · Answer 5 · 2011-01-15T11:08:22.427

If I understand you correctly, here is how it should work:

gs \
  -o modified-pagelabels-50pages.pdf \
  -sDEVICE=pdfwrite \
  -c "[ /Page 1 /Label (i)     /PAGELABEL pdfmark" \
  -c "[ /Page 2 /Label (ii)    /PAGELABEL pdfmark" \
  -c "[ /Page 3 /Label (III)   /PAGELABEL pdfmark" \
  -c "[ /Page 4 /Label (four)  /PAGELABEL pdfmark" \
  -c "[ /Page 5 /Label (v)     /PAGELABEL pdfmark" \
  -c "[ /Page 6 /Label (|||||) /PAGELABEL pdfmark" \
  -f 50pages.pdf

However, I seem to remember that this didn't work reliably or fully the last time I tried it (about 2 years ago).

UPDATE: My memory wasn't failing me. I now tried again and filed a bug report for Ghostscript (bug 691889) concerning this. Follow the link to the bug report to see the details.

score 6 · Answer 6 · answered Aug 15 '14 at 07:23

jPdf Tweak is an Open Source graphical utility that lets you edit page labels in PDF files. The documentation page provides step-by-step instructions.

CherryBerry

hackerb9 · Answer 7 · 2023-09-20T19:49:16.393

This answer is a corollary to the text editor method I posted previously. It was requested on another forum that I add an example using the Python API to qpdf instead of its command line interface.

PikePDF: qpdf for Python

Changing page numbers programmatically

The following is a working program for a moderately complex document that changes numbering four times.

from pikepdf import open as Pdfopen, Name, Dictionary, NumberTree

pdf = Pdfopen("input.pdf")

# Create an empty /PageLabels number tree if the document has none.
try:
    pdf.Root.PageLabels
except AttributeError:
    nt = NumberTree.new(pdf)
    pdf.Root.PageLabels = nt.obj
pagelabels = NumberTree(pdf.Root.PageLabels)

# First page is a cover, so it should have no page number.
pagelabels[0]  = Dictionary()
# Second and third page use lowercase Roman numerals.
pagelabels[1]  = Dictionary(S=Name.r)           # i, ii for index
# Pages 4 thru 31 use digits starting at "15".
pagelabels[3]  = Dictionary(S=Name.D, St=15)    # 15, 16, 17, ... for article
# Pages 32 onwards have a prefix of "A-" before digit.
pagelabels[31] = Dictionary(S=Name.D, P='A-')   # A-1, A-2, A-3 for appendix

pdf.save('output.pdf')

# Page labels will now be:
# <blank>, i, ii, 15, 16, 17, ..., 41, 42, A-1, A-2, A-3

Usage notes

PageLabels are what most people call "page numbers".
Adding a PageLabel to a page changes the numbering scheme for all following pages until the next PageLabel.
The three Dictionary options for a PageLabel are:
1. S: Numbering Style (pikepdf.Name, defaults to none)
2. St: Starting number (integer, defaults to 1)
3. P: Prefix before each number (string, defaults to none)
The possible values for S (the style) are:
- Name.D: Digits (1, 2, 3, ..)
- Name.R: Uppercase Roman (I, II, III...)
- Name.r: lowercase roman (i, ii, iii...)
- Name.A: Uppercase Alphabetical (A, B, C, ..., X, Y, Z, AA, BB, CC, ...)
- Name.a: lowercase alphabetical (a, b, c, ..., x, y, z, aa, bb, cc, ...)
If S is omitted from the Dictionary(), pages will be labelled with only the prefix (P), if any. If P is also omitted, then PDF readers will show a blank for the "page number".
If for some reason you wanted to have a word as a page number, you can do that by setting a prefix (P) without a numbering style (S).
```
pagelabels[34] = Dictionary(P='Hello')        # "Hello" is the page number
```
If no PageLabels exist, then the first page is numbered "1".
The first page of a PDF is indexed at [0], even though its page label is "1". This trips everyone up.
The official documentation is here:

https://pikepdf.readthedocs.io/en/latest/api/models.html#pikepdf.NumberTree

This method is not for everyone, but I post it in the hope that it will be useful to the "Super Users" who know Python.
I welcome comments on how my answers can be improved.

score 0 · Answer 8 · answered May 27 '18 at 17:48

I found direct editing of the file (as uncompressed by pdftk) not to work if there are already '/titles' set in the '/outlines' region. The direct-editing technique described in a post above is demonstrated on Youtube: https://www.youtube.com/watch?v=zoH1Z_hSpak

But the 'update' feature of pdftk may be more intuitive (and more reliable when '/titles' already exist in the '/outlines' region of the PDF file) via editing the 'doc_data.txt' file used here: https://www.pdflabs.com/blog/export-and-import-pdf-bookmarks/

score 0 · Answer 9 · answered Mar 03 '23 at 13:20

I'm extending the excellent answer from @hackerb9 with an example that will reset the page numbering of the whole pdf file to 1.

This can be useful in case of weird or broken page numbering when combining multiple different pdfs.

To reset page numbering of entire pdf file, starting from 1 with page 1 do the following:

qpdf --qdf foo.pdf foo.qdf

Open foo.qdf with a text editor and replace the catalog object (here the first object) with a version that has no /PageLabels entry, like this:

%% Original object ID: 1 0
1 0 obj
<<
  /Type /Catalog
  /Pages 2 0 R
>>
endobj

fix-qdf foo.qdf >ok.qdf
test ok.qdf
qpdf ok.qdf ok.pdf

K J · Answer 10 · 2023-09-16T20:21:55.160

This is an old question from 2011, and the best answer (valid from 2005 to 2018) has been deleted.

One of the best PDF metadata editors in its day was BeCyPDFMetaEdit, which could alter "Page Labels" via its command-line interface or GUI.

Its unique ability to fix or repair broken PDFs, edit metadata and roll back historic additions set it apart from many simpler applications.

It became abandonware in its later years but still works perfectly well (with limitations), on Windows only.

The web archive is at https://web.archive.org/web/20180929111456/http://www.becyhome.de/becypdfmetaedit/description_eng.htm

Based on a powerful PDF assembler, it was designed for PDFs up to, but not including, the more recent 1.7 variants with XMP tracking data, so it has some ability to remove XMP but cannot fully edit the XML.

To answer the OP's question: adding labels was simple in the GUI, as two line entries, here setting the first 6 pages as I-VI and page 7 onwards as 1, 2, 3 and so on. In "incremental mode" it does not alter the source data, and so does not corrupt PDFs such as newer versions with XMP.

Files with XMP are not the majority, but when they are encountered, that extra modern metadata may need special handling by other means.

1.3 Metadata (XMP)
Since PDF version 1.4, metadata can be stored in a new XML-based format
named XMP ("Extensible Metadata Platform"). With regard to backward
compatibility, newer PDF documents contain the metadata both in XMP and
also in classical form. However, the application is currently not
capable of processing metadata in XMP format. This can lead to the
effect that a PDF viewer shows the original field values after the
metadata has been edited. Even if the PDF viewer shows the new metadata,
the XMP metadata still contains the original field values which could be
extracted using a hex or text editor.
To address these problems, the application allows at least to remove the
XMP metadata. The metadata will then only be stored in the classical
format.
Attention: XMP-based metadata cannot only be specified for the entire
PDF document but also for parts of it. The application only deletes the
document-specific metadata, metadata attached to other document content
stays as is. Therefore, additional tools like a hex editor are required
if all XMP metadata shall be removed.

NOTE

Windows Security treats such a powerful app as undesirable since it can remove PDF protection etc. and thus it needs to be run in administrator mode with other compatibility switches, (Perfectly safe, but "Not for the faint hearted").

Later edit: it actually plays nicely in Windows 10/11 using a wrapper.bat file to bypass UAC, with no need to find a registry key or other workaround. Use something like this:

cmd /min /C "set __COMPAT_LAYER=RUNASINVOKER && start "" %~pd0\BeCyPDFMetaEdit.com %*"

Once done it is easy to answer the OP Question with the simple Command Line

BeCyPDFMetaEdit "C:\Users\lez\Downloads\Apps\PDF\BeCyPDFMetaEdit\metadata.pdf" -d2 -T "ReLiable Demo" -pl 1r -pl 7D

You can even do that while watching the result, if the viewer is not locking the file.

Partial result, correctly showing that pages 0-5 (PDF pages are zero-based) are now /r (Roman) and pages 6 onwards are /D (decimal):

1690 0 obj
<<
/Nums [ 0 <<
/P ()
/S /r
>> 6 <<
/P ()
/S /D
>> ]
>>
endobj
1621 0 obj
<<
/Dests 256 0 R
/MarkInfo <<
/Marked true
/Type /MarkInfo
>>
/PageLabels 1690 0 R
/Pages 255 0 R
/StructTreeRoot 257 0 R
/Type /Catalog

score 0 · Answer 11 · answered Mar 18 '14 at 12:39

The method given by Dane H. does work with Acrobat Reader (or, to be precise, the current version of Adobe Reader). One minor point to note: the field at the top will only accept 8 characters so you can't enter something like 'subject index' into it if such a label has been used. But you can instead use menu item View > Page Navigation > Go to..., or the key equivalent.

Another tip: pdf specification always assigns page numbers consecutively, so in the case of a document produced by scanning pairs of pages the two sets of numbers get out of step (unless you laboriously number each page individually). But you can with little effort set up your document so the convention 'go to page n gets you to pages 2n and 2n+1' applies.

score -1 · Answer 12 · answered Jun 24 '14 at 15:59

Dane's answer is the best. The format has changed a little since then, so this might be helpful:

%PDF-1.6

29241 0 obj

<</Metadata 1685 0 R/Outlines 29461 0 R/PageLabels<</Nums[0<</S/D>>3<</S/D/St 6>>4<</S/D/St 10>>5<</S/D/St 12>>15<</S/D/St 70>>16<</S/D/St 72>>17<</S/D/St 80>>18<</S/D/St 82>>19<</S/D/St 90>>23<</S/D/St 96>>25<</S/D/St 99>>29<</S/D/St 110>>31<</S/D/St 130>>32<</S/D/St 133>>35<</S/D/St 137>>36<</S/D/St 140>>37<</S/D/St 145>>39<</S/D/St 150>>40<</S/D/St 152>>42<</S/D/St 155>>43<</S/D/St 160>>46<</S/D/St 165>>47<</S/D/St 167>>48<</S/D/St 170>>49<</S/D/St 180>>50<</S/D/St 190>>52<</S/D/St 300>>53<</S/D/St 305>>54<</S/D/St 319>>56<</S/D/St 380>>57<</S/D/St 390>>58<</S/D/St 500>>67<</S/D/St 515>>68<</S/D/St 525>>70<</S/D/St 550>>71<</S/D/St 553>>72<</S/D/St 560>>73<</S/D/St 600>>76<</S/D/St 620>>78<</S/D/St 650>>82<</S/D/St 670>>85<</S/D/St 700>>95<</S/D/St 714>>117<</S/D/St 900>>162<</S/D/St 1000>>178<</S/D/St 1200>>209<</S/D/St 1500>>263<</S/D/St 1555>>270<</S/D/St 1563>>389<</S/D/St 1681>>522<</S/D/St 1813>>]>> /PageMode/UseOutlines/Pages 29177 0 R/Type/Catalog>>

endobj
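A compact entry like the one above is hard to read by eye. As a rough aid, here is a Python sketch that pulls the index, style and starting number out of such a string with regular expressions. It is deliberately simplified and is not a real PDF parser: it assumes the /Nums array is written inline, and it would be confused by prefixes containing ">>" or "]" or by indirect references. The function name `parse_nums` is made up.

```python
import re

def parse_nums(catalog_text):
    """Return a list of (page_index, style, first_number) from an inline /Nums array."""
    match = re.search(r"/Nums\s*\[(.*?)\]", catalog_text, re.DOTALL)
    if match is None:
        return []
    ranges = []
    for entry in re.finditer(r"(\d+)\s*<<(.*?)>>", match.group(1), re.DOTALL):
        index = int(entry.group(1))
        body = entry.group(2)
        style = re.search(r"/S\s*/([DRrAa])", body)
        first = re.search(r"/St\s+(\d+)", body)
        ranges.append((
            index,
            style.group(1) if style else None,
            int(first.group(1)) if first else 1,   # /St defaults to 1
        ))
    return ranges

# The first three entries of the catalog above.
sample = "/PageLabels<</Nums[0<</S/D>>3<</S/D/St 6>>4<</S/D/St 10>>]>>"
for index, style, first in parse_nums(sample):
    print(f"sheet {index + 1}: style {style}, numbering starts at {first}")
```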
