A short tutorial on how to insert a table of contents and Roman-numeral page labels into a PDF document using PyMuPDF (fitz).
The following function takes an existing PDF together with a toc_list and a page_list and writes a new PDF with that table of contents and those page labels.
import fitz  # PyMuPDF

def convert_pdf(input_pdf, output_pdf, toc_list, page_list):
    # Open the existing PDF
    doc = fitz.open(input_pdf)

    # Read the existing outline, if any (not used below: set_toc() replaces it)
    toc = doc.get_toc(simple=False)

    # Build the new TOC entries as [level, title, page, 0]; a number in the
    # optional 4th item is taken as the link target's vertical position
    adjusted_toc = []
    for level, title, page in toc_list:
        adjusted_toc.append([level, title, page, 0])

    # Replace the document's TOC and set its page labels
    doc.set_toc(adjusted_toc)
    doc.set_page_labels(page_list)

    # Save the modified PDF
    doc.save(output_pdf)
    doc.close()
The function itself is short; most of the work lies in preparing toc_list and page_list.
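For example, for a document whose first 12 pages are numbered with Roman numerals and whose remaining pages use Arabic numerals, the page labels are as follows ('startpage' is a 0-based page index; style 'R' means uppercase Roman numerals and 'D' decimal numbers):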
page_labels = [{'startpage': 0, 'prefix': '', 'style': 'R', 'firstpagenum': 1},
               {'startpage': 12, 'prefix': '', 'style': 'D', 'firstpagenum': 13}]
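To check what a viewer will display, the arithmetic behind these dictionaries can be reproduced in plain Python. The sketch below is hypothetical: `to_roman` and `page_label` are not PyMuPDF functions (once the labels are written, PyMuPDF's own `Page.get_label()` reports them), and only the 'D', 'R' and 'r' styles are handled. It also makes visible that with 'firstpagenum': 13 the Arabic numbering carries on from 13; a value of 1 would restart it at 1.

```python
def to_roman(n):
    """Convert a positive integer to an uppercase Roman numeral."""
    numerals = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
                (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
                (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in numerals:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def page_label(index, labels):
    """Hypothetical helper: label shown for the 0-based page `index`."""
    # The rule that applies is the last one whose startpage <= index
    rule = None
    for candidate in sorted(labels, key=lambda r: r["startpage"]):
        if candidate["startpage"] <= index:
            rule = candidate
    if rule is None:
        return ""  # page not covered by any rule
    number = rule.get("firstpagenum", 1) + index - rule["startpage"]
    style = rule.get("style", "")
    if style == "D":
        text = str(number)
    elif style == "R":
        text = to_roman(number)
    elif style == "r":
        text = to_roman(number).lower()
    elif style == "":
        text = ""  # no numbering style: the label is just the prefix
    else:
        raise ValueError(f"style {style!r} not handled in this sketch")
    return rule.get("prefix", "") + text


page_labels = [{'startpage': 0, 'prefix': '', 'style': 'R', 'firstpagenum': 1},
               {'startpage': 12, 'prefix': '', 'style': 'D', 'firstpagenum': 13}]

print([page_label(i, page_labels) for i in (0, 3, 11, 12, 13)])
# -> ['I', 'IV', 'XII', '13', '14']
```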
Similarly, the TOC entries for a document with multiple headings, each given as (level, title, page number) with 1-based page numbers:
outline_entries = [(1, 'National Standard for Organic and Bio-Dynamic Produce', 1),
                   (1, 'Introduction', 3),
                   (1, 'Important information', 4),
                   (1, 'Scope of this standard', 8),
                   (1, 'Definitions', 9),
                   (1, 'Production requirements', 13),
                   (2, 'Farm', 13),
                   (2, 'Conversion of land', 15),
                   (2, 'Genetic modification', 16),
                   (2, 'Landscape management and biodiversity', 17)]
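Because set_toc() rejects an outline whose levels are inconsistent, it can help to check a hand-written list before calling convert_pdf(). The hypothetical `check_toc` below is a sketch of the hierarchy rules PyMuPDF's `Document.set_toc()` applies (the first entry has level 1, and a level may rise by at most 1 from one entry to the next); it additionally requires pages within 1..page_count, and the page count of 20 in the example is an assumption.

```python
def check_toc(entries, page_count):
    """Hypothetical helper: list the problems set_toc() would likely reject.

    Levels must be integers >= 1, start at 1 and rise by at most 1 per
    entry; this sketch also requires 1-based pages within 1..page_count.
    """
    problems = []
    previous_level = 0  # so that a first entry above level 1 is reported
    for row, entry in enumerate(entries):
        level, page = entry[0], entry[2]
        if not isinstance(level, int) or level < 1:
            problems.append(f"row {row}: bad level {level!r}")
        elif level > previous_level + 1:
            problems.append(f"row {row}: level jumps from {previous_level} to {level}")
        if not 1 <= page <= page_count:
            problems.append(f"row {row}: page {page} outside 1..{page_count}")
        if isinstance(level, int):
            previous_level = level
    return problems


# A subset of the outline above; page_count=20 is an assumed value
example = [(1, 'Definitions', 9), (1, 'Production requirements', 13),
           (2, 'Farm', 13), (2, 'Conversion of land', 15)]
print(check_toc(example, page_count=20))  # -> []

broken = [(1, 'Definitions', 9), (3, 'Farm', 13)]
print(check_toc(broken, page_count=20))  # -> ['row 1: level jumps from 1 to 3']
```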
A standalone command-line script with flags doesn't make sense for this, since each PDF needs its own TOC and page labels. A better option may be a per-file control file that the main script reads.
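One way to realise the per-file control file idea is to keep each document's settings in a small JSON file that the main script loads. The format and names below (`parse_control`, `load_control_file`, and the "input", "output", "toc" and "page_labels" keys) are hypothetical, a sketch rather than a settled design.

```python
import json
from pathlib import Path

# Hypothetical control file format (JSON), one file per PDF:
# {
#   "input": "standard.pdf",
#   "output": "standard_labelled.pdf",
#   "page_labels": [{"startpage": 0, "prefix": "", "style": "R", "firstpagenum": 1}],
#   "toc": [[1, "Introduction", 3], [2, "Farm", 13]]
# }


def parse_control(text):
    """Parse control-file JSON into (input_pdf, output_pdf, toc_list, page_list)."""
    data = json.loads(text)
    # JSON arrays become lists; convert to tuples like outline_entries
    toc_list = [tuple(entry) for entry in data["toc"]]
    return data["input"], data["output"], toc_list, data["page_labels"]


def load_control_file(path):
    """Read and parse one control file from disk."""
    return parse_control(Path(path).read_text(encoding="utf-8"))
```

The main script would then call `convert_pdf(*load_control_file(path))` once for each control file, so the per-document data stays out of the code.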