About

A short tutorial on how to insert a table of contents and Roman-numeral page labels into a PDF document using PyMuPDF (fitz).

The following function takes the toc_list and page_list for a PDF file and writes a new PDF with them applied.

import fitz 

def convert_pdf(input_pdf, output_pdf, toc_list, page_list):
    # Open the existing PDF
    doc = fitz.open(input_pdf)

    # Build the new TOC from toc_list. Each entry is (level, title, page),
    # where level starts at 1 and page is the 1-based page number.
    adjusted_toc = [[level, title, page] for level, title, page in toc_list]

    # Replace the document's TOC and its page label definitions
    doc.set_toc(adjusted_toc)
    doc.set_page_labels(page_list)
    # Save the modified PDF
    doc.save(output_pdf)
    doc.close()

Most of the work lies in preparing toc_list and page_list.

For example, for a document whose first 12 pages are numbered in Roman numerals and whose remaining pages continue in Arabic numerals from 13, the page list (startpage is 0-based) is:

page_labels = [{'startpage': 0, 'prefix': '', 'style': 'R', 'firstpagenum': 1},
               {'startpage': 12, 'prefix': '', 'style': 'D', 'firstpagenum': 13}]
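
To check a set of label rules before writing them into a PDF, it can help to compute the label each page would get. The following is a minimal sketch in plain Python, not part of PyMuPDF; the helper names to_roman and label_for_page are made up for illustration, and only the 'D', 'R' and 'r' styles are handled.

```python
def to_roman(number):
    """Convert a positive integer to an uppercase Roman numeral."""
    symbols = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
               (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
               (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    result = ""
    for value, symbol in symbols:
        while number >= value:
            result += symbol
            number -= value
    return result


def label_for_page(page_index, rules):
    """Return the label for a 0-based page index (hypothetical helper).

    Assumes each rule has the keys startpage, prefix, style, firstpagenum.
    """
    # The applicable rule is the last one starting at or before this page.
    rule = None
    for candidate in sorted(rules, key=lambda item: item["startpage"]):
        if candidate["startpage"] <= page_index:
            rule = candidate
    if rule is None:
        # No rule covers this page: fall back to plain 1-based numbering.
        return str(page_index + 1)
    number = rule["firstpagenum"] + page_index - rule["startpage"]
    if rule["style"] == "R":
        text = to_roman(number)
    elif rule["style"] == "r":
        text = to_roman(number).lower()
    else:
        text = str(number)
    return rule["prefix"] + text


page_labels = [{'startpage': 0, 'prefix': '', 'style': 'R', 'firstpagenum': 1},
               {'startpage': 12, 'prefix': '', 'style': 'D', 'firstpagenum': 13}]

# Label of the last Roman page and of the first Arabic page
print(label_for_page(11, page_labels), label_for_page(12, page_labels))
```

With the rules above, the twelfth page (index 11) is still in the Roman range and the thirteenth page (index 12) is the first Arabic one.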

Similarly, for a document with multiple headings, the TOC list holds one (level, title, page) tuple per heading:

outline_entries = [(1, 'National Standard for Organic and Bio-Dynamic Produce', 1),
                   (1, 'Introduction', 3),
                   (1, 'Important information', 4),
                   (1, 'Scope of this standard', 8),
                   (1, 'Definitions', 9),
                   (1, 'Production requirements', 13),
                   (2, 'Farm', 13),
                   (2, 'Conversion of land', 15),
                   (2, 'Genetic modification', 16),
                   (2, 'Landscape management and biodiversity', 17)]
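
PyMuPDF's set_toc expects the first entry to have level 1 and each later entry to go at most one level deeper than the one before it. A small pre-check can catch mistakes in a hand-written list before the PDF is touched. This is only a sketch: validate_toc is a hypothetical helper, and the page-range check (1 to page_count) is an assumption that is deliberately stricter than the library may require.

```python
def validate_toc(entries, page_count):
    """Return a list of problems found in (level, title, page) entries."""
    problems = []
    previous_level = 0
    for index, (level, title, page) in enumerate(entries):
        if index == 0 and level != 1:
            problems.append(f"entry {index}: first entry must have level 1")
        elif level < 1 or level > previous_level + 1:
            problems.append(
                f"entry {index}: level {level} is invalid after level {previous_level}")
        # Assumption: every heading should point at a real, 1-based page.
        if not 1 <= page <= page_count:
            problems.append(f"entry {index}: page {page} is out of range")
        previous_level = level
    return problems


sample = [(1, 'Introduction', 3),
          (1, 'Production requirements', 13),
          (2, 'Farm', 13),
          (2, 'Conversion of land', 15)]

# An empty list means no problems were found.
print(validate_toc(sample, page_count=20))
```

Returning a list of messages instead of raising on the first error makes it easier to fix a long outline in one pass.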

A standalone command-line script with flags doesn't make sense for this. A per-file control file, read by the main script, might work better.
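
One possible shape for such a control file is a JSON document per PDF. The sketch below is only an illustration of the idea: the file layout, the key names (input, output, toc, page_labels) and the function load_control_file are all assumptions, not an existing format.

```python
import json
from pathlib import Path


def load_control_file(path):
    """Read a per-PDF control file (hypothetical JSON layout).

    Returns (input_pdf, output_pdf, toc_list, page_list).
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # JSON has no tuples, so TOC entries arrive as lists; convert them back.
    toc_list = [tuple(entry) for entry in data.get("toc", [])]
    page_list = data.get("page_labels", [])
    return data["input"], data["output"], toc_list, page_list


# Example of what a control file could contain
example = {
    "input": "standard.pdf",
    "output": "standard_with_toc.pdf",
    "toc": [[1, "Introduction", 3], [1, "Definitions", 9]],
    "page_labels": [
        {"startpage": 0, "prefix": "", "style": "R", "firstpagenum": 1},
        {"startpage": 12, "prefix": "", "style": "D", "firstpagenum": 13},
    ],
}
print(json.dumps(example, indent=2))
```

The four returned values are in the same order as the parameters of convert_pdf, so the main script could call convert_pdf(*load_control_file(path)) for each control file it is given.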