The PdfWriter Class

class pypdf.PdfWriter(fileobj: Union[str, IO[Any]] = '', clone_from: Union[None, PdfReader, str, IO[Any], Path] = None)[source]

Bases: PdfDocCommon

Write a PDF file out, given pages produced by another class or through cloning a PDF file during initialization.

Typically data is added from a PdfReader.

property is_encrypted: bool

Read-only boolean property showing whether this PDF file is encrypted.

Note that this property, if true, will remain true even after the decrypt() method is called.

flattened_pages: Optional[List[PageObject]] = None
property root_object: DictionaryObject

Provide direct access to Pdf Structure.

Note

Recommended to be used only for read access.

property xmp_metadata: Optional[XmpInformation]

XMP (Extensible Metadata Platform) data.

property pdf_header: str

Read/write property: the header of the PDF document that is written.

This should be something like '%PDF-1.5'. It is recommended to set the lowest version that supports all features which are used within the PDF file.

Note: pdf_header returns a string but accepts bytes or str for writing

get_object(indirect_reference: Union[int, IndirectObject]) PdfObject[source]
set_need_appearances_writer(state: bool = True) None[source]

Sets the “NeedAppearances” flag in the PDF writer.

The “NeedAppearances” flag indicates whether the appearance dictionary for form fields should be automatically generated by the PDF viewer or if the embedded appearance should be used.

Parameters

state – The actual value of the NeedAppearances flag.

Returns

None

create_viewer_preferences() ViewerPreferences[source]
add_page(page: PageObject, excluded_keys: Iterable[str] = ()) PageObject[source]

Add a page to this PDF file.

Recommended for advanced usage including the adequate excluded_keys.

The page is usually acquired from a PdfReader instance.

Parameters
  • page – The page to add to the document. Should be an instance of PageObject

  • excluded_keys

Returns

The added PageObject.

insert_page(page: PageObject, index: int = 0, excluded_keys: Iterable[str] = ()) PageObject[source]

Insert a page in this PDF file. The page is usually acquired from a PdfReader instance.

Parameters
  • page – The page to add to the document.

  • index – Position at which the page will be inserted.

  • excluded_keys

Returns

The added PageObject.

add_blank_page(width: Optional[float] = None, height: Optional[float] = None) PageObject[source]

Append a blank page to this PDF file and return it.

If no page size is specified, use the size of the last page.

Parameters
  • width – The width of the new page expressed in default user space units.

  • height – The height of the new page expressed in default user space units.

Returns

The newly appended page

Raises

PageSizeNotDefinedError – if width and height are not defined and previous page does not exist.

insert_blank_page(width: Optional[Union[float, Decimal]] = None, height: Optional[Union[float, Decimal]] = None, index: int = 0) PageObject[source]

Insert a blank page into this PDF file and return it.

If no page size is specified, use the size of the last page.

Parameters
  • width – The width of the new page expressed in default user space units.

  • height – The height of the new page expressed in default user space units.

  • index – Position to add the page.

Returns

The newly inserted page.

Raises

PageSizeNotDefinedError – if width and height are not defined and previous page does not exist.

property open_destination: Union[None, Destination, TextStringObject, ByteStringObject]
add_js(javascript: str) None[source]

Add JavaScript which will launch upon opening this PDF.

Parameters

javascript – Your Javascript.

>>> output.add_js("this.print({bUI:true,bSilent:false,bShrinkToFit:true});")
# Example: This will launch the print window when the PDF is opened.
add_attachment(filename: str, data: Union[str, bytes]) None[source]

Embed a file inside the PDF.

Reference: https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf Section 7.11.3

Parameters
  • filename – The filename to display.

  • data – The data in the file.

append_pages_from_reader(reader: PdfReader, after_page_append: Optional[Callable[[PageObject], None]] = None) None[source]

Copy pages from reader to writer. Includes an optional callback parameter which is invoked after pages are appended to the writer.

append should be preferred.

Parameters
  • reader – a PdfReader object from which to copy page annotations to this writer object. The writer’s annots will then be updated.

  • after_page_append – Callback function that is invoked after each page is appended to the writer. Its single parameter is a reference to the page just appended to the document.

update_page_form_field_values(page: ~typing.Optional[~typing.Union[~pypdf._page.PageObject, ~typing.List[~pypdf._page.PageObject]]], fields: ~typing.Dict[str, ~typing.Any], flags: ~pypdf.constants.FieldFlag = FieldFlag.None, auto_regenerate: ~typing.Optional[bool] = True) None[source]

Update the form field values for a given page from a fields dictionary.

Copy field texts and values from fields to page. If the field links to a parent object, add the information to the parent.

Parameters
  • page – PageObject: the PDF writer’s page where the annotations and field data will be updated. List[PageObject]: a list of pages to be processed. None: all pages.

  • fields – a Python dictionary of field names (/T) and text values (/V).

  • flags – An integer (0 to 7). The first bit sets ReadOnly, the second bit sets Required, the third bit sets NoExport. See PDF Reference Table 8.70 for details.

  • auto_regenerate – Set/unset the need_appearances flag; the flag is unchanged if auto_regenerate is None.

reattach_fields(page: Optional[PageObject] = None) List[DictionaryObject][source]

Parse annotations within the page looking for orphan fields and reattach them into the Fields Structure.

Parameters

page – page to analyze. If none is provided, all pages will be analyzed.

Returns

list of reattached fields.

clone_reader_document_root(reader: PdfReader) None[source]

Copy the reader document root to the writer and all sub-elements, including pages, threads, outlines, etc. For partial insertion, append should be considered.

Parameters

reader – PdfReader from which the document root should be copied.

clone_document_from_reader(reader: PdfReader, after_page_append: Optional[Callable[[PageObject], None]] = None) None[source]

Create a copy (clone) of a document from a PDF file reader cloning section ‘/Root’ and ‘/Info’ and ‘/ID’ of the pdf.

Parameters
  • reader – PDF file reader instance from which the clone should be created.

  • after_page_append – Callback function that is invoked after each page is appended to the writer (delegates to append_pages_from_reader). Its single parameter is a reference to the page just appended to the document.

generate_file_identifiers() None[source]

Generate an identifier for the PDF that will be written.

The only point of this is ensuring uniqueness. Reproducibility is not required. When a file is first written, both identifiers shall be set to the same value. If both identifiers match when a file reference is resolved, it is very likely that the correct and unchanged file has been found. If only the first identifier matches, a different version of the correct file has been found. See 14.4 “File Identifiers”.

encrypt(user_password: str, owner_password: Optional[str] = None, use_128bit: bool = True, permissions_flag: UserAccessPermissions = UserAccessPermissions.PRINT | MODIFY | EXTRACT | ADD_OR_MODIFY | R7 | R8 | FILL_FORM_FIELDS | EXTRACT_TEXT_AND_GRAPHICS | ASSEMBLE_DOC | PRINT_TO_REPRESENTATION | R13 | R14 | R15 | R16 | R17 | R18 | R19 | R20 | R21 | R22 | R23 | R24 | R25 | R26 | R27 | R28 | R29 | R30 | R31 | R32, *, algorithm: Optional[str] = None) None[source]

Encrypt this PDF file with the PDF Standard encryption handler.

Parameters
  • user_password – The password which allows for opening and reading the PDF file with the restrictions provided.

  • owner_password – The password which allows for opening the PDF files without any restrictions. By default, the owner password is the same as the user password.

  • use_128bit – flag as to whether to use 128bit encryption. When false, 40bit encryption will be used. By default, this flag is on.

  • permissions_flag – permissions as described in TABLE 3.20 of the PDF 1.7 specification. A bit value of 1 means the permission is granted. Hence an integer value of -1 will set all flags. Bit position 3 is for printing, 4 is for modifying content, 5 and 6 control annotations, 9 for form fields, 10 for extraction of text and graphics.

  • algorithm – encrypt algorithm. Values may be one of “RC4-40”, “RC4-128”, “AES-128”, “AES-256-R5”, “AES-256”. If it is valid, use_128bit will be ignored.

write_stream(stream: IO[Any]) None[source]
write(stream: Union[Path, str, IO[Any]]) Tuple[bool, IO[Any]][source]

Write the collection of pages added to this object out as a PDF file.

Parameters

stream – An object to write the file to. The object can either support the write and tell methods, similar to a file object, or be a file path, just like fileobj; it is named stream to keep existing workflows working.

Returns

A tuple (bool, IO)

add_metadata(infos: Dict[str, Any]) None[source]

Add custom metadata to the output.

Parameters

infos – a Python dictionary where each key is a field and each value is your new metadata.

get_reference(obj: PdfObject) IndirectObject[source]
get_outline_root() TreeObject[source]
get_threads_root() ArrayObject[source]

The list of threads.

See §12.4.3 of the PDF 1.7 or PDF 2.0 specification.

Returns

An array (possibly empty) of Dictionaries with /F and /I properties.

property threads: ArrayObject

Read-only property for the list of threads.

See §8.3.2 from PDF 1.7 spec.

Each element is a dictionary with /F and /I keys.

add_outline_item_destination(page_destination: Union[IndirectObject, PageObject, TreeObject], parent: Union[None, TreeObject, IndirectObject] = None, before: Union[None, TreeObject, IndirectObject] = None, is_open: bool = True) IndirectObject[source]
add_outline_item_dict(outline_item: Union[OutlineItem, Destination], parent: Union[None, TreeObject, IndirectObject] = None, before: Union[None, TreeObject, IndirectObject] = None, is_open: bool = True) IndirectObject[source]
add_outline_item(title: str, page_number: ~typing.Union[None, ~pypdf._page.PageObject, ~pypdf.generic._base.IndirectObject, int], parent: ~typing.Union[None, ~pypdf.generic._data_structures.TreeObject, ~pypdf.generic._base.IndirectObject] = None, before: ~typing.Union[None, ~pypdf.generic._data_structures.TreeObject, ~pypdf.generic._base.IndirectObject] = None, color: ~typing.Optional[~typing.Union[~typing.Tuple[float, float, float], str]] = None, bold: bool = False, italic: bool = False, fit: ~pypdf.generic._fit.Fit = <pypdf.generic._fit.Fit object>, is_open: bool = True) IndirectObject[source]

Add an outline item (commonly referred to as a “Bookmark”) to the PDF file.

Parameters
  • title – Title to use for this outline item.

  • page_number – Page number this outline item will point to.

  • parent – A reference to a parent outline item to create nested outline items.

  • before

  • color – Color of the outline item’s font as a red, green, blue tuple from 0.0 to 1.0 or as a Hex String (#RRGGBB)

  • bold – Outline item font is bold

  • italic – Outline item font is italic

  • fit – The fit of the destination page.

Returns

The added outline item as an indirect object.

add_outline() None[source]
add_named_destination_array(title: TextStringObject, destination: Union[IndirectObject, ArrayObject]) None[source]
add_named_destination_object(page_destination: PdfObject) IndirectObject[source]
add_named_destination(title: str, page_number: int) IndirectObject[source]

Remove links and annotations from this output.

remove_annotations(subtypes: Optional[Union[Literal['/Text', '/Link', '/FreeText', '/Line', '/Square', '/Circle', '/Polygon', '/PolyLine', '/Highlight', '/Unterline', '/Squiggly', '/StrikeOut', '/Stamp', '/Caret', '/Ink', '/Popup', '/FileAttachment', '/Sound', '/Movie', '/Widget', '/Screen', '/PrinterMark', '/TrapNet', '/Watermark', '/3D', '/Redact'], Iterable[Literal['/Text', '/Link', '/FreeText', '/Line', '/Square', '/Circle', '/Polygon', '/PolyLine', '/Highlight', '/Unterline', '/Squiggly', '/StrikeOut', '/Stamp', '/Caret', '/Ink', '/Popup', '/FileAttachment', '/Sound', '/Movie', '/Widget', '/Screen', '/PrinterMark', '/TrapNet', '/Watermark', '/3D', '/Redact']]]]) None[source]

Remove annotations by annotation subtype.

Parameters

subtypes – SubType or list of SubTypes to be removed. Examples are: “/Link”, “/FileAttachment”, “/Sound”, “/Movie”, “/Screen”, … If you want to remove all annotations, use subtypes=None.

remove_objects_from_page(page: Union[PageObject, DictionaryObject], to_delete: Union[ObjectDeletionFlag, Iterable[ObjectDeletionFlag]]) None[source]

Remove objects specified by to_delete from the given page.

Parameters
  • page – Page object to clean up.

  • to_delete – Objects to be deleted; can be an ObjectDeletionFlag or a list of ObjectDeletionFlag

remove_images(to_delete: ImageType = ImageType.ALL) None[source]

Remove images from this output.

Parameters

to_delete – The type of images to be deleted (default = all image types)

remove_text() None[source]

Remove text from this output.

add_uri(page_number: int, uri: str, rect: RectangleObject, border: Optional[ArrayObject] = None) None[source]

Add a URI from a rectangular area to the specified page.

Parameters
  • page_number – index of the page on which to place the URI action.

  • uri – URI of resource to link to.

  • rect – RectangleObject or array of four integers specifying the clickable rectangular area [xLL, yLL, xUR, yUR], or string in the form "[ xLL yLL xUR yUR ]".

  • border – if provided, an array describing border-drawing properties. See the PDF spec for details. No border will be drawn if this argument is omitted.

set_page_layout(layout: Literal['/NoLayout', '/SinglePage', '/OneColumn', '/TwoColumnLeft', '/TwoColumnRight', '/TwoPageLeft', '/TwoPageRight']) None[source]

Set the page layout.

Parameters

layout – The page layout to be used

Valid layout arguments

/NoLayout

Layout explicitly not specified

/SinglePage

Show one page at a time

/OneColumn

Show one column at a time

/TwoColumnLeft

Show pages in two columns, odd-numbered pages on the left

/TwoColumnRight

Show pages in two columns, odd-numbered pages on the right

/TwoPageLeft

Show two pages at a time, odd-numbered pages on the left

/TwoPageRight

Show two pages at a time, odd-numbered pages on the right

property page_layout: Optional[Literal['/NoLayout', '/SinglePage', '/OneColumn', '/TwoColumnLeft', '/TwoColumnRight', '/TwoPageLeft', '/TwoPageRight']]

Page layout property.

Valid layout values

/NoLayout

Layout explicitly not specified

/SinglePage

Show one page at a time

/OneColumn

Show one column at a time

/TwoColumnLeft

Show pages in two columns, odd-numbered pages on the left

/TwoColumnRight

Show pages in two columns, odd-numbered pages on the right

/TwoPageLeft

Show two pages at a time, odd-numbered pages on the left

/TwoPageRight

Show two pages at a time, odd-numbered pages on the right

property page_mode: Optional[Literal['/UseNone', '/UseOutlines', '/UseThumbs', '/FullScreen', '/UseOC', '/UseAttachments']]

Page mode property.

Valid mode values

/UseNone

Do not show outline or thumbnails panels

/UseOutlines

Show outline (aka bookmarks) panel

/UseThumbs

Show page thumbnails panel

/FullScreen

Fullscreen view

/UseOC

Show Optional Content Group (OCG) panel

/UseAttachments

Show attachments panel

add_annotation(page_number: Union[int, PageObject], annotation: Dict[str, Any]) DictionaryObject[source]

Add a single annotation to the page. The added annotation must be a new annotation. It can not be recycled.

Parameters
  • page_number – PageObject or page index.

  • annotation – Annotation to be added (created with annotation).

Returns

The inserted object. This can be used for pop-up creation, for example.

clean_page(page: Union[PageObject, IndirectObject]) PageObject[source]

Perform some cleanup in the page. Currently: convert NameObject named destinations to TextStringObject (required for the names/dests list).

Parameters

page

Returns

The cleaned PageObject

append(fileobj: Union[str, IO[Any], PdfReader, Path], outline_item: Union[str, None, PageRange, Tuple[int, int], Tuple[int, int, int], List[int]] = None, pages: Union[None, PageRange, Tuple[int, int], Tuple[int, int, int], List[int], List[PageObject]] = None, import_outline: bool = True, excluded_fields: Optional[Union[List[str], Tuple[str, ...]]] = None) None[source]

Identical to the merge() method, but assumes you want to concatenate all pages onto the end of the file instead of specifying a position.

Parameters
  • fileobj – A File Object or an object that supports the standard read and seek methods similar to a File Object. Could also be a string representing a path to a PDF file.

  • outline_item – Optionally, you may specify a string to build an outline (aka ‘bookmark’) to identify the beginning of the included file.

  • pages – Can be a PageRange or a (start, stop[, step]) tuple or a list of pages to be processed to merge only the specified range of pages from the source document into the output document.

  • import_outline – You may prevent the source document’s outline (collection of outline items, previously referred to as ‘bookmarks’) from being imported by specifying this as False.

  • excluded_fields – Provide the list of fields/keys to be ignored. If /Annots is part of the list, the annotations will be ignored; if /B is part of the list, the articles will be ignored.

merge(position: Optional[int], fileobj: Union[Path, str, IO[Any], PdfReader], outline_item: Optional[str] = None, pages: Optional[Union[str, PageRange, Tuple[int, int], Tuple[int, int, int], List[int], List[PageObject]]] = None, import_outline: bool = True, excluded_fields: Optional[Union[List[str], Tuple[str, ...]]] = ()) None[source]

Merge the pages from the given file into the output file at the specified page number.

Parameters
  • position – The page number to insert this file. File will be inserted after the given number.

  • fileobj – A File Object or an object that supports the standard read and seek methods similar to a File Object. Could also be a string representing a path to a PDF file.

  • outline_item – Optionally, you may specify a string to build an outline (aka ‘bookmark’) to identify the beginning of the included file.

  • pages – can be a PageRange or a (start, stop[, step]) tuple or a list of pages to be processed to merge only the specified range of pages from the source document into the output document.

  • import_outline – You may prevent the source document’s outline (collection of outline items, previously referred to as ‘bookmarks’) from being imported by specifying this as False.

  • excluded_fields – Provide the list of fields/keys to be ignored. If /Annots is part of the list, the annotations will be ignored; if /B is part of the list, the articles will be ignored.

Raises

TypeError – The pages attribute is not configured properly

add_filtered_articles(fltr: Union[Pattern[Any], str], pages: Dict[int, PageObject], reader: PdfReader) None[source]

Add articles matching the defined criteria.

Parameters
  • fltr

  • pages

  • reader

property attachments: Mapping[str, List[bytes]]
close() None[source]

To match the functions from Merger.

decode_permissions(permissions_code: int) Dict[str, bool]

Take the permissions as an integer, return the allowed access.

get_destination_page_number(destination: Destination) Optional[int]

Retrieve page number of a given Destination object.

Parameters

destination – The destination to get page number.

Returns

The page number or None if page is not found

get_fields(tree: Optional[TreeObject] = None, retval: Optional[Dict[Any, Any]] = None, fileobj: Optional[Any] = None) Optional[Dict[str, Any]]

Extract field data if this PDF contains interactive form fields.

The tree and retval parameters are for recursive use.

Parameters
  • tree

  • retval

  • fileobj – A file object (usually a text file) to write a report to on all interactive form fields found.

Returns

A dictionary where each key is a field name, and each value is a Field object. By default, the mapping name is used for keys. None if form data could not be located.

get_form_text_fields(full_qualified_name: bool = False) Dict[str, Any]

Retrieve form fields from the document with textual data.

Parameters

full_qualified_name – If True, use the fully qualified field names as keys.

Returns

A dictionary. The key is the name of the form field, the value is the content of the field.

If the document contains multiple form fields with the same name, the second and following will get the suffix .2, .3, …

get_named_dest_root() ArrayObject
get_num_pages() int

Calculate the number of pages in this PDF file.

Returns

The number of pages of the parsed PDF file

Raises

PdfReadError – if file is encrypted and restrictions prevent this action.

get_page(page_number: int) PageObject

Retrieve a page by number from this PDF file. Most of the time `.pages[page_number]` is preferred.

Parameters

page_number – The page number to retrieve (pages begin at zero)

Returns

A PageObject instance.

get_page_number(page: PageObject) Optional[int]

Retrieve page number of a given PageObject.

Parameters

page – The page to get page number. Should be an instance of PageObject

Returns

The page number or None if page is not found

get_pages_showing_field(field: Union[Field, PdfObject, IndirectObject]) List[PageObject]

Provide the list of pages on which the field is shown.

Parameters

field – Field Object, PdfObject or IndirectObject referencing a Field

Returns

List of pages

  • Empty list:

    The field has no widgets attached (either hidden field or ancestor field).

  • Single page list:

    Page where the widget is present (most common).

  • Multi-page list:

    Field with multiple kids widgets (example: radio buttons, field repeated on multiple pages).

property metadata: Optional[DocumentInformation]

Retrieve the PDF file’s document information dictionary, if it exists.

Note that some PDF files use metadata streams instead of document information dictionaries, and these metadata streams will not be accessed by this function.

property named_destinations: Dict[str, Any]

A read-only dictionary which maps names to Destinations

property outline: List[Union[Destination, List[Union[Destination, List[Destination]]]]]

Read-only property for the outline present in the document.

(i.e., a collection of ‘outline items’ which are also known as ‘bookmarks’)

property page_labels: List[str]

A list of labels for the pages in this document.

This property is read-only. The labels are in the order that the pages appear in the document.

property pages: List[PageObject]

Property that emulates a list of PageObject. This property allows getting a page or a range of pages.

For PdfWriter only: it also provides the capability to remove a page or a range of pages from the list (through the del operator). Note: only the page entry is removed, as the objects beneath can be used somewhere else. A solution to completely remove them, if they are not used anywhere, is to write to a buffer/temporary file and then load it into a new PdfWriter object.

remove_page(page: Union[int, PageObject, IndirectObject], clean: bool = False) None

Remove page from pages list.

Parameters
  • page – The page to be removed, given as one of:

    PageObject: page to be removed. If the page appears many times, only the first one will be removed.

    IndirectObject: reference to the page to be removed.

    int: number of the page to be removed.

  • clean – replace PageObject with NullObject to prevent destination, annotation to reference a detached page

strict: bool = False
property user_access_permissions: Optional[UserAccessPermissions]

Get the user access permissions for encrypted documents. Returns None if not encrypted.

property viewer_preferences: Optional[ViewerPreferences]

Returns the existing ViewerPreferences as an overloaded dictionary.

property xfa: Optional[Dict[str, Any]]
find_outline_item(outline_item: Dict[str, Any], root: Optional[List[Union[Destination, List[Union[Destination, List[Destination]]]]]] = None) Optional[List[int]][source]
find_bookmark(outline_item: Dict[str, Any], root: Optional[List[Union[Destination, List[Union[Destination, List[Destination]]]]]] = None) Optional[List[int]][source]

Deprecated since version 2.9.0: Use find_outline_item() instead.

reset_translation(reader: Union[None, PdfReader, IndirectObject] = None) None[source]

Reset the translation table between reader and the writer object.

Late cloning will create new independent objects.

Parameters

reader – PdfReader or IndirectObject referencing a PdfReader object. If set to None or omitted, all tables will be reset.

set_page_label(page_index_from: int, page_index_to: int, style: Optional[PageLabelStyle] = None, prefix: Optional[str] = None, start: Optional[int] = 0) None[source]

Set a page label to a range of pages.

Page indexes must be given starting from 0. Labels must have a style, a prefix or both. If a range is not assigned any page label, a decimal label starting from 1 is applied.

Parameters
  • page_index_from – page index of the beginning of the range starting from 0

  • page_index_to – page index of the end of the range starting from 0

  • style

    The numbering style to be used for the numeric portion of each page label:

    • /D Decimal arabic numerals

    • /R Uppercase roman numerals

    • /r Lowercase roman numerals

    • /A Uppercase letters (A to Z for the first 26 pages, AA to ZZ for the next 26, and so on)

    • /a Lowercase letters (a to z for the first 26 pages, aa to zz for the next 26, and so on)

  • prefix – The label prefix for page labels in this range.

  • start – The value of the numeric portion for the first page label in the range. Subsequent pages are numbered sequentially from this value, which must be greater than or equal to 1. Default value: 1.

class pypdf.ObjectDeletionFlag(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: IntFlag

NONE = 0
TEXT = 1
ATTACHMENTS = 4
OBJECTS_3D = 8
ALL_ANNOTATIONS = 16
XOBJECT_IMAGES = 32
INLINE_IMAGES = 64
DRAWING_IMAGES = 128
IMAGES = 224