The PdfWriter Class
- class pypdf.PdfWriter(fileobj: Union[None, PdfReader, str, IO[Any], Path] = '', clone_from: Union[None, PdfReader, str, IO[Any], Path] = None)[source]
Bases:
PdfDocCommon
Write a PDF file out, given pages produced by another class or through cloning a PDF file during initialization.
Typically data is added from a PdfReader.
- property is_encrypted: bool
Read-only boolean property showing whether this PDF file is encrypted.
Note that this property, if true, will remain true even after the decrypt() method is called.
- flattened_pages: Optional[List[PageObject]] = None
- property root_object: DictionaryObject
Provide direct access to the PDF structure.
Note
Recommended to be used only for read access.
- property xmp_metadata: Optional[XmpInformation]
XMP (Extensible Metadata Platform) data.
- property pdf_header: str
Read/write property: the header of the PDF document that is written.
This should be something like '%PDF-1.5'. It is recommended to set the lowest version that supports all features which are used within the PDF file.
Note: pdf_header returns a string but accepts bytes or str for writing.
- set_need_appearances_writer(state: bool = True) None [source]
Sets the “NeedAppearances” flag in the PDF writer.
The “NeedAppearances” flag indicates whether the appearance dictionary for form fields should be automatically generated by the PDF viewer or if the embedded appearance should be used.
- Parameters
state – The actual value of the NeedAppearances flag.
- Returns
None
- create_viewer_preferences() ViewerPreferences [source]
- add_page(page: PageObject, excluded_keys: Iterable[str] = ()) PageObject [source]
Add a page to this PDF file.
Recommended for advanced usage, including adequate excluded_keys.
The page is usually acquired from a PdfReader instance.
- Parameters
page – The page to add to the document. Should be an instance of PageObject.
excluded_keys –
- Returns
The added PageObject.
- insert_page(page: PageObject, index: int = 0, excluded_keys: Iterable[str] = ()) PageObject [source]
Insert a page in this PDF file. The page is usually acquired from a PdfReader instance.
- Parameters
page – The page to add to the document.
index – Position at which the page will be inserted.
excluded_keys –
- Returns
The added PageObject.
- add_blank_page(width: Optional[float] = None, height: Optional[float] = None) PageObject [source]
Append a blank page to this PDF file and return it.
If no page size is specified, use the size of the last page.
- Parameters
width – The width of the new page expressed in default user space units.
height – The height of the new page expressed in default user space units.
- Returns
The newly appended page
- Raises
PageSizeNotDefinedError – if width and height are not defined and previous page does not exist.
- insert_blank_page(width: Optional[Union[float, Decimal]] = None, height: Optional[Union[float, Decimal]] = None, index: int = 0) PageObject [source]
Insert a blank page into this PDF file and return it.
If no page size is specified, use the size of the last page.
- Parameters
width – The width of the new page expressed in default user space units.
height – The height of the new page expressed in default user space units.
index – Position to add the page.
- Returns
The newly inserted page.
- Raises
PageSizeNotDefinedError – if width and height are not defined and previous page does not exist.
- property open_destination: Union[None, Destination, TextStringObject, ByteStringObject]
- add_js(javascript: str) None [source]
Add JavaScript which will launch upon opening this PDF.
- Parameters
javascript – Your Javascript.
>>> output.add_js("this.print({bUI:true,bSilent:false,bShrinkToFit:true});") # Example: This will launch the print window when the PDF is opened.
- add_attachment(filename: str, data: Union[str, bytes]) None [source]
Embed a file inside the PDF.
Reference: https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf Section 7.11.3
- Parameters
filename – The filename to display.
data – The data in the file.
- append_pages_from_reader(reader: PdfReader, after_page_append: Optional[Callable[[PageObject], None]] = None) None [source]
Copy pages from reader to writer. Includes an optional callback parameter which is invoked after pages are appended to the writer.
append should be preferred.
- Parameters
reader – a PdfReader object from which to copy page annotations to this writer object. The writer’s annots will then be updated.
after_page_append – Callback function that is invoked after each page is appended to the writer. Its single parameter is a reference to the page just appended to the document.
- update_page_form_field_values(page: Optional[Union[PageObject, List[PageObject]]], fields: Dict[str, Any], flags: FieldFlag = FieldFlag.None, auto_regenerate: Optional[bool] = True) None [source]
Update the form field values for a given page from a fields dictionary.
Copy field texts and values from fields to page. If the field links to a parent object, add the information to the parent.
- Parameters
page – PageObject: references the PDF writer’s page where the annotations and field data will be updated. List[PageObject]: provides the list of pages to be processed. None: all pages.
fields – a Python dictionary of field names (/T) and text values (/V).
flags – An integer (0 to 7). The first bit sets ReadOnly, the second bit sets Required, the third bit sets NoExport. See PDF Reference Table 8.70 for details.
auto_regenerate – set/unset the need_appearances flag; the flag is unchanged if auto_regenerate is None.
- reattach_fields(page: Optional[PageObject] = None) List[DictionaryObject] [source]
Parse annotations within the page looking for orphan fields and reattach them into the Fields structure.
- Parameters
page – page to analyze. If none is provided, all pages will be analyzed.
- Returns
list of reattached fields.
- clone_reader_document_root(reader: PdfReader) None [source]
Copy the reader document root to the writer and all sub-elements, including pages, threads, outlines, … For partial insertion, append should be considered.
- Parameters
reader – PdfReader from which the document root should be copied.
- clone_document_from_reader(reader: PdfReader, after_page_append: Optional[Callable[[PageObject], None]] = None) None [source]
Create a copy (clone) of a document from a PDF file reader, cloning the ‘/Root’, ‘/Info’ and ‘/ID’ sections of the PDF.
- Parameters
reader – PDF file reader instance from which the clone should be created.
after_page_append – Callback function that is invoked after each page is appended to the writer (delegates to append_pages_from_reader). The single parameter of the callback is a reference to the page just appended to the document.
- generate_file_identifiers() None [source]
Generate an identifier for the PDF that will be written.
The only point of this is ensuring uniqueness. Reproducibility is not required. When a file is first written, both identifiers shall be set to the same value. If both identifiers match when a file reference is resolved, it is very likely that the correct and unchanged file has been found. If only the first identifier matches, a different version of the correct file has been found. See 14.4 “File Identifiers”.
- encrypt(user_password: str, owner_password: Optional[str] = None, use_128bit: bool = True, permissions_flag: UserAccessPermissions = UserAccessPermissions.PRINT | MODIFY | EXTRACT | ADD_OR_MODIFY | R7 | R8 | FILL_FORM_FIELDS | EXTRACT_TEXT_AND_GRAPHICS | ASSEMBLE_DOC | PRINT_TO_REPRESENTATION | R13 | R14 | R15 | R16 | R17 | R18 | R19 | R20 | R21 | R22 | R23 | R24 | R25 | R26 | R27 | R28 | R29 | R30 | R31 | R32, *, algorithm: Optional[str] = None) None [source]
Encrypt this PDF file with the PDF Standard encryption handler.
- Parameters
user_password – The password which allows for opening and reading the PDF file with the restrictions provided.
owner_password – The password which allows for opening the PDF file without any restrictions. By default, the owner password is the same as the user password.
use_128bit – flag as to whether to use 128-bit encryption. When false, 40-bit encryption will be used. By default, this flag is on.
permissions_flag – permissions as described in TABLE 3.20 of the PDF 1.7 specification. A bit value of 1 means the permission is granted. Hence an integer value of -1 will set all flags. Bit position 3 is for printing, 4 is for modifying content, 5 and 6 control annotations, 9 for form fields, 10 for extraction of text and graphics.
algorithm – encryption algorithm. Values may be one of “RC4-40”, “RC4-128”, “AES-128”, “AES-256-R5”, “AES-256”. If it is valid, use_128bit will be ignored.
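The bit arithmetic behind permissions_flag can be sketched without touching a PDF. The snippet below is an illustration only: the Permission class and the bit helper are hypothetical stand-ins, not pypdf's UserAccessPermissions, and they assume the 1-based bit positions quoted above, so that bit position n has the integer value 1 << (n - 1).

```python
from enum import IntFlag


def bit(position: int) -> int:
    """Integer value of a 1-based bit position, as used in the description above."""
    return 1 << (position - 1)


class Permission(IntFlag):
    """Hypothetical stand-in for illustration; not pypdf's UserAccessPermissions."""

    PRINT = bit(3)
    MODIFY = bit(4)
    FILL_FORM_FIELDS = bit(9)


# Grant printing and form filling only.
granted = Permission.PRINT | Permission.FILL_FORM_FIELDS

can_print = bool(granted & Permission.PRINT)
can_modify = bool(granted & Permission.MODIFY)

# -1 is all ones in two's complement, so masking it to 32 bits sets every flag.
all_bits = -1 & 0xFFFFFFFF
```

A bit that is left at 0 means the corresponding permission is withheld, which is why can_modify is false here.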
- write(stream: Union[Path, str, IO[Any]]) Tuple[bool, IO[Any]] [source]
Write the collection of pages added to this object out as a PDF file.
- Parameters
stream – An object to write the file to. The object can either support the write and tell methods, similar to a file object, or be a file path, just like fileobj; it is named stream to keep existing workflows.
- Returns
A tuple (bool, IO)
- add_metadata(infos: Dict[str, Any]) None [source]
Add custom metadata to the output.
- Parameters
infos – a Python dictionary where each key is a field and each value is your new metadata.
- get_reference(obj: PdfObject) IndirectObject [source]
- get_outline_root() TreeObject [source]
- get_threads_root() ArrayObject [source]
The list of threads.
See §12.4.3 of the PDF 1.7 or PDF 2.0 specification.
- Returns
An array (possibly empty) of dictionaries with /F and /I properties.
- property threads: ArrayObject
Read-only property for the list of threads.
See §8.3.2 from PDF 1.7 spec.
Each element is a dictionary with /F and /I keys.
- add_outline_item_destination(page_destination: Union[IndirectObject, PageObject, TreeObject], parent: Union[None, TreeObject, IndirectObject] = None, before: Union[None, TreeObject, IndirectObject] = None, is_open: bool = True) IndirectObject [source]
- add_outline_item_dict(outline_item: Union[OutlineItem, Destination], parent: Union[None, TreeObject, IndirectObject] = None, before: Union[None, TreeObject, IndirectObject] = None, is_open: bool = True) IndirectObject [source]
- add_outline_item(title: str, page_number: Union[None, PageObject, IndirectObject, int], parent: Union[None, TreeObject, IndirectObject] = None, before: Union[None, TreeObject, IndirectObject] = None, color: Optional[Union[Tuple[float, float, float], str]] = None, bold: bool = False, italic: bool = False, fit: Fit = <pypdf.generic._fit.Fit object>, is_open: bool = True) IndirectObject [source]
Add an outline item (commonly referred to as a “Bookmark”) to the PDF file.
- Parameters
title – Title to use for this outline item.
page_number – Page number this outline item will point to.
parent – A reference to a parent outline item to create nested outline items.
before –
color – Color of the outline item’s font as a red, green, blue tuple from 0.0 to 1.0 or as a Hex String (#RRGGBB)
bold – Outline item font is bold
italic – Outline item font is italic
fit – The fit of the destination page.
- Returns
The added outline item as an indirect object.
- add_named_destination_array(title: TextStringObject, destination: Union[IndirectObject, ArrayObject]) None [source]
- add_named_destination_object(page_destination: PdfObject) IndirectObject [source]
- add_named_destination(title: str, page_number: int) IndirectObject [source]
- remove_annotations(subtypes: Optional[Union[Literal['/Text', '/Link', '/FreeText', '/Line', '/Square', '/Circle', '/Polygon', '/PolyLine', '/Highlight', '/Unterline', '/Squiggly', '/StrikeOut', '/Stamp', '/Caret', '/Ink', '/Popup', '/FileAttachment', '/Sound', '/Movie', '/Widget', '/Screen', '/PrinterMark', '/TrapNet', '/Watermark', '/3D', '/Redact'], Iterable[Literal['/Text', '/Link', '/FreeText', '/Line', '/Square', '/Circle', '/Polygon', '/PolyLine', '/Highlight', '/Unterline', '/Squiggly', '/StrikeOut', '/Stamp', '/Caret', '/Ink', '/Popup', '/FileAttachment', '/Sound', '/Movie', '/Widget', '/Screen', '/PrinterMark', '/TrapNet', '/Watermark', '/3D', '/Redact']]]]) None [source]
Remove annotations by annotation subtype.
- Parameters
subtypes – SubType or list of SubTypes to be removed. Examples are: “/Link”, “/FileAttachment”, “/Sound”, “/Movie”, “/Screen”, … If you want to remove all annotations, use subtypes=None.
- remove_objects_from_page(page: Union[PageObject, DictionaryObject], to_delete: Union[ObjectDeletionFlag, Iterable[ObjectDeletionFlag]]) None [source]
Remove objects specified by to_delete from the given page.
- Parameters
page – Page object to clean up.
to_delete – Objects to be deleted; can be an ObjectDeletionFlag or a list of ObjectDeletionFlag.
- remove_images(to_delete: ImageType = ImageType.ALL) None [source]
Remove images from this output.
- Parameters
to_delete – The type of images to be deleted (default = all image types)
- add_uri(page_number: int, uri: str, rect: RectangleObject, border: Optional[ArrayObject] = None) None [source]
Add a URI from a rectangular area to the specified page.
- Parameters
page_number – index of the page on which to place the URI action.
uri – URI of resource to link to.
rect – RectangleObject or array of four integers specifying the clickable rectangular area [xLL, yLL, xUR, yUR], or string in the form "[ xLL yLL xUR yUR ]".
border – if provided, an array describing border-drawing properties. See the PDF spec for details. No border will be drawn if this argument is omitted.
- set_page_layout(layout: Literal['/NoLayout', '/SinglePage', '/OneColumn', '/TwoColumnLeft', '/TwoColumnRight', '/TwoPageLeft', '/TwoPageRight']) None [source]
Set the page layout.
- Parameters
layout – The page layout to be used
/NoLayout: Layout explicitly not specified
/SinglePage: Show one page at a time
/OneColumn: Show one column at a time
/TwoColumnLeft: Show pages in two columns, odd-numbered pages on the left
/TwoColumnRight: Show pages in two columns, odd-numbered pages on the right
/TwoPageLeft: Show two pages at a time, odd-numbered pages on the left
/TwoPageRight: Show two pages at a time, odd-numbered pages on the right
- property page_layout: Optional[Literal['/NoLayout', '/SinglePage', '/OneColumn', '/TwoColumnLeft', '/TwoColumnRight', '/TwoPageLeft', '/TwoPageRight']]
Page layout property.
/NoLayout: Layout explicitly not specified
/SinglePage: Show one page at a time
/OneColumn: Show one column at a time
/TwoColumnLeft: Show pages in two columns, odd-numbered pages on the left
/TwoColumnRight: Show pages in two columns, odd-numbered pages on the right
/TwoPageLeft: Show two pages at a time, odd-numbered pages on the left
/TwoPageRight: Show two pages at a time, odd-numbered pages on the right
- property page_mode: Optional[Literal['/UseNone', '/UseOutlines', '/UseThumbs', '/FullScreen', '/UseOC', '/UseAttachments']]
Page mode property.
/UseNone: Do not show outline or thumbnails panels
/UseOutlines: Show outline (aka bookmarks) panel
/UseThumbs: Show page thumbnails panel
/FullScreen: Fullscreen view
/UseOC: Show Optional Content Group (OCG) panel
/UseAttachments: Show attachments panel
- add_annotation(page_number: Union[int, PageObject], annotation: Dict[str, Any]) DictionaryObject [source]
Add a single annotation to the page. The added annotation must be a new annotation. It cannot be recycled.
- Parameters
page_number – PageObject or page index.
annotation – Annotation to be added (created with annotation).
- Returns
The inserted object. This can be used for pop-up creation, for example.
- clean_page(page: Union[PageObject, IndirectObject]) PageObject [source]
Perform some clean up in the page. Currently: convert NameObject named destination to TextStringObject (required for names/dests list).
- Parameters
page –
- Returns
The cleaned PageObject
- append(fileobj: Union[str, IO[Any], PdfReader, Path], outline_item: Union[str, None, PageRange, Tuple[int, int], Tuple[int, int, int], List[int]] = None, pages: Union[None, PageRange, Tuple[int, int], Tuple[int, int, int], List[int], List[PageObject]] = None, import_outline: bool = True, excluded_fields: Optional[Union[List[str], Tuple[str, ...]]] = None) None [source]
Identical to the merge() method, but assumes you want to concatenate all pages onto the end of the file instead of specifying a position.
- Parameters
fileobj – A File Object or an object that supports the standard read and seek methods similar to a File Object. Could also be a string representing a path to a PDF file.
outline_item – Optionally, you may specify a string to build an outline (aka ‘bookmark’) to identify the beginning of the included file.
pages – Can be a PageRange or a (start, stop[, step]) tuple or a list of pages to be processed to merge only the specified range of pages from the source document into the output document.
import_outline – You may prevent the source document’s outline (collection of outline items, previously referred to as ‘bookmarks’) from being imported by specifying this as False.
excluded_fields – Provide the list of fields/keys to be ignored. If /Annots is part of the list, the annotations will be ignored. If /B is part of the list, the articles will be ignored.
- merge(position: Optional[int], fileobj: Union[Path, str, IO[Any], PdfReader], outline_item: Optional[str] = None, pages: Optional[Union[str, PageRange, Tuple[int, int], Tuple[int, int, int], List[int], List[PageObject]]] = None, import_outline: bool = True, excluded_fields: Optional[Union[List[str], Tuple[str, ...]]] = ()) None [source]
Merge the pages from the given file into the output file at the specified page number.
- Parameters
position – The page number to insert this file. File will be inserted after the given number.
fileobj – A File Object or an object that supports the standard read and seek methods similar to a File Object. Could also be a string representing a path to a PDF file.
outline_item – Optionally, you may specify a string to build an outline (aka ‘bookmark’) to identify the beginning of the included file.
pages – Can be a PageRange or a (start, stop[, step]) tuple or a list of pages to be processed to merge only the specified range of pages from the source document into the output document.
import_outline – You may prevent the source document’s outline (collection of outline items, previously referred to as ‘bookmarks’) from being imported by specifying this as False.
excluded_fields – Provide the list of fields/keys to be ignored. If /Annots is part of the list, the annotations will be ignored. If /B is part of the list, the articles will be ignored.
- Raises
TypeError – The pages attribute is not configured properly
- add_filtered_articles(fltr: Union[Pattern[Any], str], pages: Dict[int, PageObject], reader: PdfReader) None [source]
Add articles matching the defined criteria.
- Parameters
fltr –
pages –
reader –
- decode_permissions(permissions_code: int) Dict[str, bool]
Take the permissions as an integer, return the allowed access.
- get_destination_page_number(destination: Destination) Optional[int]
Retrieve page number of a given Destination object.
- Parameters
destination – The destination whose page number is requested.
- Returns
The page number or None if page is not found
- get_fields(tree: Optional[TreeObject] = None, retval: Optional[Dict[Any, Any]] = None, fileobj: Optional[Any] = None) Optional[Dict[str, Any]]
Extract field data if this PDF contains interactive form fields.
The tree and retval parameters are for recursive use.
- Parameters
tree –
retval –
fileobj – A file object (usually a text file) to write a report to on all interactive form fields found.
- Returns
A dictionary where each key is a field name, and each value is a Field object. By default, the mapping name is used for keys. None if form data could not be located.
- get_form_text_fields(full_qualified_name: bool = False) Dict[str, Any]
Retrieve form fields from the document with textual data.
- Parameters
full_qualified_name – to get full name
- Returns
A dictionary. The key is the name of the form field, the value is the content of the field.
If the document contains multiple form fields with the same name, the second and following will get the suffix .2, .3, …
- get_named_dest_root() ArrayObject
- get_num_pages() int
Calculate the number of pages in this PDF file.
- Returns
The number of pages of the parsed PDF file
- Raises
PdfReadError – if file is encrypted and restrictions prevent this action.
- get_page(page_number: int) PageObject
Retrieve a page by number from this PDF file. Most of the time `.pages[page_number]` is preferred.
- Parameters
page_number – The page number to retrieve (pages begin at zero)
- Returns
A PageObject instance.
- get_page_number(page: PageObject) Optional[int]
Retrieve page number of a given PageObject.
- Parameters
page – The page whose number is requested. Should be an instance of PageObject.
- Returns
The page number or None if page is not found
- get_pages_showing_field(field: Union[Field, PdfObject, IndirectObject]) List[PageObject]
Provides list of pages where the field is called.
- Parameters
field – Field Object, PdfObject or IndirectObject referencing a Field
- Returns
List of pages –
- Empty list:
The field has no widgets attached (either hidden field or ancestor field).
- Single page list:
Page where the widget is present (most common).
- Multi-page list:
Field with multiple kids widgets (example: radio buttons, field repeated on multiple pages).
- property metadata: Optional[DocumentInformation]
Retrieve the PDF file’s document information dictionary, if it exists.
Note that some PDF files use metadata streams instead of document information dictionaries, and these metadata streams will not be accessed by this function.
- property named_destinations: Dict[str, Any]
A read-only dictionary which maps names to Destinations.
- property outline: List[Union[Destination, List[Union[Destination, List[Destination]]]]]
Read-only property for the outline present in the document.
(i.e., a collection of ‘outline items’ which are also known as ‘bookmarks’)
- property page_labels: List[str]
A list of labels for the pages in this document.
This property is read-only. The labels are in the order that the pages appear in the document.
- property pages: List[PageObject]
Property that emulates a list of PageObject. This property allows getting a page or a range of pages.
For PdfWriter only: it also provides the capability to remove a page or a range of pages from the list (through the del operator). Note: only the page entry is removed, as the objects beneath can be used somewhere else. A solution to completely remove them, if they are not used anywhere, is to write to a buffer/temporary file and to load it into a new PdfWriter object afterwards.
- remove_page(page: Union[int, PageObject, IndirectObject], clean: bool = False) None
Remove page from pages list.
- Parameters
page – int / PageObject / IndirectObject
PageObject: page to be removed. If the page appears many times, only the first one will be removed.
IndirectObject: Reference to page to be removed.
int: Page number to be removed.
clean – replace PageObject with NullObject to prevent destinations and annotations from referencing a detached page
- property user_access_permissions: Optional[UserAccessPermissions]
Get the user access permissions for encrypted documents. Returns None if not encrypted.
- property viewer_preferences: Optional[ViewerPreferences]
Returns the existing ViewerPreferences as an overloaded dictionary.
- find_outline_item(outline_item: Dict[str, Any], root: Optional[List[Union[Destination, List[Union[Destination, List[Destination]]]]]] = None) Optional[List[int]] [source]
- find_bookmark(outline_item: Dict[str, Any], root: Optional[List[Union[Destination, List[Union[Destination, List[Destination]]]]]] = None) Optional[List[int]] [source]
Deprecated since version 2.9.0: Use find_outline_item() instead.
- reset_translation(reader: Union[None, PdfReader, IndirectObject] = None) None [source]
Reset the translation table between reader and the writer object.
Late cloning will create new independent objects.
- Parameters
reader – PdfReader or IndirectObject referencing a PdfReader object. If set to None or omitted, all tables will be reset.
- set_page_label(page_index_from: int, page_index_to: int, style: Optional[PageLabelStyle] = None, prefix: Optional[str] = None, start: Optional[int] = 0) None [source]
Set a page label to a range of pages.
Page indexes must be given starting from 0. Labels must have a style, a prefix or both. If a range is not assigned any page label, a decimal label starting from 1 is applied.
- Parameters
page_index_from – page index of the beginning of the range, starting from 0
page_index_to – page index of the end of the range, starting from 0
style – The numbering style to be used for the numeric portion of each page label:
/D: Decimal arabic numerals
/R: Uppercase roman numerals
/r: Lowercase roman numerals
/A: Uppercase letters (A to Z for the first 26 pages, AA to ZZ for the next 26, and so on)
/a: Lowercase letters (a to z for the first 26 pages, aa to zz for the next 26, and so on)
prefix – The label prefix for page labels in this range.
start – The value of the numeric portion for the first page label in the range. Subsequent pages are numbered sequentially from this value, which must be greater than or equal to 1. Default value: 1.
- class pypdf.ObjectDeletionFlag(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
IntFlag
- NONE = 0
- TEXT = 1
- LINKS = 2
- ATTACHMENTS = 4
- OBJECTS_3D = 8
- ALL_ANNOTATIONS = 16
- XOBJECT_IMAGES = 32
- INLINE_IMAGES = 64
- DRAWING_IMAGES = 128
- IMAGES = 224
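The values above compose as bit flags. The sketch below uses a hypothetical stand-in class, DeletionFlag, that copies some of the listed values rather than importing anything from pypdf. It shows that the value listed for IMAGES is the union of the three image-related flags, and how a combined value can be tested for membership.

```python
from enum import IntFlag


class DeletionFlag(IntFlag):
    """Hypothetical stand-in mirroring some of the values listed above."""

    TEXT = 1
    LINKS = 2
    XOBJECT_IMAGES = 32
    INLINE_IMAGES = 64
    DRAWING_IMAGES = 128


# The union of the three image flags gives the value listed for IMAGES.
images = (
    DeletionFlag.XOBJECT_IMAGES
    | DeletionFlag.INLINE_IMAGES
    | DeletionFlag.DRAWING_IMAGES
)

# Flags combine with |, and membership is tested with &.
to_delete = DeletionFlag.TEXT | images
removes_inline_images = bool(to_delete & DeletionFlag.INLINE_IMAGES)
removes_links = bool(to_delete & DeletionFlag.LINKS)
```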