The PdfReader Class
- class pypdf.PdfReader(stream: Union[str, IO, Path], strict: bool = False, password: Union[None, str, bytes] = None)
Bases: object
Initialize a PdfReader object.
This operation can take some time, as the PDF stream’s cross-reference tables are read into memory.
- Parameters
stream – A File object or an object that supports the standard read and seek methods similar to a File object. Could also be a string representing a path to a PDF file.
strict – Determines whether the user should be warned of all problems; it also causes some correctable problems to be fatal. Defaults to False.
password – Decrypt the PDF file at initialization. If the password is None, the file will not be decrypted. Defaults to None.
- property viewer_preferences: Optional[pypdf.generic._viewerpref.ViewerPreferences]
Returns the existing ViewerPreferences as an overloaded dictionary.
- property pdf_header: str
The first 8 bytes of the file.
This is typically something like '%PDF-1.6' and can be used to detect whether the file is actually a PDF file and which version it is.
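The header check described above can also be done by hand on any binary stream. The following is a minimal sketch that uses only the standard library rather than pypdf; the function name sniff_pdf_version is hypothetical and not part of the library. With a PdfReader, reader.pdf_header returns the string that this sketch reads directly.

```python
import io
from typing import BinaryIO, Optional


def sniff_pdf_version(stream: BinaryIO) -> Optional[str]:
    """Return the version from a '%PDF-x.y' header, or None if it is absent."""
    start = stream.tell()
    stream.seek(0)
    header = stream.read(8)      # the first 8 bytes of the file
    stream.seek(start)           # restore the caller's position
    if not header.startswith(b"%PDF-"):
        return None
    return header[5:].decode("ascii", errors="replace")
```

A None result means the stream does not start with a PDF header at all, which is a cheap test to run before handing the stream to PdfReader.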
- property metadata: Optional[DocumentInformation]
Retrieve the PDF file’s document information dictionary, if it exists.
Note that some PDF files use metadata streams instead of docinfo dictionaries, and these metadata streams will not be accessed by this function.
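Because metadata can be None, callers need to check it before reading any entry. A minimal sketch, assuming reader is a PdfReader and that the returned DocumentInformation exposes a title attribute; the helper name document_title is hypothetical.

```python
from typing import Any, Optional


def document_title(reader: Any) -> Optional[str]:
    """Return the title from reader.metadata, or None when there is none."""
    info = reader.metadata       # None if the file has no docinfo dictionary
    if info is None:
        return None
    return info.title            # assumed attribute; may itself be None
```

Files that store their metadata only in a metadata stream will yield None here, as noted above.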
- getDocumentInfo() -> Optional[DocumentInformation]
Use the attribute metadata instead.
Deprecated since version 1.28.0.
- property documentInfo: Optional[DocumentInformation]
Use the attribute metadata instead.
Deprecated since version 1.28.0.
- property xmp_metadata: Optional[XmpInformation]
XMP (Extensible Metadata Platform) data.
- getXmpMetadata() -> Optional[XmpInformation]
Use the attribute xmp_metadata instead.
Deprecated since version 1.28.0.
- property xmpMetadata: Optional[XmpInformation]
Use the attribute xmp_metadata instead.
Deprecated since version 1.28.0.
- getPage(pageNumber: int) -> PageObject
Use reader.pages[page_number] instead.
Deprecated since version 1.28.0.
- property namedDestinations: Dict[str, Any]
Use named_destinations instead.
Deprecated since version 1.28.0.
- property named_destinations: Dict[str, Any]
A read-only dictionary which maps names to Destinations.
- get_fields(tree: Optional[TreeObject] = None, retval: Optional[Dict[Any, Any]] = None, fileobj: Optional[Any] = None) -> Optional[Dict[str, Any]]
Extract field data if this PDF contains interactive form fields.
The tree and retval parameters are for recursive use.
- Parameters
tree –
retval –
fileobj – A file object (usually a text file) to write a report to on all interactive form fields found.
- Returns
A dictionary where each key is a field name, and each value is a Field object. By default, the mapping name is used for keys. None if form data could not be located.
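Since get_fields() may return None, a small wrapper keeps calling code simple. This is a sketch under two assumptions: reader is a PdfReader, and each Field object is dictionary-like with the field value stored under the PDF key "/V". The helper name field_values is hypothetical.

```python
from typing import Any, Dict


def field_values(reader: Any) -> Dict[str, Any]:
    """Map each form field name to its "/V" (value) entry, if present."""
    fields = reader.get_fields()     # None when no form data is found
    if fields is None:
        return {}
    # Assumption: Field objects behave like dictionaries keyed by PDF names.
    return {name: field.get("/V") for name, field in fields.items()}
```

Fields without a value map to None, which keeps them distinguishable from a document that has no form at all (an empty dictionary).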
- getFields(tree: Optional[TreeObject] = None, retval: Optional[Dict[Any, Any]] = None, fileobj: Optional[Any] = None) -> Optional[Dict[str, Any]]
Use get_fields() instead.
Deprecated since version 1.28.0.
- get_form_text_fields(full_qualified_name: bool = False) -> Dict[str, Any]
Retrieve form fields from the document with textual data.
- Parameters
full_qualified_name – If True, use the fully qualified field name as the key. Defaults to False.
- Returns
A dictionary. The key is the name of the form field, the value is the content of the field.
If the document contains multiple form fields with the same name, the second and following will get the suffix .2, .3, …
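The suffix rule for duplicate names can be illustrated with a few lines of plain Python. This is only a sketch of the naming scheme described above, not pypdf's implementation; the function name dedupe_field_names and the (name, value) input format are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def dedupe_field_names(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Give the second and later occurrences of a name the suffix .2, .3, ..."""
    result: Dict[str, str] = {}
    seen: Dict[str, int] = {}
    for name, value in pairs:
        seen[name] = seen.get(name, 0) + 1
        # The first occurrence keeps its name; later ones get a numbered suffix.
        key = name if seen[name] == 1 else f"{name}.{seen[name]}"
        result[key] = value
    return result
```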
- getFormTextFields() -> Dict[str, Any]
Use get_form_text_fields() instead.
Deprecated since version 1.28.0.
- getNamedDestinations(tree: Optional[TreeObject] = None, retval: Optional[Any] = None) -> Dict[str, Any]
Use named_destinations instead.
Deprecated since version 1.28.0.
- property outline: List[Union[Destination, List[Union[Destination, List[Destination]]]]]
Read-only property for the outline present in the document (i.e., a collection of ‘outline items’, which are also known as ‘bookmarks’).
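The nested list type of outline lends itself to a recursive walk. A minimal sketch, assuming each non-list item is a Destination with a title attribute and that a nested list holds the children of the item before it; the function name walk_outline is hypothetical.

```python
from typing import Any, Iterator, List, Tuple


def walk_outline(outline: List[Any], depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (depth, title) for every item in a nested outline list."""
    for item in outline:
        if isinstance(item, list):
            # Assumption: a nested list is one level deeper than its siblings.
            yield from walk_outline(item, depth + 1)
        else:
            yield depth, item.title
```

Calling list(walk_outline(reader.outline)) on a PdfReader would give a flat table of contents in which the depth can be used for indentation.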
- property outlines: List[Union[Destination, List[Union[Destination, List[Destination]]]]]
Use outline instead.
Deprecated since version 2.9.0.
- getOutlines(node: Optional[DictionaryObject] = None, outline: Optional[Any] = None) -> List[Union[Destination, List[Union[Destination, List[Destination]]]]]
Use outline instead.
Deprecated since version 1.28.0.
- property threads: Optional[pypdf.generic._data_structures.ArrayObject]
Read-only property for the list of threads.
See §8.3.2 from PDF 1.7 spec.
It’s an array of dictionaries with “/F” and “/I” properties or None if there are no articles.
- get_page_number(page: PageObject) -> int
Retrieve page number of a given PageObject.
- Parameters
page – The page whose number is wanted. Should be an instance of PageObject.
- Returns
The page number or -1 if page is not found
- getPageNumber(page: PageObject) -> int
Use get_page_number() instead.
Deprecated since version 1.28.0.
- get_destination_page_number(destination: Destination) -> int
Retrieve page number of a given Destination object.
- Parameters
destination – The destination whose page number is wanted.
- Returns
The page number or -1 if page is not found
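The -1 sentinel is easy to overlook, so calling code may prefer to convert it to None. A minimal sketch, assuming reader is a PdfReader; the wrapper name destination_page is hypothetical.

```python
from typing import Any, Optional


def destination_page(reader: Any, destination: Any) -> Optional[int]:
    """Return the page number for a destination, or None if it is not found."""
    number = reader.get_destination_page_number(destination)
    # -1 is the documented "page not found" result.
    return None if number == -1 else number
```

The same pattern applies to get_page_number(), which uses the same sentinel.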
- getDestinationPageNumber(destination: Destination) -> int
Use get_destination_page_number() instead.
Deprecated since version 1.28.0.
- property pages: List[PageObject]
Read-only property that emulates a list of Page objects.
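Since pages behaves like a list, it supports len() and indexing. A minimal sketch that collects text from the first few pages, assuming reader is a PdfReader and that each page offers an extract_text() method; the helper name extract_texts and its limit parameter are hypothetical.

```python
from typing import Any, List, Optional


def extract_texts(reader: Any, limit: Optional[int] = None) -> List[str]:
    """Return the extracted text of each page, up to `limit` pages."""
    pages = reader.pages
    count = len(pages) if limit is None else min(limit, len(pages))
    # Index page by page rather than slicing, relying only on len() and [].
    return [pages[i].extract_text() for i in range(count)]


# Usage with pypdf (the file name is a placeholder):
#   from pypdf import PdfReader
#   texts = extract_texts(PdfReader("example.pdf"), limit=2)
```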
- property page_labels: List[str]
A list of labels for the pages in this document.
This property is read-only. The labels are in the order that the pages appear in the document.
- property page_layout: Optional[str]
Get the page layout currently being used.
Valid layout values:
/NoLayout – Layout explicitly not specified
/SinglePage – Show one page at a time
/OneColumn – Show one column at a time
/TwoColumnLeft – Show pages in two columns, odd-numbered pages on the left
/TwoColumnRight – Show pages in two columns, odd-numbered pages on the right
/TwoPageLeft – Show two pages at a time, odd-numbered pages on the left
/TwoPageRight – Show two pages at a time, odd-numbered pages on the right
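The layout values listed above can be turned into a lookup table for display purposes. A minimal sketch; the names LAYOUT_DESCRIPTIONS and describe_layout are hypothetical, and the input is whatever reader.page_layout returns.

```python
from typing import Dict, Optional

LAYOUT_DESCRIPTIONS: Dict[str, str] = {
    "/NoLayout": "Layout explicitly not specified",
    "/SinglePage": "Show one page at a time",
    "/OneColumn": "Show one column at a time",
    "/TwoColumnLeft": "Show pages in two columns, odd-numbered pages on the left",
    "/TwoColumnRight": "Show pages in two columns, odd-numbered pages on the right",
    "/TwoPageLeft": "Show two pages at a time, odd-numbered pages on the left",
    "/TwoPageRight": "Show two pages at a time, odd-numbered pages on the right",
}


def describe_layout(layout: Optional[str]) -> str:
    """Translate a page_layout value into a human-readable description."""
    if layout is None:
        return "No layout set"
    return LAYOUT_DESCRIPTIONS.get(layout, f"Unrecognized layout {layout!r}")
```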
- getPageLayout() -> Optional[str]
Use page_layout instead.
Deprecated since version 1.28.0.
- property pageLayout: Optional[str]
Use page_layout instead.
Deprecated since version 1.28.0.
- property page_mode: Optional[Literal['/UseNone', '/UseOutlines', '/UseThumbs', '/FullScreen', '/UseOC', '/UseAttachments']]
Get the page mode currently being used.
Valid mode values:
/UseNone – Do not show outline or thumbnails panels
/UseOutlines – Show outline (aka bookmarks) panel
/UseThumbs – Show page thumbnails panel
/FullScreen – Fullscreen view
/UseOC – Show Optional Content Group (OCG) panel
/UseAttachments – Show attachments panel
- getPageMode() -> Optional[Literal['/UseNone', '/UseOutlines', '/UseThumbs', '/FullScreen', '/UseOC', '/UseAttachments']]
Use page_mode instead.
Deprecated since version 1.28.0.
- property pageMode: Optional[Literal['/UseNone', '/UseOutlines', '/UseThumbs', '/FullScreen', '/UseOC', '/UseAttachments']]
Use page_mode instead.
Deprecated since version 1.28.0.
- getObject(indirectReference: IndirectObject) -> Optional[PdfObject]
Use get_object() instead.
Deprecated since version 1.28.0.
- readObjectHeader(stream: IO) -> Tuple[int, int]
Use read_object_header() instead.
Deprecated since version 1.28.0.
- cacheGetIndirectObject(generation: int, idnum: int) -> Optional[PdfObject]
Use cache_get_indirect_object() instead.
Deprecated since version 1.28.0.
- cache_indirect_object(generation: int, idnum: int, obj: Optional[PdfObject]) -> Optional[PdfObject]
- cacheIndirectObject(generation: int, idnum: int, obj: Optional[PdfObject]) -> Optional[PdfObject]
Use cache_indirect_object() instead.
Deprecated since version 1.28.0.
- read_next_end_line(stream: IO, limit_offset: int = 0) -> bytes
Deprecated since version 2.1.0.
- decrypt(password: Union[str, bytes]) -> PasswordType
When using an encrypted / secured PDF file with the PDF Standard encryption handler, this function will allow the file to be decrypted. It checks the given password against the document’s user password and owner password, and then stores the resulting decryption key if either password is correct.
It does not matter which password was matched. Both passwords provide the correct decryption key that will allow the document to be used with this library.
- Parameters
password – The password to match.
- Returns
An indicator of whether the document was decrypted and whether it was with the owner password or the user password.
- property is_encrypted: bool
Read-only boolean property showing whether this PDF file is encrypted.
Note that this property, if true, will remain true even after the decrypt() method is called.
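Because is_encrypted stays true after decryption, the return value of decrypt() is the thing to check. A minimal sketch, assuming reader is a PdfReader and that the returned PasswordType is an integer enum whose "not decrypted" member is 0; the helper name try_unlock is hypothetical.

```python
from typing import Any, Union


def try_unlock(reader: Any, password: Union[str, bytes]) -> bool:
    """Return True if the reader is usable: unencrypted, or decrypted now."""
    if not reader.is_encrypted:
        return True
    result = reader.decrypt(password)
    # Assumption: any nonzero PasswordType means the user or the owner
    # password matched; which one matched does not matter for reading.
    return int(result) != 0
```

Re-checking reader.is_encrypted afterwards would not help, since it continues to report the file's encrypted state rather than whether a key has been stored.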
- getIsEncrypted() -> bool
Use is_encrypted instead.
Deprecated since version 1.28.0.
- property isEncrypted: bool
Use is_encrypted instead.
Deprecated since version 1.28.0.
- add_form_topname(name: str) -> Optional[DictionaryObject]
Add a top level form that groups all form fields below it.
- Parameters
name – text string of the “/T” Attribute of the created object
- Returns
The created object. None means no object was created.