Migration Guide: 1.x to 2.x
PyPDF2<2.0.0
(docs)
is very different from PyPDF2>=2.0.0
(docs).
Luckily, most changes are simple naming adjustments. This guide helps you to
make the step from PyPDF2 1.x
(or even the original PyPpdf) to PyPDF2>=2.0.0
.
You can execute your code with the updated version and show deprecation warnings
by running python -W all your_code.py
.
Imports and Modules
PyPDF2.utils
no longer existsPyPDF2.pdf
no longer exists. You can import fromPyPDF2
directly or fromPyPDF2.generic
Naming Adjustments
Classes
The base classes were renamed as they also allow to operate with ByteIO streams
instead of files. Also, the strict
parameter changed the default value from
strict=True
to strict=False
.
PdfFileReader
➔PdfReader
PdfFileWriter
➔PdfWriter
PdfFileMerger
➔PdfMerger
PdfFileReader and PdfFileMerger no longer have the overwriteWarnings
parameter. The new behavior is overwriteWarnings=False
.
Function, Method, and Property Names
In PyPDF2.xmp.XmpInformation
:
rdfRoot
➔rdf_root
xmp_createDate
➔xmp_create_date
xmp_creatorTool
➔xmp_creator_tool
xmp_metadataDate
➔xmp_metadata_date
xmp_modifyDate
➔xmp_modify_date
xmpMetadata
➔xmp_metadata
xmpmm_documentId
➔xmpmm_document_id
xmpmm_instanceId
➔xmpmm_instance_id
In PyPDF2.generic
:
readObject
➔read_object
convertToInt
➔convert_to_int
DocumentInformation.getText
➔DocumentInformation._get_text
: This method should typically not be used; please let me know if you need it.readHexStringFromStream
➔read_hex_string_from_stream
initializeFromDictionary
➔initialize_from_dictionary
createStringObject
➔create_string_object
TreeObject.hasChildren
➔TreeObject.has_children
TreeObject.emptyTree
➔TreeObject.empty_tree
In many places:
getObject
➔get_object
writeToStream
➔write_to_stream
readFromStream
➔read_from_stream
PdfReader class:
reader.getPage(pageNumber)
➔reader.pages[page_number]
reader.getNumPages()
/reader.numPages
➔len(reader.pages)
getDocumentInfo
➔metadata
flattenedPages
attribute ➔flattened_pages
resolvedObjects
attribute ➔resolved_objects
xrefIndex
attribute ➔xref_index
getNamedDestinations
/namedDestinations
attribute ➔named_destinations
getPageLayout
/pageLayout
➔page_layout
attributegetPageMode
/pageMode
➔page_mode
attributegetIsEncrypted
/isEncrypted
➔is_encrypted
attributegetOutlines
➔get_outlines
readObjectHeader
➔read_object_header
cacheGetIndirectObject
➔cache_get_indirect_object
cacheIndirectObject
➔cache_indirect_object
getDestinationPageNumber
➔get_destination_page_number
readNextEndLine
➔read_next_end_line
_zeroXref
➔_zero_xref
_authenticateUserPassword
➔_authenticate_user_password
_pageId2Num
attribute ➔_page_id2num
_buildDestination
➔_build_destination
_buildOutline
➔_build_outline
_getPageNumberByIndirect(indirectRef)
➔_get_page_number_by_indirect(indirect_ref)
_getObjectFromStream
➔_get_object_from_stream
_decryptObject
➔_decrypt_object
_flatten(..., indirectRef)
➔_flatten(..., indirect_ref)
_buildField
➔_build_field
_checkKids
➔_check_kids
_writeField
➔_write_field
_write_field(..., fieldAttributes)
➔_write_field(..., field_attributes)
_read_xref_subsections(..., getEntry, ...)
➔_read_xref_subsections(..., get_entry, ...)
PdfWriter class:
writer.getPage(pageNumber)
➔writer.pages[page_number]
writer.getNumPages()
➔len(writer.pages)
addMetadata
➔add_metadata
addPage
➔add_page
addBlankPage
➔add_blank_page
addAttachment(fname, fdata)
➔add_attachment(filename, data)
insertPage
➔insert_page
insertBlankPage
➔insert_blank_page
appendPagesFromReader
➔append_pages_from_reader
updatePageFormFieldValues
➔update_page_form_field_values
cloneReaderDocumentRoot
➔clone_reader_document_root
cloneDocumentFromReader
➔clone_document_from_reader
getReference
➔get_reference
getOutlineRoot
➔get_outline_root
getNamedDestRoot
➔get_named_dest_root
addBookmarkDestination
➔add_bookmark_destination
addBookmarkDict
➔add_bookmark_dict
addBookmark
➔add_bookmark
addNamedDestinationObject
➔add_named_destination_object
addNamedDestination
➔add_named_destination
removeLinks
➔remove_links
removeImages(ignoreByteStringObject)
➔remove_images(ignore_byte_string_object)
removeText(ignoreByteStringObject)
➔remove_text(ignore_byte_string_object)
addURI
➔add_uri
addLink
➔add_link
getPage(pageNumber)
➔get_page(page_number)
getPageLayout / setPageLayout / pageLayout
➔page_layout attribute
getPageMode / setPageMode / pageMode
➔page_mode attribute
_addObject
➔_add_object
_addPage
➔_add_page
_sweepIndirectReferences
➔_sweep_indirect_references
PdfMerger class
__init__
parameter:strict=True
➔strict=False
(thePdfFileMerger
still has the old default)addMetadata
➔add_metadata
addNamedDestination
➔add_named_destination
setPageLayout
➔set_page_layout
setPageMode
➔set_page_mode
Page class:
artBox
/bleedBox
/cropBox
/mediaBox
/trimBox
➔artbox
/bleedbox
/cropbox
/mediabox
/trimbox
getWidth
,getHeight
➔width
/height
getLowerLeft_x
/getUpperLeft_x
➔left
getUpperRight_x
/getLowerRight_x
➔right
getLowerLeft_y
/getLowerRight_y
➔bottom
getUpperRight_y
/getUpperLeft_y
➔top
getLowerLeft
/setLowerLeft
➔lower_left
propertyupperRight
➔upper_right
mergePage
➔merge_page
rotateClockwise
/rotateCounterClockwise
➔rotate_clockwise
_mergeResources
➔_merge_resources
_contentStreamRename
➔_content_stream_rename
_pushPopGS
➔_push_pop_gs
_addTransformationMatrix
➔_add_transformation_matrix
_mergePage
➔_merge_page
XmpInformation class:
getElement(..., aboutUri, ...)
➔get_element(..., about_uri, ...)
getNodesInNamespace(..., aboutUri, ...)
➔get_nodes_in_namespace(..., aboutUri, ...)
_getText
➔_get_text
utils.py:
matrixMultiply
➔ `matrix_multiplyRC4_encrypt
is moved to the security module
Parameter Names
PdfWriter.get_page
:pageNumber
➔page_number
PyPDF2.filters
(all classes):decodeParms
➔decode_parms
PyPDF2.filters
(all classes):decodeStreamData
➔decode_stream_data
pagenum
➔page_number
PdfMerger.merge
:position
➔page_number
PdfWriter.add_outline_item_destination
:dest
➔page_destination
PdfWriter.add_named_destination_object
:dest
➔page_destination
PdfWriter.encrypt
:user_pwd
➔user_password
PdfWriter.encrypt
:owner_pwd
➔owner_password
Deprecations
A few classes / functions were deprecated without replacement:
PyPDF2.utils.ConvertFunctionsToVirtualList
PyPDF2.utils.formatWarning
PyPDF2.isInt(obj)
: Useinstance(obj, int)
insteadPyPDF2.u_(s)
: Uses
directlyPyPDF2.chr_(c)
: Usechr(c)
insteadPyPDF2.barray(b)
: Usebytearray(b)
insteadPyPDF2.isBytes(b)
: Useinstance(b, type(bytes()))
insteadPyPDF2.xrange_fn
: Userange
insteadPyPDF2.string_type
: Usestr
insteadPyPDF2.isString(s)
: Useinstance(s, str)
insteadPyPDF2._basestring
: Usestr
insteadb_(...)
was removed. You should typically be able use the bytes object directly, otherwise you can copy this