History of pypdf
The Origins: pyPdf (2005-2010)
In 2005, Mathieu Fenniak launched pyPdf “as a PDF toolkit…” focused on
document manipulation: by-page splitting, concatenation, and merging;
page cropping; and
document encryption and decryption.
The last release on PyPI was pyPdf 1.13 in 2010.
PyPDF2 is born (2011-2016)
At the end of 2011, after consultation with Mathieu and others, Phaseit sponsored PyPDF2 as a fork of pyPdf on GitHub. The initial impetus was to handle a wider range of input PDF instances; Phaseit’s commercial work often encounters PDF instances “in the wild” that it needs to manage (mostly concatenate and paginate), but that deviate so much from PDF standards that pyPdf can’t read them. PyPDF2 reads a considerably wider range of real-world PDF instances.
Neither pyPdf nor PyPDF2 aims to be universal, that is, to provide all possible PDF-related functionality. Note that the similar-appearing pyfpdf of Mariano Reingart is most comparable to ReportLab, in that both ReportLab and pyfpdf emphasize document generation. Interestingly enough, pyfpdf builds in a basic HTML→PDF converter while PyPDF2 has no knowledge of HTML.
So what is PyPDF2 truly about? Think about the popular pdftk for a moment. PyPDF2 does what pdftk does, and it does so within your current Python process, and it handles a wider range of variant PDF formats. PyPDF2 has its own FAQ to answer other questions that have arisen.
The Reddit /r/python crowd chatted obliquely and briefly about PyPDF2 in March 2012.
The core developer / maintainer was Matthew Stamy.
PyPDF3 and PyPDF4 (2018-2022)
Two attempts were made to get PyPDF2 active again: PyPDF3 and PyPDF4.
PyPDF3 had its first release in 2018 and its last one in February 2022. It never gained the user base of PyPDF2.
PyPDF4 had only one release, in 2018.
PyPDF2: Reborn (2022)
Martin Thoma took over maintenance of PyPDF2 in April 2022. At that time, the project had over 100 open PRs and 321 open issues.
pubpub-zz was extremely active, especially for text extraction.
Matthew Peveler helped a lot with reviews and general project decisions.
exiledkingcc added support for modern encryption schemes.
pypdf: Back to the Roots (2023-Today)
To make things simpler for beginners, PyPDF2 was merged back into pypdf; the name is now all lowercase, without a number. We hope that the folks who develop PyPDF3 and PyPDF4 will also join us.
Compared to PyPDF2 >= 3.0.0, pypdf >= 3.1.0 now offers:
AES reading and writing support, not only with PyCryptodome but also with cryptography
Text extraction improvements, e.g. for math content; pypdf is now comparable with Tika, pypdfium2, and PyMuPDF
Performance improvements and bug fixes
Page Label support