Welcome to pypdf
pypdf is a free and open-source pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files, and it can retrieve text and metadata from PDFs.
See pdfly for a CLI application that uses pypdf to interact with PDFs.
You can contribute to pypdf on GitHub.
User Guide
- Installation
- Migration Guide: 1.x to 2.x
- Imports and Modules
- Naming Adjustments
- Robustness and strict=False
- Exceptions, Warnings, and Log messages
- Metadata
- Extract Text from a PDF
- Post-Processing in Text Extraction
- Extract Images
- Extract Attachments
- Encryption and Decryption of PDFs
- Merging PDF files
- Cropping and Transforming PDFs
- Transforming several copies of the same page
- Adding a Stamp/Watermark to a PDF
- Reading PDF Annotations
- Adding PDF Annotations
- Interactions with PDF Forms
- Streaming Data with pypdf
- Reduce PDF File Size
- PDF Version Support
- PDF/A Compliance
API Reference
- The PdfReader Class
- The PdfWriter Class
- The PdfMerger Class
- The PageObject Class
- The Transformation Class
- The DocumentInformation Class
- The XmpInformation Class
- The Destination Class
- The RectangleObject Class
- The Field Class
- The PageRange Class
- The annotations module
- The Fit Class
- The PaperSize Class
Developer Guide