Welcome to pypdf
pypdf is a free and open-source, pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files, and it can retrieve text and metadata from PDFs.
See pdfly for a CLI application that uses pypdf to interact with PDFs.
You can contribute to pypdf on GitHub.
- Installation
- Migration Guide: 1.x to 2.x
- Imports and Modules
- Naming Adjustments
- Robustness and strict=False
- Exceptions, Warnings, and Log messages
- Metadata
- Extract Text from a PDF
- Post-Processing of Text Extraction
- Extract Images
- Other images
- Extract Attachments
- Encryption and Decryption of PDFs
- Merging PDF files
- Cropping and Transforming PDFs
- Transforming several copies of the same page
- Reading PDF Annotations
- Adding PDF Annotations
- Adding a Stamp or Watermark to a PDF
- Adding JavaScript to a PDF
- Adding Viewer Preferences
- Interactions with PDF Forms
- Streaming Data with pypdf
- Reduce PDF File Size
- PDF Version Support
- PDF/A Compliance
- The PdfReader Class
- The PdfWriter Class
- The Destination Class
- The DocumentInformation Class
- The Field Class
- The Fit Class
- The PageObject Class
- The PageRange Class
- The PaperSize Class
- The RectangleObject Class
- The Transformation Class
- The XmpInformation Class
- The annotations module
- Constants
- Errors
- Generic PDF objects
- The PdfDocCommon Class