pypdf is a library and hence its users are developers. This document is not for the users, but for people who want to work on pypdf itself.
pip install -r requirements/dev.txt
The sample-files git submodule
The reason for having the submodule
sample-files is that we want to keep
the size of the pypdf repository small while we also want to have an extensive
test suite. Those two goals contradict each other.
The resources folder should contain a select set of core examples that cover
most cases we typically want to test for. The
sample-files might cover a lot
more edge cases, the behavior we get when file sizes get bigger, different
In order to get the sample-files folder, you need to execute:
git submodule update --init
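After a plain clone, git leaves the submodule directory in place but empty until the command above has run. The following is a minimal sketch of how a local script could check whether the folder has been fetched. The helper name submodule_is_populated is hypothetical and not part of pypdf, and the relative path sample-files assumes the script runs from the repository root.

```python
from pathlib import Path


def submodule_is_populated(path):
    """Return True if `path` is a directory that contains at least one entry.

    Hypothetical helper for illustration; it is not part of pypdf.
    """
    directory = Path(path)
    return directory.is_dir() and any(directory.iterdir())


if __name__ == "__main__":
    # Assumes the current working directory is the repository root.
    if submodule_is_populated("sample-files"):
        print("sample-files is available")
    else:
        print("sample-files is missing; run: git submodule update --init")
```

A check like this could be used to skip tests that depend on the larger sample set when the submodule has not been initialized.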
Tools: git and pre-commit
Git is a command line application for version control. If you don’t know it, you can play ohmygit to learn it.
GitHub is the service where the pypdf project is hosted. While git is free and open source, GitHub is a paid service by Microsoft, but it is free in a lot of cases.
pre-commit is a command line application
that uses git hooks to automatically execute code. This allows you to avoid
style issues and other code quality issues. After you have entered pre-commit install
once in your local copy of pypdf, it will automatically be executed when you commit.
Having a clean commit message helps people to quickly understand what the commit was about, without actually looking at the changes. The first line of the commit message is used to auto-generate the CHANGELOG. For this reason, the first line should have the form PREFIX: DESCRIPTION; further detail goes in the body of the message.
PREFIX can be:
BUG: A bug was fixed. Likely there are one or more related issues. Then write Closes #123 in the body, where 123 is the issue number on GitHub. It would be absolutely amazing if you could write a regression test in those cases. That is a test that would fail without the fix.
ENH: A new feature! Describe in the body what it can be used for.
DEP: A deprecation - either marking something as “this is going to be removed” or actually removing it.
PI: A performance improvement. This could also be a reduction in the file size of PDF files generated by pypdf.
ROB: A robustness change. Dealing better with broken PDF files.
DOC: A documentation change.
TST: Adding / adjusting tests.
DEV: Developer experience improvements - e.g. pre-commit or setting up CI.
MAINT: Quite a lot of different stuff. Performance improvements are for sure the most interesting changes in here. Refactorings as well.
STY: A style change. Something that makes pypdf code more consistent. Typically a small change.
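To make the convention concrete, here is a minimal sketch of a checker for the first line of a commit message. It assumes the first line has the shape PREFIX: description, with the prefixes taken from the list above. The names PREFIXES and parse_first_line are hypothetical, and this is not the tooling pypdf itself uses to build the CHANGELOG.

```python
import re

# Prefixes copied from the list above.
PREFIXES = ("BUG", "ENH", "DEP", "PI", "ROB", "DOC", "TST", "DEV", "MAINT", "STY")

# Assumed shape of the first line: an upper-case prefix, a colon, one space,
# and a non-empty description.
_FIRST_LINE = re.compile(r"^(?P<prefix>[A-Z]+): (?P<description>\S.*)$")


def parse_first_line(message):
    """Return (prefix, description) if the first line is valid, else None."""
    lines = message.splitlines()
    if not lines:
        return None
    match = _FIRST_LINE.match(lines[0])
    if match is None or match["prefix"] not in PREFIXES:
        return None
    return match["prefix"], match["description"]


if __name__ == "__main__":
    example = "BUG: Handle empty page labels\n\nCloses #123"
    print(parse_first_line(example))
    print(parse_first_line("Fix stuff"))
```

Only the first line is inspected here, since that is the part used for the CHANGELOG; the body is left free-form.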
We need to keep an eye on performance and thus we have a few benchmarks.
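As a rough illustration of what such a benchmark measures, the sketch below times a function repeatedly with the standard-library timeit module and reports the best time per call. The helper benchmark and the workload build_list are hypothetical stand-ins; the actual pypdf benchmarks may use a different tool and measure real PDF operations.

```python
import timeit


def benchmark(func, repeat=5, number=100):
    """Return the best observed time per call of `func`, in seconds.

    Each of the `repeat` runs calls `func` `number` times; taking the
    minimum reduces the influence of noise from other processes.
    """
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return min(timings) / number


def build_list():
    # Stand-in workload; a real benchmark would exercise pypdf code instead.
    return [i * i for i in range(1000)]


if __name__ == "__main__":
    best = benchmark(build_list)
    print(f"best time per call: {best:.6f} s")
```

Comparing this number before and after a change gives a first hint of whether the change made things slower.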