The DocumentInformation Class

class pypdf.DocumentInformation[source]

Bases: DictionaryObject

A class representing the basic document metadata provided in a PDF file. An instance of this class is accessible through PdfReader.metadata.

Every text property of the document metadata comes in two variants, e.g. author and author_raw. The non-raw property always returns a TextStringObject, which makes it ideal when the metadata is being displayed. The raw property can return a ByteStringObject if pypdf was unable to decode the string’s text encoding; callers must handle that case explicitly, so the raw properties are less commonly used.
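The difference between the two variants can be sketched as follows. This is a minimal illustration, not part of pypdf: to_display_text and print_author are hypothetical helpers, the Latin-1 fallback is an assumption chosen only because it never fails, and the sketch relies on ByteStringObject being a bytes subclass and TextStringObject being a str subclass.

```python
from typing import Union


def to_display_text(value: Union[str, bytes, None], placeholder: str = "(not set)") -> str:
    """Turn a metadata value (raw or non-raw) into something safe to display."""
    if value is None:
        # The entry is missing from the document information dictionary.
        return placeholder
    if isinstance(value, bytes):
        # A raw property may hand back undecoded bytes.
        # Assumption: Latin-1 is used as a last-resort decoding because it
        # maps every byte to a character; the result may still be mojibake.
        return value.decode("latin-1")
    return str(value)


def print_author(path: str) -> None:
    """Print both variants of the author entry for the PDF at ``path``."""
    from pypdf import PdfReader  # imported here so the helper above stays stdlib-only

    meta = PdfReader(path).metadata
    if meta is None:
        # Some documents carry no document information at all.
        print("no document information available")
        return
    print("author:    ", to_display_text(meta.author))
    print("author_raw:", to_display_text(meta.author_raw))
```

Calling print_author("example.pdf") (a placeholder file name) would print both variants; for most documents the two lines are identical.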

property title: str | None

Read-only property accessing the document’s title.

Returns a TextStringObject or None if the title is not specified.

property title_raw: str | None

The “raw” version of title; can return a ByteStringObject.

property author: str | None

Read-only property accessing the document’s author.

Returns a TextStringObject or None if the author is not specified.

property author_raw: str | None

The “raw” version of author; can return a ByteStringObject.

property subject: str | None

Read-only property accessing the document’s subject.

Returns a TextStringObject or None if the subject is not specified.

property subject_raw: str | None

The “raw” version of subject; can return a ByteStringObject.

property creator: str | None

Read-only property accessing the document’s creator.

If the document was converted to PDF from another format, this is the name of the application (e.g. OpenOffice) that created the original document from which it was converted. Returns a TextStringObject or None if the creator is not specified.

property creator_raw: str | None

The “raw” version of creator; can return a ByteStringObject.

property producer: str | None

Read-only property accessing the document’s producer.

If the document was converted to PDF from another format, this is the name of the application (for example, macOS Quartz) that converted it to PDF. Returns a TextStringObject or None if the producer is not specified.
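Since creator and producer describe two different steps in a document's history, it can help to report them together. The following is only a sketch: describe_origin is a hypothetical helper, and the SimpleNamespace object stands in for a DocumentInformation instance so the example runs without a PDF file.

```python
from types import SimpleNamespace


def describe_origin(meta) -> str:
    """Summarize where a document came from, using creator and producer."""
    # Both properties may be None, so read them defensively.
    creator = getattr(meta, "creator", None)
    producer = getattr(meta, "producer", None)
    if creator and producer:
        return f"authored in {creator}, converted to PDF by {producer}"
    if producer:
        return f"converted to PDF by {producer}"
    if creator:
        return f"authored in {creator}"
    return "origin unknown"


# Stand-in for PdfReader(...).metadata, using the example names from the text.
example = SimpleNamespace(creator="OpenOffice", producer="macOS Quartz")
print(describe_origin(example))
```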

property producer_raw: str | None

The “raw” version of producer; can return a ByteStringObject.

property creation_date: datetime | None

Read-only property accessing the document’s creation date.

property creation_date_raw: str | None

The “raw” version of creation_date; can return a ByteStringObject.

Typically in the format D:YYYYMMDDhhmmss[+Z-]hh'mm where the suffix is the offset from UTC.
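To make the raw date format concrete, the sketch below parses such a string with the standard library only. It is a simplified illustration and not pypdf's own parser: parse_pdf_date is a hypothetical helper, it assumes all fields down to the seconds are present, and it returns None for anything it does not recognize.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# D:YYYYMMDDhhmmss followed by an optional UTC offset such as +01'00' or Z.
_PDF_DATE = re.compile(
    r"^D:(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})"
    r"(?P<hour>\d{2})(?P<minute>\d{2})(?P<second>\d{2})"
    r"(?:(?P<sign>[+\-Z])(?:(?P<off_h>\d{2})'?(?P<off_m>\d{2})?'?)?)?$"
)


def parse_pdf_date(raw: str) -> Optional[datetime]:
    """Parse a raw PDF date string; return None if it does not match."""
    match = _PDF_DATE.match(raw)
    if match is None:
        return None
    parts = match.groupdict()

    tz = None  # no suffix: the result is a naive datetime
    sign = parts["sign"]
    if sign == "Z":
        tz = timezone.utc
    elif sign in ("+", "-"):
        offset = timedelta(
            hours=int(parts["off_h"] or 0),
            minutes=int(parts["off_m"] or 0),
        )
        tz = timezone(-offset if sign == "-" else offset)

    try:
        return datetime(
            int(parts["year"]), int(parts["month"]), int(parts["day"]),
            int(parts["hour"]), int(parts["minute"]), int(parts["second"]),
            tzinfo=tz,
        )
    except ValueError:
        # Digits were present but do not form a valid date (e.g. month 13).
        return None


print(parse_pdf_date("D:20230115103000+01'00'"))
```

In normal use the non-raw creation_date property already provides a datetime; parsing by hand is mainly useful for inspecting or debugging unusual raw values.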

property modification_date: datetime | None

Read-only property accessing the document’s modification date.

The date and time the document was most recently modified.

property modification_date_raw: str | None

The “raw” version of modification_date; can return a ByteStringObject.

Typically in the format D:YYYYMMDDhhmmss[+Z-]hh'mm where the suffix is the offset from UTC.

property keywords: str | None

Read-only property accessing the document’s keywords.

Returns a TextStringObject or None if keywords are not specified.

property keywords_raw: str | None

The “raw” version of keywords; can return a ByteStringObject.