Working with pages
This section details with how to view and edit the contents of a page.
pikepdf is not an ideal tool for producing new PDFs from scratch – and there are many good tools for that, as mentioned elsewhere. pikepdf is better at inspecting, editing and transforming existing PDFs.
pikepdf.Page
objects can be thought of a subclass of pikepdf.Dictionary
. Since
pages are important, they are special objects, and the Pdf.pages
API will only
accept or return pikepdf.Page.
>>> from pikepdf import Pdf, Page
>>> example = Pdf.open('../tests/resources/congress.pdf')
>>> page1 = example.pages[0]
>>> page1
<pikepdf.Page({
"/Contents": pikepdf.Stream(owner=<...>, data=b'q\n200.0000 0 0 304.0'..., {
"/Length": 50
}),
"/MediaBox": [ 0, 0, 200, 304 ],
"/Parent": <reference to /Pages>,
"/Resources": {
"/XObject": {
"/Im0": pikepdf.Stream(owner=<...>, data=<...>, {
"/BitsPerComponent": 8,
"/ColorSpace": "/DeviceRGB",
"/Filter": [ "/DCTDecode" ],
"/Height": 1520,
"/Length": 192956,
"/Subtype": "/Image",
"/Type": "/XObject",
"/Width": 1000
})
}
},
"/Type": "/Page"
})>
The page’s /Contents
key contains instructions for drawing the page content.
This is a content stream, which is a stream object
that follows special rules.
Also attached to this page is a /Resources
dictionary, which contains a
single XObject image. The image is compressed with the /DCTDecode
filter,
meaning it is encoded with the DCT, so it is
a JPEG. pikepdf has special APIs for working with images.
The /MediaBox
describes the bounding box of the page in PDF pt units
(1/72” or 0.35 mm).
You can access the page dictionary data structure directly, but it’s fairly
complicated. There are a number of rules, optional values and implied values.
To do so, you would access the page1.obj
property, which returns the
underlying dictionary object that holds the page data.
Changed in version 9.0: The Pdf.pages
API was made strict, and now accepts only pikepdf.Page
for its various functions. In most cases, if you intend to create a
Dictionary and use it as a page, all you need to do is be explicit:
`pikepdf.Page(pikepdf.Dictionary(Type=Name.Page))
Changed in version 8.x: The use of Python dictionary or pikepdf.Dictionary to represent pages was deprecated.
Changed in version 2.x: In pikepdf 2.x, the raw dictionary object was returned, and it was
necessary to manually wrap it with the support model:
page = Page(pdf.pages[0])
. This is no longer necessary.
Page boxes
>>> page1.trimbox
pikepdf.Array([ 0, 0, 200, 304 ])
Page
will resolve implicit information. For example, page.trimbox
will return an appropriate trim box for this page, which in this case is
equal to the media box. This happens even if the page does not define
a trim box.