PyMuPDF coordinate system(s) #1806
-
I'm looking to get a handle on the coordinate system(s) used in PyMuPDF. I know that for any given page, the origin is supposed to be the "top left", with positive x pointing right and positive y pointing down, and that this comes from MuPDF. However, I'm not clear on anything more specific than that. Is the origin the top-left corner of the page, of the mediabox, or of the cropbox? Something else?

The documentation says: "The MediaBox is the only rectangle, for which there is no difference between MuPDF and PDF coordinate systems: Page.mediabox will always show the same coordinates as the /MediaBox key in a page’s object definition. For all other rectangles, MuPDF transforms coordinates such that the top-left corner is the point of reference."

This was helpful to me because I'm familiar with PDF coordinate systems, but it doesn't say how, for example, the coordinates returned by Page.get_text("words") are related to the coordinates returned by page.get_pixmap().irect. My first thought, based on the documentation's example (just below the quote above), was that both of these, as well as all other rect-like coordinates, are in the same coordinate system: one whose origin is at the top-left of the mediabox, with positive x pointing right and positive y pointing down, and whose scale matches PDF's default user space. Is this the case?

On a related note, what does page.get_pixmap() actually return a pixmap of? Is it the mediabox, the cropbox, or something else?

Finally, I noticed that the links to the Adobe PDF reference in the PyMuPDF documentation are broken. I found a document at https://opensource.adobe.com/dc-acrobat-sdk-docs/standards/pdfstandards/pdf/PDF32000_2008.pdf, but maybe this isn't the place to mention that.
Replies: 4 comments 4 replies
-
In (Py-)MuPDF, the coordinates of …

If you can find the new location of that PDF reference, please submit a PR.
-
No, there only is
No! At best you could say they are relative to the cropbox, as if the cropbox had (0,0) as its top-left.
There is no exception. Pixmap is no exception either: it contains no coordinates. Like neither …

Methods and attributes refer to the unrotated page, for both input and output: if you insert something (text, drawings, images, annotations, ...), you must use unrotated coordinates.
The latter two are the inverse of each other and transform coordinates from/to the rotated page. They are the …

In [1]: import fitz
In [2]: doc=fitz.open()
In [3]: page=doc.new_page()
In [4]: page.transformation_matrix
Out[4]: Matrix(1.0, 0.0, 0.0, -1.0, 0.0, 842.0)
In [5]: page.rotation_matrix
Out[5]: Matrix(1.0, 0.0, 0.0, 1.0, 0.0, 0.0)
In [6]: page.derotation_matrix
Out[6]: Matrix(1.0, -0.0, -0.0, 1.0, 0.0, 0.0)
In [7]: page.set_rotation(90)
In [8]: page.rotation_matrix
Out[8]: Matrix(0.0, 1.0, -1.0, 0.0, 842.0, 0.0)
In [9]: page.derotation_matrix
Out[9]: Matrix(0.0, -1.0, 1.0, 0.0, -0.0, 842.0)
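To make the "inverse of each other" point concrete, here is a minimal pure-Python sketch that applies the two 90-degree matrices from the session above to a point. The helper `apply_matrix` is a hypothetical name introduced only for this illustration (it is not part of PyMuPDF), and it assumes the usual six-value convention in which (a, b, c, d, e, f) maps (x, y) to (a*x + c*y + e, b*x + d*y + f).

```python
# Matrices copied from the IPython session above (blank page, 595 x 842 points,
# rotation set to 90 degrees).
ROTATION_90 = (0.0, 1.0, -1.0, 0.0, 842.0, 0.0)
DEROTATION_90 = (0.0, -1.0, 1.0, 0.0, 0.0, 842.0)


def apply_matrix(matrix, point):
    """Hypothetical helper: transform an (x, y) point by a 6-value matrix.

    Assumes (a, b, c, d, e, f) maps (x, y) to (a*x + c*y + e, b*x + d*y + f).
    """
    a, b, c, d, e, f = matrix
    x, y = point
    return (a * x + c * y + e, b * x + d * y + f)


# Top-left corner of the unrotated page, as seen on the rotated page.
top_left_rotated = apply_matrix(ROTATION_90, (0.0, 0.0))

# Round trip: unrotated -> rotated -> unrotated returns the original point.
point = (100.0, 200.0)
round_trip = apply_matrix(DEROTATION_90, apply_matrix(ROTATION_90, point))

print(top_left_rotated)  # (842.0, 0.0)
print(round_trip)        # (100.0, 200.0)
```

In practice this means a position read off the rotated page would be passed through the derotation matrix before being used in an insertion call, since those calls expect unrotated coordinates.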
-
CropBox:
Yes, but the top-left corner of the pixmap, nothing else. The values have nothing to do with any document page. Nothing at all. If a pixmap is made from a document page with default parameters, its irect equals page.rect.irect. A pixmap need not even stem from a document page, as far as that is concerned. Its x/y values can be used to position image content when copying between pixmaps. You can also modify them arbitrarily, and you would in the mentioned case.
-
Thanks for mentioning this bad link; your suggested URL looks good. I've updated my tree, so this fix will be in the next release.
CropBox:
A different coordinate system does not mean page.cropbox is a different thing from /CropBox. Only its values may be different from those found under /CropBox, because of the MuPDF coordinate system. That's all.

So inside PyMuPDF, the /CropBox array is taken, then page.transformation_matrix is applied to its values to yield the property page.cropbox.

Now clear?
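As a rough illustration of the step just described, the sketch below applies the transformation matrix shown in the session earlier in this thread to a made-up /CropBox array. Everything here is an assumption for illustration: the /CropBox values are hypothetical, `apply_matrix` and `transform_rect` are helper names invented for this example rather than PyMuPDF functions, and the matrix is the one reported for a blank 595 x 842 page (whether a page with a differing /CropBox reports the same matrix is not checked here).

```python
# Matrix copied from the session earlier in this thread (page height 842).
TRANSFORMATION = (1.0, 0.0, 0.0, -1.0, 0.0, 842.0)

# Hypothetical /CropBox array in PDF coordinates: [x0 y0 x1 y1],
# where (x0, y0) is the bottom-left corner.
PDF_CROPBOX = (50.0, 100.0, 400.0, 700.0)


def apply_matrix(matrix, point):
    """Hypothetical helper: map (x, y) to (a*x + c*y + e, b*x + d*y + f)."""
    a, b, c, d, e, f = matrix
    x, y = point
    return (a * x + c * y + e, b * x + d * y + f)


def transform_rect(matrix, rect):
    """Transform both corners, then re-sort so (x0, y0) is the top-left."""
    x0, y0, x1, y1 = rect
    ax, ay = apply_matrix(matrix, (x0, y0))
    bx, by = apply_matrix(matrix, (x1, y1))
    return (min(ax, bx), min(ay, by), max(ax, bx), max(ay, by))


mupdf_cropbox = transform_rect(TRANSFORMATION, PDF_CROPBOX)
print(mupdf_cropbox)  # (50.0, 142.0, 400.0, 742.0)
```

The x values are unchanged and the y values are flipped against the page height, which is why the same rectangle can show different numbers in the PDF source and in the PyMuPDF property.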
Yes, but the top-left corner of the pixmap, nothing else. The values have nothing to do with any document page. Nothing at all. If a pixmap is made from a document page with default parameters, its irect equals page.rect.irect. This is where their relationship…