Releases: py-pdf/pypdf
Version 5.9.0, 2025-07-27
What's new
New Features (ENH)
- Automatically preserve links in added pages (#3298) by @larsga
- Allow writing/updating all properties of an embedded file (#3374) by @Arya-A-Nair
Bug Fixes (BUG)
- Fix XMP handling dropping indirect references (#3392) by @stefan6419846
Robustness (ROB)
- Deal with DecodeParms being empty list (#3388) by @stefan6419846
Documentation (DOC)
- Document how to read and modify XMP metadata (#3383) by @stefan6419846
Version 5.8.0, 2025-07-13
What's new
New Features (ENH)
Bug Fixes (BUG)
Robustness (ROB)
- Resolve some image extraction edge cases (#3371) by @stefan6419846
- Ignore faulty trailing newline during RLE decoding (#3355) by @henningkoertelgmg
- Gracefully handle odd-length strings in parse_bfchar (#3348) by @stefan6419846
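For context on the RLE item above: RunLengthDecode data in a PDF is a series of runs, each introduced by a length byte, and ends with the end-of-data byte 128. The following is a minimal standalone sketch of such a decoder, not pypdf's implementation. The helper name run_length_decode is hypothetical, and reading the "faulty trailing newline" as a stray byte after the end-of-data marker is an assumption; the actual change in #3355 may differ.

```python
def run_length_decode(data: bytes) -> bytes:
    """Decode PDF RunLengthDecode data (hypothetical helper, not pypdf's code)."""
    out = bytearray()
    i = 0
    while i < len(data):
        length = data[i]
        i += 1
        if length == 128:
            # End-of-data marker: anything after it (e.g. a stray newline) is ignored.
            break
        if length < 128:
            # Literal run: copy the next length + 1 bytes unchanged.
            count = length + 1
            out += data[i:i + count]
            i += count
        else:
            # Repeated run: the next byte is repeated 257 - length times.
            if i >= len(data):
                break  # truncated input; stop instead of raising
            out += data[i:i + 1] * (257 - length)
            i += 1
    return bytes(out)


# Literal "abc", then "x" repeated three times, end-of-data, stray newline.
encoded = bytes([2]) + b"abc" + bytes([254, ord("x"), 128]) + b"\n"
print(run_length_decode(encoded))
```

Because decoding stops at the marker, the trailing newline never reaches the output.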
Developer Experience (DEV)
- Modernize license specifiers (#3338) by @stefan6419846
Maintenance (MAINT)
- Reduce max-complexity of tool.ruff.lint.mccabe (#3365) by @j-t-1
- Refactor text extraction code by @MartinThoma
Version 5.7.0, 2025-06-29
What's new
Performance Improvements (PI)
- Performance optimization for LZW decoding (#3329) by @henningkoertelgmg
Robustness (ROB)
- Flate decoding for streams with faulty tail bytes (#3332) by @henningkoertelgmg
- dc_creator could be a Bag as well (#3333) by @stefan6419846
- Handle tree being NullObject when retrieving named destinations (#3331) by @stefan6419846
Maintenance (MAINT)
- Move inline-image mappings to constants (#3328) by @stefan6419846
Version 5.6.1, 2025-06-22
What's new
New Features (ENH)
- Add PDF/A XMP metadata support (#3314) by @Arya-A-Nair
Robustness (ROB)
- Deal with annotations not being lists on merge (#3321) by @stefan6419846
- Handle NullObject for cmap encoding Differences entry (#3317) by @stefan6419846
Developer Experience (DEV)
- Update ruff to 0.12.0 (#3316) by @stefan6419846
Version 5.6.0, 2025-06-01
What's new
New Features (ENH)
- Add basic support for JBIG2 by using jbig2dec (#3163) by @stefan6419846
Bug Fixes (BUG)
- Fix crashes by removing unnecessary line (#3293) by @larsga
- Add delimiters to NameObject.renumber_table (#3286) by @ztravis
Robustness (ROB)
- Handle DecodeParms being a NullObject (#3285) by @stefan6419846
Code Style (STY)
- Update to mypy 1.16.0 (#3300) by @stefan6419846
Version 5.5.0, 2025-05-11
What's new
New Features (ENH)
- Add support for IndirectObject.iter (#3228) by @bryan-brancotte
- Allow filtering by font when removing text (#3216) by @samuelbradshaw
Bug Fixes (BUG)
- Add missing named destinations being ByteStringObjects (#3282) by @stefan6419846
- Get font information more reliably when removing text (#3252) by @samuelbradshaw
- T* 2D Translation consistent with PDF 1.7 Spec (#3250) by @hackowitz-af
- Add font stack to q/Q operations in layout mode (#3225) by @hackowitz-af
- Avoid completely hiding image loading issues like exceeding image size limits (#3221) by @stefan6419846
- Using compress_identical_objects on transformed content duplicates differing content (#3197) by @danio
- Consider BlackIs1 parameter for CCITTFaxDecode filter (#3196) by @stefan6419846
Robustness (ROB)
- Deal with insufficient cm matrix during text extraction (#3283) by @stefan6419846
- Allow merging when annotations miss D entry (#3281) by @stefan6419846
- Fix merging documents if there are no Dests (#3280) by @stefan6419846
- Fix crash on malformed action in outline (#3278) by @larsga
- Fix compression issues for removed images which might be None (#3246) by @stefan6419846
- Attempt to deal with non-rectangular FlateDecode streams (#3245) by @stefan6419846
- Handle some None values for broken PDF files (#3230) by @stefan6419846
Developer Experience (DEV)
- Multiple style improvements by @j-t-1
- Update ruff to 0.11.0 by @stefan6419846
Maintenance (MAINT)
- Conform ASCIIHexDecode implementation to specification (#3274) by @j-t-1
- Modify comments of filters that do not use decode_parms (#3260) by @j-t-1
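As background for the ASCIIHexDecode item above, the PDF specification describes the filter roughly as follows: whitespace is skipped, ">" marks end of data, and an odd number of hex digits is treated as if a final "0" followed. The sketch below illustrates those rules only; it is not the code from #3274, and the names ascii_hex_decode and PDF_WHITESPACE are hypothetical.

```python
# White-space characters as defined by the PDF specification:
# NUL, tab, line feed, form feed, carriage return, space.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def ascii_hex_decode(data: bytes) -> bytes:
    """Decode ASCIIHexDecode data (illustrative sketch, not pypdf's code)."""
    digits = bytearray()
    for byte in data:
        if byte == ord(">"):
            # End-of-data marker: stop reading.
            break
        if byte in PDF_WHITESPACE:
            continue
        digits.append(byte)
    if len(digits) % 2:
        # Odd number of digits: behave as if a "0" followed the last one.
        digits += b"0"
    # bytes.fromhex raises ValueError for anything that is not a hex digit.
    return bytes.fromhex(digits.decode("ascii"))


print(ascii_hex_decode(b"48 65 6C 6C 6F>"))
```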
Code Style (STY)
- Simplify warnings & debugging in layout mode text extraction (#3271) by @hackowitz-af
- Standardize mypy assert statements (#3276) by @j-t-1
Version 5.4.0, 2025-03-16
What's new
New Features (ENH)
Bug Fixes (BUG)
- Fix detection of inline images followed by names or numbers (#3173) by @stefan6419846
Robustness (ROB)
- Consider root objects without catalog type as fallback (#3175) by @stefan6419846
- Raise proper error on infinite loop when reading objects (#3169) by @stefan6419846
Documentation (DOC)
- Mention memory consumption of text extraction (#3168) by @stefan6419846
Developer Experience (DEV)
- Upgrade to ruff 0.10.0 (#3191) by @stefan6419846
Version 5.3.1, 2025-03-02
What's new
Bug Fixes (BUG)
- Use the correct name StandardEncoding for the predefined cmap (#3156) by @stefan6419846
- Handle inline images containing EI sequences (#3152) by @stefan6419846
- Fix check box value which should be name object (#3124) by @stefan6419846
- Fix stream position on inline image fallback extraction (#3120) by @stefan6419846
- Fix object count for incremental writer (#3117) by @m32
Robustness (ROB)
- Avoid index errors on empty lines in xref table (#3162) by @stefan6419846
- Improve handling of LZW decoder table overflow (#3159) by @stefan6419846
- Ignore non-numbers for width when building font width map (#3158) by @stefan6419846
- Avoid negative seek values when reading partially broken files (#3157) by @stefan6419846
Documentation (DOC)
Version 5.3.0, 2025-02-09
What's new
New Features (ENH)
- Handle attachments in /Kids and provide object-oriented API (#3108) by @stefan6419846
Bug Fixes (BUG)
- Handle annotations being None on merging (#3111) by @stefan6419846
Robustness (ROB)
Documentation (DOC)
- stefan6419846 becomes BDFL of pypdf (#3078) by @MartinThoma
Developer Experience (DEV)
- Remove ignoring multiple Ruff rules by @j-t-1
- Remove unused mutmut configuration (#3092) by @stefan6419846
Testing (TST)
Version 5.2.0, 2025-01-26
What's new
Deprecations (DEP)
- Deprecate with replacement CCITParameters (#3019) by @j-t-1
- Correct deprecation of interiour_color (#2947) by @j-t-1
New Features (ENH)
- Support alternative (U)F names for embedded file retrieval (#3072) by @stefan6419846
- Adding support for reading .metadata.keywords (#2939) by @Lucas-C
Bug Fixes (BUG)
- Handle further Tf operators in text extraction layout mode (#3073) by @blushingpenguin
- Ensure add_metadata can deal with _info = None (#3040) by @xmo-odoo
- Handle IndirectObject in CCITTFaxDecode filter (#2965) by @stefan6419846
- Handle chained colorspace for inline images when no filter is set (#3008) by @stefan6419846
- Avoid extracting inline images twice and dropping other operators (#3002) by @stefan6419846
- Fixed reference of value with str.__new__ in TextStringObject (#2952) by @thomas-forte
- Handle indirect objects in font width calculations (#2967) by @nsw42
- Title sometimes is bytes and not str (#2930) by @reformy
- Fix undefined variable for text extraction (regression) (#2934) by @stefan6419846
- Don't close stream passed to PdfWriter.write() (#2909) by @alexaryn
Robustness (ROB)
- Handle zero height fonts when extracting text (#3075) by @blushingpenguin
- Deal with content streams not containing streams (#3005) by @stefan6419846
- Gracefully handle some text operators when the operands are missing (#3006) by @stefan6419846
- Fall back to non-Adobe Ascii85 format for missing end markers (#3007) by @stefan6419846
- Ignore odd-length strings when processing cmap lines (#3009) by @stefan6419846
- Skip annotation destination being NullObject in PdfWriter (#2964) by @stefan6419846
- Skip destination page being None in PdfWriter (#2963) by @dxsooo
- Fix infinite loop case when reading null objects within an Array by @jakep-allenai
- Fixing infinite loop in ArrayObject read_from_stream (#2928) by @jakep-allenai
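To illustrate the Ascii85 fallback item above: Python's standard library decoder expects the "~>" end marker in Adobe mode and raises ValueError when it is missing. A tolerant decoder can therefore try Adobe mode first and retry in plain mode. This is a sketch under that assumption, using only base64.a85decode; the helper name ascii85_decode is hypothetical and the logic in #3007 may be different.

```python
import base64


def ascii85_decode(data: bytes) -> bytes:
    """Decode Ascii85 data with or without Adobe framing (illustrative sketch)."""
    data = data.strip()
    try:
        # Adobe mode requires the trailing "~>"; a leading "<~" is optional.
        return base64.a85decode(data, adobe=True)
    except ValueError:
        # No end marker (or otherwise not Adobe-framed): retry in plain mode.
        return base64.a85decode(data, adobe=False)


payload = base64.a85encode(b"pypdf")
# With and without the end marker, both inputs decode to the same bytes.
print(ascii85_decode(payload + b"~>") == ascii85_decode(payload))
```

Data that is invalid in both modes still raises ValueError from the second attempt, so genuine corruption is not silently hidden.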
Documentation (DOC)
- Add note about default line colors (#3014) by @stefan6419846
Developer Experience (DEV)
- Remove ignoring Ruff rule PGH004 (#3071) by @j-t-1
- Tidy ignore array in tool.ruff.lint (#3069) by @j-t-1
- Move Windows CI to Python 3.13 (#3003) by @stefan6419846
- Move to Ubuntu 22.04 (#3004) by @stefan6419846
Maintenance (MAINT)
- Fix formatting of warning message and include exception message (#3076) by @stefan6419846
- Narrow return type for ContentStream.operations (#2941) by @kmurphy4
Testing (TST)
- Fix image similarity for upcoming Ubuntu 24.04 (#3039) by @stefan6419846
- Replace broken Apache Tika Corpora urls (#3041) by @stefan6419846