Release magic_pdf-1.2.0-released · opendatalab/MinerU

What's Changed

This version includes several fixes and improvements to enhance parsing efficiency and accuracy:

Performance Optimization
- Increased classification speed for PDF documents in auto mode.
- Added high-performance plugin support for the Huawei Ascend NPU acceleration mode, with end-to-end speedups of up to 300% in common scenarios (application link).
Parsing Optimization
- Improved the parsing logic for watermarked documents, significantly improving their parsing results.
- Enhanced the matching logic between multiple images/tables and their captions within a single page, improving the accuracy of image-text matching in complex layouts.
Bug Fixes
- Fixed errors caused by image/table spans being incorrectly filled into text blocks under certain conditions.
- Resolved an issue where title blocks were empty in some cases.

Full Changelog: magic_pdf-1.1.0-released...magic_pdf-1.2.0-released