chore(deps): bump quick-xml from 0.37.5 to 0.39.2 #2
Closed
dependabot[bot] wants to merge 25 commits into
Complete Office document processing suite: docx, xlsx, pptx parsers with CLI, MCP server, Python and WASM bindings.
Pure Rust parsers for pre-OOXML Microsoft Office binary formats:
- cfb_oxide: OLE2/CFBF container reader (foundation for all legacy formats)
- doc_oxide: Word Binary (.doc); FIB, piece table, text extraction, images
- xls_oxide: Excel Binary BIFF8 (.xls); SST, cell records, sheets, images
- ppt_oxide: PowerPoint Binary (.ppt); record tree, text atoms, images
Key features:
- OfficeDocument trait in office_core for unified API across all 6 formats
- Image extraction (JPEG/PNG/EMF/WMF) via shared BLIP parser in cfb_oxide
- Magic-byte sniffing: auto-routes mislabeled files (.doc→DOCX, .docx→DOC)
- FILEPASS detection for encrypted XLS files (early exit)
- Single-pass BIFF parsing, sparse grid builder, BIFF5 fast path
Benchmarked on 6,062 files from 11 open-source test suites:
- Fastest and highest pass rate across all 6 formats
- .xls: 1.5ms mean, 99.2% pass (vs calamine 9.0ms/90.7%, xlrd 36.6ms/93.1%)
- .doc: 0.1ms mean, 94.7% pass (vs catdoc 4.3ms/90.2%)
- .ppt: 0.3ms mean, 100% pass (vs catppt 2.8ms/77.8%)
- .docx: 1.8ms mean, 98.5% (vs python-docx 11.8ms/95.1%)
- .xlsx: 11.1ms mean, 97.8% (vs openpyxl 94.5ms/96.2%)
- .pptx: 2.3ms mean, 98.4% (vs python-pptx 32.5ms/86.7%)
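The magic-byte sniffing mentioned above could look roughly like the sketch below. It only illustrates the idea of routing by file signature instead of by extension; the `Container` enum and `sniff_container` function are hypothetical names, not the crate's API. The two signatures themselves are the standard OLE2/CFBF header and the ZIP local-file-header marker.

```rust
/// Container family guessed from the first bytes of a file.
/// (Hypothetical type for illustration; not taken from office_oxide.)
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Container {
    /// OLE2 / CFBF compound file: legacy .doc, .xls, .ppt.
    Cfb,
    /// ZIP archive: OOXML .docx, .xlsx, .pptx.
    Zip,
    /// Neither signature matched.
    Unknown,
}

/// Signature at offset 0 of every OLE2 compound file.
const CFB_MAGIC: [u8; 8] = [0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1];
/// ZIP local file header signature ("PK\x03\x04").
const ZIP_MAGIC: [u8; 4] = [b'P', b'K', 0x03, 0x04];

/// Classify a file by its leading bytes, ignoring the file extension.
pub fn sniff_container(head: &[u8]) -> Container {
    if head.starts_with(&CFB_MAGIC) {
        Container::Cfb
    } else if head.starts_with(&ZIP_MAGIC) {
        Container::Zip
    } else {
        Container::Unknown
    }
}
```

With a check like this, a file named `report.doc` that starts with `PK\x03\x04` would be handed to the OOXML path rather than the binary parser.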
- Merge 8 separate crates (office_core, cfb_oxide, docx_oxide, xlsx_oxide, pptx_oxide, doc_oxide, xls_oxide, ppt_oxide) into modules under src/
- Keep CLI and MCP as workspace binary crates
- Fix all clippy warnings (dead code, identity ops, manual prefix strip)
- DRY: extract parallel parsing utility (core::parallel::map_collect)
- Add serde + PartialEq derives to IR types, remove 73 lines of manual JSON conversion in wasm.rs
- Fix CI feature matrix (remove nonexistent per-format features, add parallel)
- Add full crates.io publishing metadata to CLI and MCP (binstall, deb, keywords)
- Add release workflow with native binary builds (6 platforms), sequential crates.io publishing, Homebrew + Scoop templates
- Add README for CLI and MCP crates
- Add CHANGELOG, CONTRIBUTING, CODE_OF_CONDUCT, SECURITY, LICENSE files
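A helper in the spirit of `core::parallel::map_collect` might be sketched as follows. The signature, the `workers` parameter, and the use of scoped threads from the standard library are all assumptions made for illustration; the commit does not show the real implementation, which may well rely on a different mechanism behind the `parallel` feature.

```rust
use std::thread;

/// Apply `f` to every item on up to `workers` threads and collect the
/// results in input order. (Hypothetical signature, std-only sketch.)
pub fn map_collect<T, R, F>(items: &[T], workers: usize, f: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    if items.is_empty() {
        return Vec::new();
    }
    let workers = workers.max(1);
    // Ceiling division so every item lands in exactly one chunk.
    let chunk_len = (items.len() + workers - 1) / workers;
    let f = &f;
    thread::scope(|scope| {
        // One scoped thread per chunk; each returns its own Vec<R>.
        let handles: Vec<_> = items
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().map(f).collect::<Vec<R>>()))
            .collect();
        // Joining in spawn order keeps the output aligned with the input.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Centralizing this in one function means each format module only supplies the per-item closure, which is the deduplication the bullet describes.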
- DocumentIR::to_html() renders HTML fragments with proper escaping
- Document::to_html() + convenience function to_html(path)
- CLI: office-oxide html <file>
- MCP: extract tool accepts format="html"
- Python: doc.to_html() + office_oxide.to_html(path)
- WASM: doc.toHtml()
- 5 new tests (paragraph, formatting, escaping, table, list)
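To illustrate what "proper escaping" involves when rendering an HTML fragment, here is a minimal sketch. `escape_html` and `paragraph_to_html` are hypothetical helpers written for this example, not the functions behind `DocumentIR::to_html()`.

```rust
/// Replace the characters that are significant in HTML text and
/// attribute values with their entity forms.
pub fn escape_html(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#39;"),
            _ => out.push(ch),
        }
    }
    out
}

/// Render one paragraph of plain text as an HTML fragment.
pub fn paragraph_to_html(text: &str) -> String {
    format!("<p>{}</p>", escape_html(text))
}
```

Escaping at the point where text is written into the fragment keeps document content such as `<script>` from being interpreted as markup.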
- python.rs: replace 130 lines of manual IR→PyDict with serde + generic json_value_to_py helper (6 functions → 1)
- MCP protocol.rs: replace manual ir_to_json with serde_json::to_value (IR types have Serialize derives now)
- Write modules: import namespace constants from core::xml::ns instead of re-declaring identical strings in docx/xlsx/pptx write.rs
- Boolean toggle: extract parse_toggle(e, attr_name) to core::xml, replace duplicate implementations in docx/formatting.rs and xlsx/styles.rs
- IR: add TextSpan::plain() constructor, use across all 6 convert_*.rs files to eliminate repetitive struct initialization boilerplate
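The boolean-toggle logic that was deduplicated can be sketched on its own, without the XML reader. This assumes the usual OOXML on/off convention, where an element such as `<w:b/>` with no value attribute means "on" and the values `0`, `false`, and `off` mean "off". The function name and its `Option<&str>` input are hypothetical; the real `parse_toggle(e, attr_name)` takes an XML element.

```rust
/// Interpret the value of an OOXML on/off attribute.
/// `None` means the attribute was absent, which counts as "on".
/// (Hypothetical helper; unknown values are treated as "on" here.)
pub fn toggle_from_val(val: Option<&str>) -> bool {
    match val {
        None => true,
        Some(v) => !matches!(v, "0" | "false" | "off"),
    }
}
```

Keeping this rule in one place matters because the same convention applies to bold, italic, and similar flags in both the DOCX formatting and XLSX styles code.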
Use json.loads(serde_json string) instead of manual PyObject construction to avoid pyo3 0.28 type ownership issues with Borrowed<PyBool>.
- DOCX writer: paragraph, heading, table, list round-trips
- XLSX writer: cells, multiple sheets, empty cells round-trips
- PPTX writer: slide, multiple slides, bullet list round-trips
- create_from_ir: DOCX, XLSX, PPTX IR→file→read round-trips
- Edit: DOCX replace_text, XLSX set_cell, PPTX replace_text round-trips
- Lower coverage threshold to 70% (write/edit modules now tested but not all code paths exercised yet)
- Fix cargo fmt formatting across codebase
README fixes:
- Pass rate: 96.2% on 2,570 → 98.1% on 6,062 files (latest benchmarks)
- Speed claim: "10-60×" → "Up to 100×" (matches actual benchmarks)
- Citation year: 2025 → 2026
- Add HTML column to Supported Formats table
- Add to_html() to Python and Rust API examples
- Add MCP server + CLI sections
Python bindings:
- Add to_html to __init__.py exports and _native.pyi type stubs
- Add save_as to type stubs
- Fix ruff CI: scope to python/ dir, remove --fix flag
GEO/LLM optimization:
- Add llms.txt for AI model discovery (API, performance, use cases)
- Add .devin/wiki.json knowledge base for Devin AI
Wheel cross-compilation (musllinux, aarch64, windows-aarch64) fails in Docker environments that can't find Python interpreters. These builds are only needed for PyPI publishing, not for CI validation. Tests and lint still run on every push.
Preempt the same artifact-bloat class of failures that hit pdf_oxide v0.3.27 (330 MB crate, >100 MB sdist; both rejected by registries).
- Cargo.toml: replace implicit package contents with an explicit `include` whitelist. Crate drops 133 -> 93 files (273 -> 144 KiB compressed); PyPI sdist drops 132 -> 96 files (283 -> 151 KiB). Prevents future accidental binary/asset commits from leaking into published tarballs.
- release.yml: run `wasm-opt -Oz --strip-debug --strip-producers` after wasm-bindgen. Typical reduction ~20-30% on the 873 KiB wasm.
…istent corpus stats
- Auto-fix 89 clippy `collapsible_match` lints across docx/xlsx/pptx/ppt/core (CI
`-D warnings` now passes).
- Escape `[Content_Types].xml` in opc.rs doc comment so `cargo doc` is warning-free.
- Re-run benchmarks on the full 6,062-file corpus with the current parser on an
idle system (warm cache, median of 3 runs). Update BENCHMARKS.md + README.md
to the new numbers; sync CHANGELOG.
- docx 1.6→0.8ms mean, 8.9→3.9ms p99
- pptx 1.4→0.7ms mean, 7.2→3.9ms p99
- xlsx 11.1→5.0ms mean, 97→40ms p99 (vs previous docs)
- Overall pass rate 98.1% → 98.4% (5,965 / 6,062; 97 failures, all invalid
inputs or non-Office files).
- Call out the one non-leadership axis honestly: .xls p99 is 75ms vs xls2csv's
58ms (xls2csv emits truncated output on complex sheets).
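The overall pass-rate figure can be checked directly from the counts quoted above (5,965 passing out of 6,062 files). The helper below is a small illustrative sketch, not part of the benchmark harness.

```rust
/// Pass rate as a percentage of the total corpus.
pub fn pass_rate_percent(passed: u32, total: u32) -> f64 {
    100.0 * f64::from(passed) / f64::from(total)
}

/// Number of failing files implied by the two counts.
pub fn failures(passed: u32, total: u32) -> u32 {
    total - passed
}
```

For the quoted counts this gives 97 failures and a rate that rounds to 98.4% at one decimal place, matching the commit message.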
Land all artifacts required for the first public release:
- Language bindings: Go (CGo + installer), C# (.NET P/Invoke), Node.js
(koffi), C header (cbindgen), plus src/ffi.rs for the shared C ABI.
- Documentation: per-language getting-started guides under docs/,
wasm-pkg README + dual license, python package README.
- Examples: runnable extract/replace samples in rust, python, go,
javascript, csharp, and c.
- Release tooling: .github/scripts/{bump-version,check-versions}.sh
and optional .github/hooks/pre-commit for version-parity guards.
- Wire-up on modified files: CI matrix extends to the new bindings,
release.yml publishes to crates.io / PyPI / npm (×2) / NuGet / Go /
Homebrew / Scoop with smoke tests; crate metadata, python stubs,
wasm-pkg package.json updated for v0.1.0.
- .gitignore: exclude wasm-pkg build outputs, csharp bin/obj/runtimes,
js prebuilds, and the go/install installer binary.
node --test no longer auto-discovers files when given a directory arg; it tries to load the path as a single module and fails. Explicitly glob test/*.mjs so npm test succeeds on node 18 / 20 / 22+.
npm 10+ writes hasInstallScript: true for packages with postinstall hooks; commit so it doesn't show as drift on every fresh install.
The Rust cdylib/staticlib product is named liboffice_oxide (matching the crate name office_oxide with a leading lib- prefix). Several user-facing docs, error messages, and comments had dropped an f. Runtime path resolution already computed the correct name, so nothing was broken at runtime — but the docs told users the wrong filename.
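One way to avoid hand-typing the library filename is to derive it from the platform's prefix and suffix conventions. The sketch below uses the constants in `std::env::consts`; the function name `shared_library_name` is a hypothetical helper, and this is not necessarily how the bindings resolve the path at runtime.

```rust
use std::env::consts::{DLL_PREFIX, DLL_SUFFIX};

/// Build the platform-specific shared-library filename for a crate,
/// e.g. a `lib` prefix and `.so` suffix on Linux.
pub fn shared_library_name(crate_name: &str) -> String {
    format!("{}{}{}", DLL_PREFIX, crate_name, DLL_SUFFIX)
}
```

Generating the name this way in docs tooling or error messages would keep the spelling tied to the crate name `office_oxide` rather than to a manually typed string.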
…V, dependabot
- Add concurrency cancel-in-progress and workspace-wide clippy/doc gates
- Raise coverage threshold to 85% (ignoring ffi.rs alongside python/wasm)
- Consolidate cargo-deny into a single --all-features check; tighten license allow-list (drop MPL/NCSA/OpenSSL, add Apache-2.0 WITH LLVM-exception)
- New jobs: taplo fmt, cargo-hack feature powerset, cargo-semver-checks (PR-only), MSRV build
- Add dependabot config (cargo, actions, pip) and .taplo.toml
- Centralize clippy allow-list via [workspace.lints.clippy]; apply [lints] workspace = true across crates
- Reformat toml/rs files to match taplo / rustfmt
Labels
The following labels could not be found: Please fix the above issues or remove invalid values from
…ed Python libs, scripts/bench.sh
bench_rust previously measured calamine / docx-rs / dotext but not office_oxide itself, so the repo had no Rust-to-Rust number. Wire office_oxide in as a path dep (bench_rust kept out of the main workspace via a nested [workspace] table), add .xls coverage, --json output, peak-RSS capture via getrusage, and panic-catching for competitor crashes.
bench_python.py adds python-calamine + xlrd wrappers so the tables in BENCHMARKS.md are reproducible from the shipped harness. Adds --json output and RSS capture for parity with bench_rust.
scripts/bench.sh is the single reproducible entry point: captures machine spec, installs pinned competitor versions from scripts/bench-requirements.txt, builds release, runs both harnesses, writes machine.json / python.json / rust.json to an output dir.
README tagline reframed to 'Fastest Native Office Document Library' with an explicit scope block (POI/Tika are out of scope: JVM). New 'Reproducing these numbers' + 'Scope and non-goals' sections in BENCHMARKS.md. GAPS.md tracks the remaining deferred items (full-corpus re-run, POI/Tika comparison, bench_results.json cleanup).
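The panic-catching mentioned for competitor crashes can be sketched with `std::panic::catch_unwind`. The wrapper name `run_guarded` is hypothetical, and the sketch assumes the default unwinding panic strategy; with `panic = "abort"` nothing can be caught.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Run `f`, converting a panic into an `Err` that carries the panic
/// message, so one crashing parser does not abort a whole benchmark run.
/// (Hypothetical wrapper; not taken from bench_rust.)
pub fn run_guarded<T, F>(f: F) -> Result<T, String>
where
    F: FnOnce() -> T,
{
    panic::catch_unwind(AssertUnwindSafe(f)).map_err(|payload| {
        // Panic payloads are usually a &str or a String.
        if let Some(msg) = payload.downcast_ref::<&str>() {
            (*msg).to_string()
        } else if let Some(msg) = payload.downcast_ref::<String>() {
            msg.clone()
        } else {
            "unknown panic payload".to_string()
        }
    })
}
```

A harness could count an `Err` from such a wrapper as a failed file for that library and continue with the next input.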
## Rich creation API
- xlsx/write: Add CellStyle (bold, color, background, number format, alignment, wrap), NumberFormat enum, HAlign enum, CellData::Formula, set_cell_styled, set_column_width; dynamic styles.xml generation with deduplication
- docx/write: Add Run struct with bold/italic/underline/strikethrough/color/font_size/font_name builder; Alignment enum; add_rich_paragraph, add_paragraph_aligned, add_rich_paragraph_aligned, add_page_break
- pptx/write: Add Run struct with same builder API; add_rich_text, add_text_box (positioned, EMU), add_rich_text_box
## Markdown → Office
- src/ir_from_markdown.rs: DocumentIR::from_markdown (pure-Rust, no deps)
- src/create.rs: create_from_markdown, create_from_markdown_to_writer
- Python binding: create_from_markdown pyfunction + pyi stub
## FFI parity: create_from_markdown in all languages
- src/ffi.rs: office_create_from_markdown (C FFI)
- go/office_oxide.go: CreateFromMarkdown wrapper
- js/lib/native.js + index.js: createFromMarkdown
- csharp: NativeMethods.OfficeCreateFromMarkdown + Document.CreateFromMarkdown
## Numbered examples (01–06, all self-contained, no fixtures needed)
- Rust: extract, create-rich-docx, create-xlsx-formulas, create-pptx-textboxes, edit-roundtrip, markdown-to-all-formats
- Python: extract, create-from-markdown, edit, batch-processing
- Go: extract, create-from-markdown, edit
- JavaScript: extract, create-from-markdown, edit
- C#: extract, create-from-markdown, edit
## CI: examples now run (not just compile)
- test job: cargo run all 6 Rust examples (ubuntu/stable)
- python job: run 4 Python examples (python 3.12)
- go job: go run -tags office_oxide_dev all 3 Go examples
- node-native job: node all 3 JS examples
- csharp job: dotnet run all 3 C# examples
## OSS readiness
- LICENSE-MIT: corrected copyright to "2026 Yury Fedoseev" (all 4 copies)
- SECURITY.md: direct contact email yfedoseev@gmail.com
- CONTRIBUTING.md: updated to reflect actual single-crate structure
- docs/MISSION.md: rewritten to reflect shipped state, no cross-project refs
- GAPS.md, bench_results.json: deleted (stale internal artifacts)
- CI action versions: all corrected to v4/v5 (checkout, setup-python, etc.)
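As a rough picture of a dependency-free Markdown-to-IR step, the sketch below turns a small Markdown subset into a flat list of blocks. The `Block` enum and `blocks_from_markdown` are hypothetical stand-ins for illustration; the real `DocumentIR::from_markdown` and its IR types are not shown in this commit and surely cover more syntax.

```rust
/// A tiny block-level IR. (Hypothetical; not the crate's DocumentIR.)
#[derive(Debug, PartialEq)]
pub enum Block {
    Heading { level: u8, text: String },
    ListItem(String),
    Paragraph(String),
}

/// Parse ATX headings ("# "), dash bullets ("- "), and plain
/// paragraphs, one block per non-empty line.
pub fn blocks_from_markdown(src: &str) -> Vec<Block> {
    let mut blocks = Vec::new();
    for line in src.lines() {
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        // Count leading '#' characters; 1..=6 followed by a space is a heading.
        let hashes = line.chars().take_while(|&c| c == '#').count();
        if (1..=6).contains(&hashes) {
            if let Some(text) = line[hashes..].strip_prefix(' ') {
                blocks.push(Block::Heading {
                    level: hashes as u8,
                    text: text.trim().to_string(),
                });
                continue;
            }
        }
        if let Some(text) = line.strip_prefix("- ") {
            blocks.push(Block::ListItem(text.trim().to_string()));
            continue;
        }
        blocks.push(Block::Paragraph(line.to_string()));
    }
    blocks
}
```

A format-neutral block list like this is what lets one Markdown input feed the DOCX, XLSX, and PPTX writers through a shared creation path.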
…xes2
- shear: cargo-shear checks for unused dependencies
- bench: cargo bench --no-run verifies all benchmarks compile
- cli: cargo build + --help smoke test for office_oxide_cli
Brings CI gate coverage to parity with pdf_oxide_fixes2.
Bumps [quick-xml](https://github.com/tafia/quick-xml) from 0.37.5 to 0.39.2.
- [Release notes](https://github.com/tafia/quick-xml/releases)
- [Changelog](https://github.com/tafia/quick-xml/blob/master/Changelog.md)
- [Commits](tafia/quick-xml@v0.37.5...v0.39.2)
---
updated-dependencies:
- dependency-name: quick-xml
  dependency-version: 0.39.2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
f98ea67 to
38608f3
OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting If you change your mind, just re-open this PR and I'll resolve any conflicts on it.
Bumps quick-xml from 0.37.5 to 0.39.2.
Release notes
Sourced from quick-xml's releases.
... (truncated)
Changelog
Sourced from quick-xml's changelog.
... (truncated)
Commits
- 5611c89 Release 0.39.2
- b8eba9a Merge pull request #941 from Mingun/full-cover
- f8e8857 Implement read_text_into and read_text_into_async
- 489dc17 Place `;` to the buffer when read general entity references
- 9a7e8f5 Place `>` to the buffer when read elements, processing instructions and XML d...
- c34af48 Place `>` to the buffer when read comment, CDATA or DOCTYPE
- 241f01e Return only index from BangType::parse (renamed to feed) like in other parsers
- e3230c2 Append +1 outside of BangType, in read_bang_element, like read_with do
- 623c92c Rewrite `read_bang_element` with the same style as `read_with`, `read_ref` an...
- e06f70a Merge pull request #940 from Mingun/fix-939