Merged
18 commits
6b18677
release: v0.1.2 — round-trip fidelity, IR layout, perf, embedded fonts
yfedoseev May 14, 2026
9d4380c
chore(ci): bump actions/attest-sbom from 2.4.0 to 4.1.0
dependabot[bot] May 6, 2026
be5d3a3
chore(ci): bump github/codeql-action from 3.35.2 to 4.35.3
dependabot[bot] May 6, 2026
7ffcacc
chore(ci): bump actions/upload-artifact from 4.6.2 to 7.0.1
dependabot[bot] May 6, 2026
68c48fb
chore(ci): update dtolnay/rust-toolchain requirement to 29eef336d9b28…
dependabot[bot] May 6, 2026
846f5f4
chore(ci): bump actions/github-script from 7.0.1 to 9.0.0
dependabot[bot] May 6, 2026
c7219cf
chore(deps): bump koffi from 2.16.1 to 2.16.2 in /js
dependabot[bot] May 13, 2026
591a88d
chore(deps): bump quick-xml from 0.37.5 to 0.40.0
dependabot[bot] May 13, 2026
e414477
fix(deps): adapt to quick-xml 0.40 API changes
yfedoseev May 13, 2026
7450838
docs(cli,mcp): add crate-level docs for binary crates
yfedoseev May 14, 2026
1ba28e7
docs(changelog): expand v0.1.2 entry for recent branch changes
yfedoseev May 15, 2026
c317900
fix(pptx): add missing color_rgb field to test TextRun constructors
yfedoseev May 15, 2026
59f54e3
fix(review): address Copilot review comments on PR #38
yfedoseev May 15, 2026
dfd3a34
fix(review): more Copilot follow-ups + coverage tests
yfedoseev May 15, 2026
5745467
fix(review): rustfmt + more Copilot follow-ups
yfedoseev May 15, 2026
83a33ac
docs(review): document embed_font/font_table style limitations + debu…
yfedoseev May 15, 2026
c42825b
fix(docx): track header-vs-footer role at parse time
yfedoseev May 15, 2026
ac38c7e
test: add IR round-trip coverage for previously untested Element vari…
yfedoseev May 15, 2026
6 changes: 3 additions & 3 deletions .github/workflows/ci.yml
@@ -127,7 +127,7 @@ jobs:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v4

- name: Install Rust
- uses: dtolnay/rust-toolchain@3c5f7ea28cd621ae0bf5283f0e981fb97b8a7af9 # master
+ uses: dtolnay/rust-toolchain@29eef336d9b2848a0b548edc03f92a220660cdb8 # master
with:
toolchain: ${{ matrix.rust }}

@@ -217,7 +217,7 @@ jobs:
# its original target/release/ path so binding test code works
# unchanged.
- name: Upload native lib artifact
- uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4
+ uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
with:
name: native-lib-${{ matrix.os }}
retention-days: 1
@@ -626,7 +626,7 @@ jobs:
run: |
v=$(grep -E '^rust-version' Cargo.toml | head -1 | sed 's/.*"\(.*\)".*/\1/')
echo "version=${v:-1.85}" >> "$GITHUB_OUTPUT"
- - uses: dtolnay/rust-toolchain@3c5f7ea28cd621ae0bf5283f0e981fb97b8a7af9 # master
+ - uses: dtolnay/rust-toolchain@29eef336d9b2848a0b548edc03f92a220660cdb8 # master
with:
toolchain: ${{ steps.msrv.outputs.version }}
- uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
4 changes: 2 additions & 2 deletions .github/workflows/codeql.yml
@@ -34,7 +34,7 @@ jobs:
uses: dtolnay/rust-toolchain@29eef336d9b2848a0b548edc03f92a220660cdb8 # stable

- name: Initialize CodeQL
- uses: github/codeql-action/init@ce64ddcb0d8d890d2df4a9d1c04ff297367dea2a # v3
+ uses: github/codeql-action/init@e46ed2cbd01164d986452f91f178727624ae40d7 # v4
with:
languages: ${{ matrix.language }}
# Use default queries + security-extended suite
@@ -44,6 +44,6 @@ jobs:
run: cargo build --lib

- name: Perform CodeQL Analysis
- uses: github/codeql-action/analyze@ce64ddcb0d8d890d2df4a9d1c04ff297367dea2a # v3
+ uses: github/codeql-action/analyze@e46ed2cbd01164d986452f91f178727624ae40d7 # v4
with:
category: "/language:${{ matrix.language }}"
2 changes: 1 addition & 1 deletion .github/workflows/outdated.yml
@@ -43,7 +43,7 @@ jobs:

- name: Open issue for outdated deps
if: steps.outdated.outputs.has_outdated == 'true'
- uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea # v7
+ uses: actions/github-script@3a2844b7e9c422d3c10d287c895573f7108da1b3 # v9
with:
script: |
const title = `chore: outdated dependencies (${new Date().toISOString().slice(0,7)})`;
8 changes: 4 additions & 4 deletions .github/workflows/python.yml
@@ -146,7 +146,7 @@ jobs:
manylinux: ${{ matrix.manylinux }}
args: --release --features python --out dist
- name: Upload wheels as artifacts
- uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4
+ uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
with:
name: wheels-linux-${{ matrix.target }}-${{ matrix.manylinux }}
path: dist/*.whl
@@ -172,7 +172,7 @@ jobs:
target: ${{ matrix.target }}
args: --release --features python --out dist
- name: Upload wheels as artifacts
- uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4
+ uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
with:
name: wheels-macos-${{ matrix.target }}
path: dist/*.whl
@@ -199,7 +199,7 @@ jobs:
target: ${{ matrix.target }}
args: --release --features python --out dist
- name: Upload wheels as artifacts
- uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4
+ uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
with:
name: wheels-windows-${{ matrix.target }}
path: dist/*.whl
@@ -217,7 +217,7 @@ jobs:
command: sdist
args: --out dist
- name: Upload sdist as artifact
- uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4
+ uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
with:
name: sdist
path: dist/*.tar.gz
12 changes: 6 additions & 6 deletions .github/workflows/release.yml
@@ -162,7 +162,7 @@ jobs:
echo "ARCHIVE=$ARCHIVE" >> $GITHUB_ENV

- name: Upload artifact
- uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4
+ uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
with:
name: ${{ matrix.artifact_name }}
path: ${{ env.ARCHIVE }}
@@ -232,7 +232,7 @@ jobs:
cp target/${{ matrix.target }}/release/office_oxide.lib staging/lib/ 2>/dev/null || true
cp -r include/office_oxide_c staging/include/
cd staging && 7z a "../${{ matrix.artifact_name }}.zip" . && cd ..
- - uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4
+ - uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
with:
name: ${{ matrix.artifact_name }}
path: |
@@ -326,7 +326,7 @@ jobs:
printf '{"type": "module"}\n' > wasm-pkg/web/package.json

- name: Upload WASM artifact
- uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4
+ uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
with:
name: wasm-package
path: wasm-pkg/
@@ -380,7 +380,7 @@ jobs:
run: maturin build --release --features python --target ${{ matrix.target }} --out dist

- name: Upload wheels
- uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4
+ uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
with:
name: ${{ matrix.artifact_name }}
path: dist/*.whl
@@ -431,7 +431,7 @@ jobs:
done
done
ls -R js/prebuilds
- - uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4
+ - uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
with:
name: node-native-package
path: js/
@@ -748,7 +748,7 @@ jobs:
GH_TOKEN: ${{ github.token }}

- name: Attest SBOM
- uses: actions/attest-sbom@bd218ad0dbcb3e146bd073d1d9c6d78e08aa8a0b # v2
+ uses: actions/attest-sbom@c604332985a26aa8cf1bdc465b92731239ec6b9e # v4.1.0
with:
subject-path: sbom.cdx.json
sbom-path: sbom.cdx.json
4 changes: 2 additions & 2 deletions .github/workflows/scorecard.yml
@@ -35,13 +35,13 @@ jobs:
publish_results: true

- name: Upload Scorecard results as artifact
- uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4
+ uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
with:
name: scorecard-results
path: results.sarif
retention-days: 5

- name: Upload Scorecard results to GitHub Security tab
- uses: github/codeql-action/upload-sarif@ce64ddcb0d8d890d2df4a9d1c04ff297367dea2a # v3
+ uses: github/codeql-action/upload-sarif@e46ed2cbd01164d986452f91f178727624ae40d7 # v4
with:
sarif_file: results.sarif
189 changes: 189 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,195 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.1.2] - 2026-05-14

> Round-trip fidelity, IR layout features, embedded fonts, XLSX number formatting, and an O(1) style-lookup perf win.

### Performance

- **XLSX styles**: cell-format lookups now use a `HashMap`, replacing
the linear `Vec` scan in `format_cell_value` / `is_date_cell`.
Per-cell style lookup becomes O(1) on average; large styled workbooks
parse noticeably faster with no API change.
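A minimal sketch of the idea, assuming a simplified style table that maps a `cellXfs` index to a number-format ID; `StyleIndex` and its methods are hypothetical names, not the crate's API. The map is built once per workbook, so each cell pays one hash lookup instead of a scan over every style:

```rust
use std::collections::HashMap;

/// Hypothetical, simplified style table: cellXfs index -> numFmtId.
/// Not the crate's actual types.
pub struct StyleIndex {
    num_fmt_by_xf: HashMap<u32, u32>,
}

impl StyleIndex {
    /// Build the map once per workbook from (xf_index, num_fmt_id) pairs.
    pub fn new(xfs: &[(u32, u32)]) -> Self {
        StyleIndex {
            num_fmt_by_xf: xfs.iter().copied().collect(),
        }
    }

    /// Average O(1) lookup instead of a linear scan over all styles.
    pub fn num_fmt_id(&self, xf_index: u32) -> Option<u32> {
        self.num_fmt_by_xf.get(&xf_index).copied()
    }

    /// Treats only the built-in date/time IDs 14-22 as dates;
    /// custom date format strings are ignored in this sketch.
    pub fn is_date_cell(&self, xf_index: u32) -> bool {
        matches!(self.num_fmt_id(xf_index), Some(14..=22))
    }
}
```

Detecting custom date formats would also need the format code string, which this sketch deliberately leaves out.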

### Round-trip fidelity (PDF → office → PDF)

- **Alignment, spacing, footers, and horizontal rules** preserved end-to-end
through both `to_docx` and `to_pptx` writers.
- **Images, fonts, and column layouts** preserved across DOCX, PPTX, and
XLSX. Source-PDF font programs that previously registered as empty
subsets now embed correctly.
- **`Element::ThematicBreak`** encoded in PPTX as a centered 30-char run
of `U+2500 BOX DRAWINGS LIGHT HORIZONTAL`. Downstream PDF renderers
detect the all-U+2500 content and re-emit a real horizontal rule.
- **DOCX horizontal rules** recovered from the conventional encoding
(empty paragraph + `<w:pBdr><w:bottom/>`) back into `Element::ThematicBreak`.
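A sketch of the `ThematicBreak` encoding and of the detection a downstream renderer could apply. It assumes "all-U+2500 content" means the trimmed paragraph text is non-empty and consists only of that character; both function names are hypothetical:

```rust
/// U+2500 BOX DRAWINGS LIGHT HORIZONTAL.
const RULE_CHAR: char = '\u{2500}';
/// Length of the encoded run, per the entry above.
const RULE_LEN: usize = 30;

/// Encode a thematic break as a run of U+2500 characters.
pub fn encode_thematic_break() -> String {
    std::iter::repeat(RULE_CHAR).take(RULE_LEN).collect()
}

/// True if the paragraph text (ignoring surrounding whitespace) is
/// non-empty and made entirely of U+2500, so it can be re-emitted
/// as a real horizontal rule.
pub fn is_thematic_break_text(text: &str) -> bool {
    let trimmed = text.trim();
    !trimmed.is_empty() && trimmed.chars().all(|c| c == RULE_CHAR)
}
```

Checking for "all U+2500" rather than "exactly 30" keeps detection robust if a writer or editor changes the run length.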

### DOCX

- **`<w:framePr>` parsed into IR** as `FramePosition` (twips, page-anchored)
on both `Paragraph` and `Heading`. Used by layout-preserving paths
(e.g. pdf_oxide's `to_docx_bytes_layout`).
- **Floating drawings and vector shapes**: `<wp:anchor>` images plus
`<wps:wsp>` preset shapes (line, rect) with stroke/fill RGB and
stroke width round-trip through `DrawingInfo`.
- **Per-section page sizes** preserved through `to_ir`; multi-section IR
emits per-section `<w:sectPr>`.
- **`<w:sz>` preserved** through to IR's `font_size_half_pt`.
- **Run colour** from `<w:rPr><w:color w:val="RRGGBB"/>` propagated into
`TextSpan.color` during `to_ir`, so PDF→DOCX→PDF round-trips keep
coloured text. Only the `ColorRef::Rgb` variant is plumbed today;
theme / system / `auto` colours still fall through to the renderer
default (proper resolution needs `theme.xml` threaded into the
convert path).
- **Headers and footers** now included in `to_markdown` and `to_ir`
(previously silently dropped).
- **Embedded fonts** under `/word/fonts/` exposed on
`DocxDocument.embedded_fonts`. `strip_embedded_font_filename` recovers
the original face name from `font_<n>_<face>.<ext>` (fixes greedy
alphabetic-trim regression where `TeXGyreTermesX-` was returned
instead of `TeXGyreTermesX-Regular`).
- **`parse_drawing` decomposed** into focused recursive helpers
(`parse_inline_or_anchor_body`, `parse_anchor_position`,
`parse_shape_properties`, etc.) for readability.
- **Run-level `<w:rFonts w:ascii>` plumbed into `TextSpan.font_name`**;
`<w:cols>` propagated to `Section.columns`.
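The `font_<n>_<face>.<ext>` convention can be reversed without any alphabetic trimming by splitting on the first `_` after the numeric index. This is a hypothetical `strip_font_filename`, not the crate's `strip_embedded_font_filename`, and it assumes the index is plain decimal digits:

```rust
/// Recover `<face>` from a `font_<n>_<face>.<ext>` filename.
/// Returns None if the name does not follow that convention.
pub fn strip_font_filename(file_name: &str) -> Option<&str> {
    // Drop any leading directory, e.g. "word/fonts/".
    let base = file_name.rsplit('/').next()?;
    // Drop only the final extension, so dots inside the face survive.
    let stem = match base.rfind('.') {
        Some(i) => &base[..i],
        None => base,
    };
    let rest = stem.strip_prefix("font_")?;
    // The first '_' separates the index from the face name.
    let (index, face) = rest.split_once('_')?;
    if index.is_empty() || !index.bytes().all(|b| b.is_ascii_digit()) || face.is_empty() {
        return None;
    }
    Some(face)
}
```

Because only the first `_` after the index acts as a separator, faces containing `-` or `_` (such as `TeXGyreTermesX-Regular`) come back intact.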

### PPTX

- **Pagination**: each slide forces a `SectionBreakType::NextPage` so two
slides never share a rendered page.
- **Real Title+Body slide layout** emitted by the writer instead of a blank
layout, so PowerPoint shows placeholder hints in edit mode.
- **Slide background**: `<p:cSld><p:bg><p:bgPr><a:solidFill><a:srgbClr>`
parsed into `Slide.background_rgb` and propagated to `Section.background_rgb`.
- **Positioned text boxes**: shapes with explicit `<a:xfrm>` coordinates
wrap their content in `Element::TextBox` so downstream renderers can
place them at absolute EMU coordinates. Zero-size shapes skip the wrapper.
- **Slide size → page setup**: `<p:sldSz cx=… cy=…>` propagated to each
section's `PageSetup`.
- **Run font sizes preserved** via new `TextRun.font_size_hundredths_pt`
(parsed from `<a:rPr sz="…"/>`).
- **Run colour preserved** via new `TextRun.color_rgb: Option<[u8; 3]>`
parsed from `<a:rPr><a:solidFill><a:srgbClr val="RRGGBB"/></a:solidFill>`
and propagated to `TextSpan.color` in IR. The parser tracks an
`in_solid_fill` flag so sibling effects (e.g. `<a:hl><a:srgbClr/>`
for hyperlink colour) don't leak into the run's own fill; non-sRGB
fills (gradient, scheme colour) fall back to `None`.
- **Paragraph alignment** parsed from `<a:pPr algn="…"/>` (all five
variants: `l` / `ctr` / `r` / `just` / `dist`) into
`TextParagraph.alignment`. **Space-before** parsed from
`<a:spcBef><a:spcPts val=…/>`.
- **Title alignment propagation**: `find_title` returns text + first
paragraph's alignment, seeding both `Section.title` and the synthesised
level-2 Heading's alignment.
- **Picture shapes** now carry `embed_rid`, `data`, and `format`
(resolved via a pre-built media map at parse time, so the parallel
slide parser doesn't need the OPC reader).
- **Font embedding** under `/ppt/fonts/`.
- **Structured chart text extraction**: `<c:chart>` parts parsed into
per-chart text blocks rendered as `## Chart N` in markdown / search /
PDF without needing a graphical chart renderer.
- **Compaction**: consecutive H1/H2 cover-page headings fold into one
slide instead of fragmenting; long XLSX paragraphs are split across cells
to stay under Excel's 32,767-character per-cell limit.
- **Slide cap**: writer caps at ~250 slides (PowerPoint's hard limit).
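Parsing the `val` of an `<a:srgbClr>` into the `Option<[u8; 3]>` shape used by `TextRun.color_rgb` could look like the following. `parse_srgb_val` is a hypothetical helper, and the caller is assumed to invoke it only while its `in_solid_fill` state is set, so hyperlink or effect colours never reach it:

```rust
/// Parse an `<a:srgbClr val="RRGGBB"/>` value into RGB bytes.
/// Anything other than exactly six hex digits yields None.
pub fn parse_srgb_val(val: &str) -> Option<[u8; 3]> {
    if val.len() != 6 || !val.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    // Safe to slice: all six bytes are ASCII hex digits.
    let byte = |i: usize| u8::from_str_radix(&val[i..i + 2], 16).ok();
    Some([byte(0)?, byte(2)?, byte(4)?])
}
```

Returning `None` for malformed input matches the entry's fallback behaviour for non-sRGB fills: the renderer default applies instead of a guessed colour.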

### XLSX

- **Per-worksheet `page_setup`** round-trips via `<pageMargins>` (inches)
and `<pageSetup>` (paperWidth/paperHeight with mm/cm/in suffix or
`paperSize` enum 1–13). New `Worksheet.page_setup`.
- **`numfmt` module** (`crate::xlsx::numfmt`): built-in IDs 0–44 (general,
fixed, commas, percent, currency, scientific, accounting) and a
simplified custom format-string parser (multi-section, `[Red]` color
directives stripped, currency prefix from `[$€-407]`, quoted literal
suffix, percent and thousands separators). Applied to numeric cells
during `format_cell_value` and `write_cell_value_fast`.
- **Font sizes** preserved through IR; long-text single-column sheets
emit as paragraphs instead of a tall 1-column GFM table.
- **Unique worksheet names** in `ir_to_xlsx` (duplicates suffixed with
`_2`, `_3`, …).
- **Drawings**: `xl/drawings/drawingN.xml` parsed into
`Worksheet.images` (`WorksheetPicture` with EMU coords + bytes) and
`Worksheet.text_shapes` (`WorksheetTextShape` for layout-mode text
boxes from `to_xlsx_bytes_layout`).
- **Embedded fonts** under `/xl/fonts/`.
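A sketch of the duplicate-name suffixing in `ir_to_xlsx`, under two assumptions the entry does not state: names are compared case-insensitively (Excel rejects sheet names that differ only by case), and Excel's 31-character sheet-name limit is ignored. `unique_sheet_names` is a hypothetical name:

```rust
use std::collections::HashSet;

/// Make sheet names unique by suffixing duplicates with `_2`, `_3`, ...
pub fn unique_sheet_names(names: &[&str]) -> Vec<String> {
    // Lowercased keys of every name emitted so far.
    let mut taken: HashSet<String> = HashSet::new();
    let mut out = Vec::with_capacity(names.len());
    for &name in names {
        let mut candidate = name.to_string();
        let mut n = 2;
        // Keep bumping the suffix until the name is free; this also
        // handles inputs that already contain e.g. "Data_2".
        while taken.contains(&candidate.to_lowercase()) {
            candidate = format!("{name}_{n}");
            n += 1;
        }
        taken.insert(candidate.to_lowercase());
        out.push(candidate);
    }
    out
}
```

For example, `["Data", "Data", "data", "Summary"]` becomes `["Data", "Data_2", "data_3", "Summary"]`.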

### IR enrichment

- **New types**: `Shape` (vector shape anchored at absolute EMU coords),
`ShapeGeom` (`Line`, `Rect`), `FramePosition` (twip-anchored frame).
- **`Heading`** gains `frame_position` + `alignment`.
- **`Section`** gains `background_rgb`.
- **`ParagraphAlignment`** gains the `Distribute` variant.
- **`Element::Shape(Shape)`** variant for vector shapes.
- **New helpers**: `first_inline_font_size_pt`, `inline_to_element_block`,
`build_nested_list` (flat / 2-level / 3-level recursion).
- **Centralized defaults** in `ir_render::block_default`: ThematicBreak
renders as `"---"` / `<hr />`; PageBreak / ColumnBreak / Shape are
invisible in flow; TextBox / Footnote / Endnote recursively render
children. Adding a new `Element` variant forces a compile error
in `block_default::default_plain` instead of silent fallthrough.
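The compile-time guarantee comes from matching exhaustively with no `_ =>` arm. A cut-down sketch with a hypothetical five-variant enum (not the crate's real `Element`):

```rust
/// Hypothetical, cut-down IR element enum.
pub enum Element {
    Paragraph(String),
    ThematicBreak,
    PageBreak,
    ColumnBreak,
    TextBox(Vec<Element>),
}

/// Plain-text fallback rendering. There is no `_ =>` arm, so adding a
/// variant to `Element` fails to compile here until it is handled.
pub fn default_plain(el: &Element) -> String {
    match el {
        Element::Paragraph(text) => text.clone(),
        Element::ThematicBreak => "---".to_string(),
        // Breaks are invisible in flowing text.
        Element::PageBreak | Element::ColumnBreak => String::new(),
        // Containers render their children recursively, skipping empties.
        Element::TextBox(children) => children
            .iter()
            .map(default_plain)
            .filter(|s| !s.is_empty())
            .collect::<Vec<_>>()
            .join("\n"),
    }
}
```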

### Core

- **`crate::core::core_properties`**: shared `docProps/core.xml` generator
used by all three writers. Emits `dc:title`, `dc:creator`, `dc:subject`,
`dc:description`, `cp:keywords`, `dcterms:created`, `dcterms:modified`
from the IR's `Metadata`. Empty fields are omitted entirely.
- **`crate::core::embedded_fonts`**: unified font-embedding helper
(`write_embedded_fonts`, `sanitize_font_filename`). All three formats
share the layout `<prefix>font_<n>_<safe_name>.ttf`.
- **`HalfPoint::from_word_sz` / `from_drawingml_sz` / `to_drawingml_sz` /
`from_points_rounded`**: cross-format font-size invariants
(DrawingML hundredths-of-a-point vs WML half-points).
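WordprocessingML `<w:sz>` counts half-points and DrawingML `sz` counts hundredths of a point, so one half-point equals 50 DrawingML units (12 pt is `w:sz="24"` and `sz="1200"`). A hypothetical stand-in for `HalfPoint` that mirrors the helper names above; the signatures and the round-half-up choice are assumptions:

```rust
/// Hypothetical stand-in for `HalfPoint`: a font size in half-points.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct HalfPoint(pub u32);

impl HalfPoint {
    /// WML `<w:sz w:val>` is already in half-points.
    pub fn from_word_sz(val: u32) -> Self {
        HalfPoint(val)
    }

    /// DrawingML `sz` is in hundredths of a point; 50 units = 1 half-point.
    /// Rounds to the nearest half-point (ties round up).
    pub fn from_drawingml_sz(sz: u32) -> Self {
        HalfPoint((sz + 25) / 50)
    }

    /// Back to DrawingML hundredths of a point (exact).
    pub fn to_drawingml_sz(self) -> u32 {
        self.0 * 50
    }

    /// From points, rounded to the nearest half-point; negatives clamp to 0.
    pub fn from_points_rounded(pt: f32) -> Self {
        HalfPoint((pt * 2.0).round().max(0.0) as u32)
    }
}
```

Converting DrawingML to half-points is lossy (sizes such as 10.25 pt have no half-point form), which is why the reverse direction is exact but the forward one must round.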

### Dependencies

- **`quick-xml` 0.37 → 0.40**: upstream removed `BytesText::unescape()`
and deprecated `Attribute::unescape_value()` (its replacement
`normalized_value()` has different semantics — no entity
unescaping). Migration added two helpers in `core::xml`:
`unescape_text(BytesText) -> Result<String>` (used by 6 call sites)
and `unescape_attr_value` (used by 6 call sites, with
`#[allow(deprecated)]` localised to the helper so call sites stay
deprecation-free). 535 / 535 tests still pass; clippy clean.
- **`koffi` 2.16.1 → 2.16.2** in `js/` (patch bump).
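For context on what "entity unescaping" covers, a stdlib-only sketch that expands the five predefined XML entities plus decimal and hex character references. It is neither quick-xml's API nor the crate's `core::xml` helpers, and treating unknown references as errors is an assumption:

```rust
/// Expand &lt; &gt; &amp; &quot; &apos; and &#NN; / &#xHH; references.
/// Returns Err describing the first unknown or malformed reference.
pub fn unescape_xml(input: &str) -> Result<String, String> {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(amp) = rest.find('&') {
        out.push_str(&rest[..amp]);
        let after = &rest[amp + 1..];
        let semi = after
            .find(';')
            .ok_or_else(|| format!("unterminated reference: &{after}"))?;
        let name = &after[..semi];
        let ch = match name {
            "lt" => '<',
            "gt" => '>',
            "amp" => '&',
            "quot" => '"',
            "apos" => '\'',
            _ => {
                // Numeric character reference: hex (&#x..;) or decimal (&#..;).
                let code = if let Some(hex) = name.strip_prefix("#x") {
                    u32::from_str_radix(hex, 16).ok()
                } else if let Some(dec) = name.strip_prefix('#') {
                    dec.parse::<u32>().ok()
                } else {
                    None
                };
                code.and_then(char::from_u32)
                    .ok_or_else(|| format!("unknown reference: &{name};"))?
            }
        };
        out.push(ch);
        rest = &after[semi + 1..];
    }
    out.push_str(rest);
    Ok(out)
}
```

A value read without this step keeps `&amp;` and `&lt;` literally, which is the behavioural gap the migration helpers exist to close.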

### Documentation

- **CLI / MCP crate-level docs**: `office_oxide_cli` and
`office_oxide_mcp` previously opened with `mod commands;` /
`mod protocol;` and had no crate-level rustdoc. Added a short
`//!` block plus `#![warn(missing_docs)]` so future items in
either binary stay documented.
`RUSTDOCFLAGS="-D missing_docs" cargo doc --workspace --no-deps
--features parallel,mmap` now passes with zero errors.

### Tests

- **+98 unit tests** across the modules touched in this release:
`core::embedded_fonts`, `core::core_properties`, `core::units`,
`xlsx::numfmt`, `xlsx::worksheet`, `docx::formatting`, `docx::mod`,
`pptx::slide`, `ir`, `ir_render`.
- **535 / 535 tests pass** across default, `--features parallel`,
`--features mmap`, and `--features parallel,mmap` builds.
- `cargo fmt` clean. `cargo clippy --workspace --all-targets -- -D warnings`
clean.

### Bindings

- **Python wheel** (maturin, PyO3 0.28) builds cleanly and exposes
`Document`, `EditableDocument`, `XlsxWriter`, `PptxWriter`,
`OfficeOxideError`, `create_from_markdown`, `extract_text`,
`to_markdown`, `to_html`, `version`.
- **WASM** package (`wasm-pack build` with `--target web`, `nodejs`, and
`bundler`) builds cleanly with `--features wasm`.
- **C#** package bumped to 0.1.2 (csproj only — no API changes).

[0.1.2]: https://github.com/yfedoseev/office_oxide/compare/v0.1.1...v0.1.2

## [0.1.1] - 2026-04-30

> Richer IR type system, DOCX writer output, improved PPTX/XLSX IR renderers, and writer APIs in all language bindings
10 changes: 5 additions & 5 deletions Cargo.lock
