Add office file exporters (DOCX/XLSX/PPTX/RTF) for Markdown output by ronaldmannak · Pull Request #34 · PicoMLX/PicoDocs

ronaldmannak · 2026-06-18T20:46:29Z

Implement the write-side of PicoDocs: convert Markdown and ConverterResult into binary office formats, completing the round-trip capability.

Summary

This PR adds a complete export pipeline that mirrors the existing read-side converters. Users can now serialize LLM Markdown output or structured document results into DOCX, XLSX, PPTX, and RTF files via PicoDocsEngine.write(...). The implementation includes hand-rolled OOXML writers for DOCX/XLSX/PPTX (ensuring round-trip fidelity with the existing readers), platform-native exporters for RTF, and shared Markdown intermediate representations used by both renderers and exporters.

Key Changes

Shared Markdown IR: Extracted block-level (MarkdownBlock) and inline (MarkdownInline) parsing from DocumentRenderer into reusable enums (MarkdownBlockParser, MarkdownInlineParser). Both renderers and exporters now consume the same structured representation, ensuring consistency and simplifying future parser improvements.
Hand-rolled OOXML exporters:
- WordprocessingMLExporter: Writes minimal-but-valid DOCX files with headings (via w:pStyle), emphasis (w:b/w:i), lists (w:numPr), tables, links (w:hyperlink), and embedded images. Deliberately mirrors WordConverter's markers for round-trip validation.
- XLSXExporter: Generates SpreadsheetML with one sheet per document section. Uses lossless metadata["csv"] when present; otherwise flattens Markdown to rows (matching DocumentRenderer.renderCSV).
- PPTXExporter: Creates PresentationML with one slide per top-level heading or explicit .slide section. Includes fixed template parts (slide master, layout, theme) via PPTXTemplates.
Platform-native exporters:
- AttributedStringRTFExporter: Builds NSAttributedString from the Markdown IR and serializes to RTF (available across all Apple platforms).
- AttributedStringDOCXExporter: Optional macOS-only DOCX writer using NSAttributedString.officeOpenXML; intentionally not registered by default (hand-rolled version is the primary).
- AttributedStringDocumentBuilder: Shared utility that constructs attributed strings from ConverterResult, supporting headings, emphasis, lists, code, blockquotes, and tables.
Infrastructure:
- DocumentExporter protocol: Sendable, synchronous serializers (pure CPU, no I/O).
- DocumentExporterRegistry: Immutable, priority-ordered registry of exporters (mirrors DocumentConverterRegistry).
- OOXMLPackageWriter: In-memory ZIP builder for OOXML files (no temp files; pure Data output).
- ExportableFileType: Enum for writable formats (DOCX, XLSX, PPTX, RTF, Pages, Keynote). Pages/Keynote are marked unimplemented with research notes explaining why third-party iWork writing is impractical.
- OfficeMediaType: Shared MIME type mapping for image extensions (used by both readers and writers).
Engine API: Added PicoDocsEngine.write(result:to:registry:) and PicoDocsEngine.write(markdown:to:) for serialization.
Tests: ExporterTests validates round-trip (write → re-import with existing converters → compare recovered text) for DOCX and XLSX, and structural validation for PPTX (unzip, required parts, title text).
Documentation: Updated README with export examples.

Notable Implementation Details

Round-trip oracle: DOCX/XLSX exporters are validated by re-importing with WordConverter/SpreadsheetConverter and comparing recovered Markdown, using the readers as ground truth.
Markdown IR extraction: DocumentRenderer's inline helpers remain for

https://claude.ai/code/session_01MRakJBZVBQ4CcTdcypk9Z6

PicoDocs was one-directional (office files -> Markdown). This adds the "exporter" half of the engine: turn LLM Markdown output or a structured ConverterResult into DOCX, XLSX, PPTX, or RTF. Architecture mirrors the import side: - DocumentExporter protocol + DocumentExporterRegistry (value type, priority-ordered, .registering chaining) parallel DocumentConverter/Registry. - PicoDocsEngine.write(_:to:) / write(markdown:to:) / transcode(...) parallel convert/export, returning Data. Empty input throws .emptyDocument (image-only results are exempt); unimplemented formats throw .unableToExportToRequestedFormat. - New ExportableFileType (binary office outputs), kept separate from the text ExportFileType. Shared intermediate representation: - Extracted the renderer's private Block parser into internal MarkdownBlock / MarkdownBlockParser so both halves use one parser (behavior-preserving; DocumentRenderer repointed at it). - Added a structured MarkdownInline IR for exporters that need run structure. Per-format writers: - WordprocessingMLExporter: primary, all-platform DOCX writer (hand-rolled OOXML), the round-trip oracle; mirrors WordConverter's markers (Heading{N} styles, w:numPr lists, w:b/w:i, w:hyperlink, a:blip image drawings). - XLSXExporter / PPTXExporter: hand-rolled OOXML on a shared OOXMLPackageWriter (ZIPFoundation write). - AttributedStringRTFExporter (Apple) builds NSAttributedString from the IR (no WebKit HTML import, so it stays thread-agnostic); AttributedStringDOCXExporter is opt-in/macOS-only and NOT registered by default. - Pages/Keynote are a documented spike only (no writer). - Lifted WordConverter.mimeType(forExtension:) into shared OfficeMediaType. Tests: round-trip DOCX/XLSX (via WordConverter/SpreadsheetConverter), RTF (Apple, via RTFConverter), structural PPTX, inline-IR unit tests, and the empty/unsupported contract. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01MRakJBZVBQ4CcTdcypk9Z6

gemini-code-assist

Code Review

This pull request introduces binary office document exporting capabilities (DOCX, XLSX, PPTX, and RTF) to PicoDocs, sharing a unified block and inline Markdown intermediate representation between converters and exporters. Feedback on the changes highlights several key improvement opportunities: sanitizing invalid XML 1.0 control characters to prevent corrupted office files, making image lookups robust against path prefixes, preserving multi-scalar grapheme clusters (such as emojis) during sheet name sanitization, and pre-compiling regular expressions in the inline parser to avoid recursive overhead.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-18T20:48:40Z

+    static func escape(_ text: String) -> String {
+        text.replacingOccurrences(of: "&", with: "&amp;")
+            .replacingOccurrences(of: "<", with: "&lt;")
+            .replacingOccurrences(of: ">", with: "&gt;")
+    }


XML 1.0 documents do not allow certain control characters (e.g., 0x00 to 0x08, 0x0B, 0x0C, 0x0E to 0x1F except tab, newline, and carriage return). If these characters are present in the text (for example, extracted from PDFs or generated by LLMs), writing them directly into the XML will result in corrupted office files that fail to open in Microsoft Office. We should sanitize the input string by filtering out these invalid XML characters.

Suggested change

static func escape(_ text: String) -> String {

text.replacingOccurrences(of: "&", with: "&")

.replacingOccurrences(of: "<", with: "<")

.replacingOccurrences(of: ">", with: ">")

}

static func escape(_ text: String) -> String {

let sanitized = String(text.unicodeScalars.filter { scalar in

let val = scalar.value

return val == 0x9 || val == 0xA || val == 0xD ||

(val >= 0x20 && val <= 0xD7FF) ||

(val >= 0xE000 && val <= 0xFFFD) ||

(val >= 0x10000 && val <= 0x10FFFF)

})

return sanitized.replacingOccurrences(of: "&", with: "&")

.replacingOccurrences(of: "<", with: "<")

.replacingOccurrences(of: ">", with: ">")

}

gemini-code-assist · 2026-06-18T20:48:40Z

+        private func imageRun(alt: String, source: String) -> String? {
+            guard let data = images[source] else { return nil }


If the image source in the Markdown contains a path (e.g., word/media/pic.png) rather than just the basename, looking up images[source] will fail because the images index is keyed by the last path component (basename). To make the lookup robust, we should fall back to checking the last path component of the source string.

Suggested change

private func imageRun(alt: String, source: String) -> String? {

guard let data = images[source] else { return nil }

private func imageRun(alt: String, source: String) -> String? {

let key = images[source] != nil ? source : (source as NSString).lastPathComponent

guard let data = images[key] else { return nil }

gemini-code-assist · 2026-06-18T20:48:40Z

+    private static func sanitizeSheetName(_ name: String) -> String {
+        let invalid = CharacterSet(charactersIn: ":\\/?*[]")
+        let cleaned = String(name.unicodeScalars.map { invalid.contains($0) ? Character(" ") : Character($0) })
+            .trimmingCharacters(in: .whitespacesAndNewlines)
+        return String(cleaned.prefix(31))
+    }


Mapping each UnicodeScalar individually to a Character and then joining them breaks multi-scalar grapheme clusters (such as flag emojis, skin tone modifiers, or family emojis) into separate characters. To preserve grapheme clusters, we should map the scalars to a new UnicodeScalarView first, and then convert that view to a String.

Suggested change

private static func sanitizeSheetName(_ name: String) -> String {

let invalid = CharacterSet(charactersIn: ":\\/?*[]")

let cleaned = String(name.unicodeScalars.map { invalid.contains($0) ? Character(" ") : Character($0) })

.trimmingCharacters(in: .whitespacesAndNewlines)

return String(cleaned.prefix(31))

}

private static func sanitizeSheetName(_ name: String) -> String {

let invalid = CharacterSet(charactersIn: ":\\/?*[]")

let cleanedScalars = name.unicodeScalars.map { invalid.contains($0) ? UnicodeScalar(32) : $0 }

let cleaned = String(String.UnicodeScalarView(cleanedScalars))

.trimmingCharacters(in: .whitespacesAndNewlines)

return String(cleaned.prefix(31))

}

gemini-code-assist · 2026-06-18T20:48:40Z

+
+    /// Parses `***`/`**`/`*` emphasis into nested nodes, preferring (at the same
+    /// position) the longest delimiter — mirroring the renderer's pass order so
+    /// `***x***` becomes strong(emphasis(x)).
+    static func parseEmphasis(_ text: String) -> [MarkdownInline] {
+        guard !text.isEmpty else { return [] }
+        let patterns: [(pattern: String, wrap: ([MarkdownInline]) -> MarkdownInline)] = [
+            ("\\*\\*\\*(.+?)\\*\\*\\*", { .strong([.emphasis($0)]) }),
+            ("\\*\\*(.+?)\\*\\*", { .strong($0) }),
+            ("\\*(.+?)\\*", { .emphasis($0) }),
+        ]


Compiling regular expressions is relatively expensive. Since parseEmphasis is called recursively on inline runs, compiling the three regular expressions on every single call introduces unnecessary CPU overhead. We should compile them once as static properties of the parser.

private static let strongEmphasisRegex = try! NSRegularExpression(pattern: "\\*\\*\\*(.+?)\\*\\*\\*") private static let strongRegex = try! NSRegularExpression(pattern: "\\*\\*(.+?)\\*\\*") private static let emphasisRegex = try! NSRegularExpression(pattern: "\\*(.+?)\\*") /// Parses `***`/`**`/`*` emphasis into nested nodes, preferring (at the same /// position) the longest delimiter — mirroring the renderer's pass order so /// `***x***` becomes strong(emphasis(x)). static func parseEmphasis(_ text: String) -> [MarkdownInline] { guard !text.isEmpty else { return [] } let patterns: [(regex: NSRegularExpression, wrap: ([MarkdownInline]) -> MarkdownInline)] = [ (strongEmphasisRegex, { .strong([.emphasis($0)]) }), (strongRegex, { .strong($0) }), (emphasisRegex, { .emphasis($0) }), ]

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bc33c18899

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-18T20:59:41Z

+                filename += ".\(ext)"
+            }
+            mediaExtensions.insert(ext)
+            media.append((filename, data))


Reuse media parts for repeated image references

When a result contains two inline references to the same embedded image basename (for example, a DOCX that reuses the same logo; WordConverter keeps both Markdown refs but de-duplicates the .image section), imageRun appends the same filename to media for every occurrence. The later package write then adds word/media/<filename> multiple times, which either fails on duplicate ZIP entries or creates an invalid OOXML package with duplicate part names. Cache emitted media by source/filename and reuse it instead of appending a new part per run.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-18T20:59:41Z

+                external: false
+            ))
+            let docPrID = media.count
+            let name = OOXMLPackageWriter.escapeAttribute(source.isEmpty ? "image" : source)


Preserve Markdown image alt text in DOCX drawings

When Markdown contains meaningful alt text such as ![Sales chart](chart.png) with matching image data, this derives the drawing name from the source filename and emits no descr; WordConverter.imageAltText reads descr first and then name, so re-importing the generated DOCX returns the filename as the alt text instead of Sales chart. Keep the filename for the media target, but write the escaped alt into the drawing alt-text attributes (at least wp:docPr descr).

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-18T20:59:41Z

+    static func escape(_ text: String) -> String {
+        text.replacingOccurrences(of: "&", with: "&amp;")
+            .replacingOccurrences(of: "<", with: "&lt;")
+            .replacingOccurrences(of: ">", with: "&gt;")


Filter XML-invalid scalars before writing OOXML text

For write(markdown:) input containing XML 1.0-invalid control scalars (for example NUL or vertical tab), this helper only escapes markup characters and leaves those scalars in document.xml, worksheet XML, or slide XML. The ZIP can still be produced, but the XML parts are not well-formed and Office/readers will reject the output; because raw Markdown writes do not run sanitizeUnicode, the OOXML escaping path should drop or replace disallowed XML characters before serialization.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-18T20:59:42Z

+    private static func parseLinkOrImage(_ chars: [Character], from: Int, isImage: Bool) -> (node: MarkdownInline, next: Int)? {
+        let bracket = isImage ? from + 1 : from
+        guard bracket < chars.count, chars[bracket] == "[" else { return nil }
+        guard let labelEnd = firstIndex(of: "]", in: chars, from: bracket + 1) else { return nil }


Honor escaped label delimiters when parsing links

Canonical Markdown from WordConverter escapes [/] in link labels and image alt text, but this delimiter search treats an escaped \] as the end of the label. When transcoding a DOCX link or image whose visible text contains ], the parser fails to recognize the link/image and the exporter writes literal Markdown instead of the original object; skip backslash-escaped delimiters and unescape the label before building the inline node.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-18T20:59:42Z

+            guard let parenClose = firstIndex(of: ")", in: chars, from: destStart) else { return nil }
+            dest = String(chars[destStart..<parenClose])


Handle balanced parentheses in link destinations

For raw Markdown passed to write(markdown:) with a common URL such as [spec](https://example.com/Foo_(bar)), this stops the destination at the first ), so the generated DOCX relationship target is truncated and the remaining ) is emitted as ordinary text. Since this API is advertised for LLM Markdown output, parse balanced parentheses (or normalize to angle-bracket destinations) before creating office hyperlinks.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-18T20:59:42Z

+
+            case .list(let ordered, let items):
+                usedNumbering = true
+                let numId = ordered ? 2 : 1


Restart each ordered list with distinct numbering

When a Markdown document has two separate ordered lists with prose between them, every ordered list item is assigned the same numId and Word treats the second list as a continuation of the first instead of restarting at 1. Allocate a fresh numbering instance (or a start override) per ordered-list block so exported DOCX files preserve Markdown list boundaries in Word/Pages.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-18T20:59:42Z

+        if result.markdown().trimmingCharacters(in: .whitespacesAndNewlines).isEmpty,
+           !result.sections.contains(where: { $0.kind == .image }) {


Render image-only results instead of blank files

This exception lets a ConverterResult that contains only .image sections pass validation, but the built-in exporters build their visible content from result.markdown() (which excludes .image carriers) and only embed images that are referenced inline. Passing an image-only result therefore produces a blank DOCX/PPTX/RTF or placeholder XLSX instead of the advertised valid image output; either synthesize image references for these sections or reject image-only results for formats that cannot render them.

Useful? React with 👍 / 👎.

- OOXMLPackageWriter.escape: drop XML 1.0-invalid control scalars so raw Markdown (which skips sanitizeUnicode) can't produce non-well-formed parts. - WordprocessingMLExporter: look images up by basename when the source has a path prefix; package/relate each media file once (no duplicate ZIP parts on repeated refs); write Markdown alt text into wp:docPr@descr so it round-trips; give each ordered list its own numbering instance so Word restarts it at 1. - XLSXExporter.sanitizeSheetName: rebuild via UnicodeScalarView so multi-scalar grapheme clusters (emoji) survive. - MarkdownInline: pre-compile emphasis regexes; honor backslash-escaped label delimiters and balanced parens in bare link/image destinations, unescaping the recovered label/destination.

- MarkdownInline: store only the compiled emphasis regexes in static state (nonisolated(unsafe), immutable/thread-safe) and keep the non-Sendable wrap closures local to parseEmphasis — fixes the Swift 6 MutableGlobalVariable build error. - PicoDocsEngine.write: for image-only results (only .image carriers, empty markdown), synthesize an inline ![alt](name) reference per carrier so the images are embedded instead of producing a blank document. - Tests: image-only DOCX embeds its image; parser handles balanced parens and escaped label delimiters.

ronaldmannak · 2026-06-18T22:10:00Z

Thanks @gemini-code-assist and @chatgpt-codex-connector — worked through all the review threads. Pushed in be8f6a1 and 66d1511. Summary of each:

Fixed

XML 1.0-invalid control chars (Gemini HIGH + Codex P2, OOXMLPackageWriter.escape): now filters disallowed scalars (NUL, vertical tab, etc.) before escaping, so raw write(markdown:) input — which skips sanitizeUnicode — can't produce non-well-formed document.xml/worksheet/slide parts.
Image lookup by basename (Gemini, WordprocessingMLExporter): falls back to the source's last path component when it carries a path prefix.
Sheet-name grapheme clusters (Gemini, XLSXExporter): rebuilds via UnicodeScalarView so emoji / multi-scalar clusters survive sanitization.
Pre-compiled emphasis regexes (Gemini, MarkdownInline): the three matchers are compiled once instead of per recursive call.
Duplicate media parts (Codex P2): repeated references to the same image now reuse a single media part + relationship (unique drawing ids), avoiding duplicate ZIP entries.
Alt text in DOCX drawings (Codex P2): Markdown alt text is written to wp:docPr/@descr, which WordConverter.imageAltText reads first, so it round-trips instead of collapsing to the filename.
Escaped label delimiters (Codex P2, MarkdownInline): the label scan skips backslash-escaped \] and unescapes the recovered label.
Balanced parens in destinations (Codex P2): bare link/image destinations match balanced parentheses, so https://e.com/Foo_(bar) isn't truncated.
Ordered-list restart (Codex P2): each ordered list gets its own numbering instance with a startOverride, so Word restarts separate lists at 1 instead of continuing.
Image-only results render blank (Codex P2, PicoDocs.write): image-only ConverterResults now synthesize an inline [alt](name) reference per .image carrier so the bytes are actually embedded, keeping the "image-only doc is valid output" contract honest.

Added tests covering image-only export, balanced-paren links, and escaped label delimiters. The Swift 6 build error from the first push (a non-Sendable closure in the static regex table) is fixed in 66d1511 by keeping only the compiled regexes in static storage.

Generated by Claude Code

gemini-code-assist · 2026-06-18T22:10:15Z

Thanks for the detailed summary of the fixes, @ronaldmannak. The approach to sanitizing XML scalars in OOXMLPackageWriter and the extraction of the MarkdownBlockParser and MarkdownInlineParser into shared IRs are excellent improvements for maintainability and correctness. The added tests for image-only exports and edge cases like balanced parentheses in links are also great to see. Everything looks solid and ready for review.

chatgpt-codex-connector · 2026-06-18T22:11:14Z

Summary

Verified that the PR head already contains the requested review-thread fixes, so I did not make additional code changes, create a commit, or open a new PR.
Confirmed XML 1.0-invalid scalars are filtered in the OOXML escaping choke point before XML entity escaping. Sources/PicoDocs/Exporters/OOXMLPackageWriter.swiftL69-L91
Confirmed DOCX image export now falls back to basename lookup, reuses one media relationship per distinct file, and writes Markdown alt text into wp:docPr/@descr. Sources/PicoDocs/Exporters/WordprocessingMLExporter.swiftL196-L241
Confirmed ordered lists now allocate separate numbering instances and write startOverride values so separate ordered lists restart at 1. Sources/PicoDocs/Exporters/WordprocessingMLExporter.swiftL113-L128 Sources/PicoDocs/Exporters/WordprocessingMLExporter.swiftL360-L386
Confirmed Markdown link/image parsing skips escaped label delimiters, unescapes labels/destinations, handles balanced parentheses, and keeps emphasis regexes precompiled. Sources/PicoDocs/Renderers/MarkdownInline.swiftL104-L134 Sources/PicoDocs/Renderers/MarkdownInline.swiftL137-L179 Sources/PicoDocs/Renderers/MarkdownInline.swiftL183-L205
Confirmed XLSX sheet name sanitization rebuilds from a UnicodeScalarView, preserving multi-scalar grapheme clusters. Sources/PicoDocs/Exporters/XLSXExporter.swiftL139-L148
Confirmed image-only ConverterResults synthesize inline image references before dispatching to exporters. Sources/PicoDocs/PicoDocs.swiftL107-L119 Sources/PicoDocs/PicoDocs.swiftL122-L136
Confirmed tests were added for balanced-parenthesis links and escaped label delimiters. Tests/PicoDocsTests/ExporterTests.swiftL225-L237

Testing

⚠️ swift test — could not complete because SwiftPM dependency fetches were blocked by the environment’s network/proxy (CONNECT tunnel failed, response 403).
✅ git status --short — clean working tree; no new changes to commit.

View task →

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 66d1511694

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-18T22:19:36Z

+            case .heading(_, let text):
+                rows.append([plain(text)])


Skip sheet-title headings when rebuilding XLSX rows

When writing a ConverterResult produced by SpreadsheetConverter, each .sheet section's Markdown starts with ## <sheetName> (see SpreadsheetConverter.markdownTable), while the sheet name is already carried separately in section.sheetName. Treating every heading as a row here means transcode(..., to: .xlsx) inserts the worksheet name into cell A1 and shifts the original spreadsheet data down by one row, corrupting round-tripped XLSX content. Skip the section-title heading for sheet sections or prefer a lossless row source for spreadsheet-derived sections.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-18T22:19:36Z

+            let name = (section.sourcePath as NSString?)?.lastPathComponent ?? section.title
+            guard let name, !name.isEmpty else { return nil }


Generate a fallback name for unnamed image carriers

The new synthesis path still drops any .image carrier that lacks both sourcePath and title, even though those fields are optional and write(_:) treats any image section as valid non-empty input. In that case an image-only result with valid metadata["base64"]/metadata["mimeType"] passes validation, refs stays empty, and the exporters receive no inline reference to the bytes, so the call succeeds with a blank/placeholder document instead of rendering the image; synthesize a stable filename from the index and MIME type instead of returning nil.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-18T22:19:36Z

+    /// reports `false` and `write(...)` throws `unableToExportToRequestedFormat`.
+    public var isImplemented: Bool {
+        switch self {
+        case .docx, .rtf, .xlsx, .pptx: return true


Make RTF support reflect the current platform

On non-Apple platforms the default registry does not register AttributedStringRTFExporter because it is inside #if canImport(AppKit) || canImport(UIKit), so .rtf writes throw unableToExportToRequestedFormat; however isImplemented still reports .rtf as supported. Server/Linux callers that use this property to advertise available export formats will offer an option that the built-in writer cannot fulfill, so gate .rtf the same way the registry does or document a separate availability API.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-18T22:19:37Z

+            let name = (section.sourcePath as NSString?)?.lastPathComponent ?? section.title
+            guard let name, !name.isEmpty else { continue }
+            index[name] = data


Preserve path-qualified image identities in DOCX export

When callers provide a structured result with two image sections such as charts/logo.png and headers/logo.png and matching inline refs, this index collapses both carriers to the same basename key and the later section overwrites the earlier bytes. Both drawings then resolve to the same media part even though imageRun tries to support path-prefixed sources, so distinct images with equal basenames are rendered incorrectly; index by source path as well as basename, and only fall back to basename when it is unambiguous.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-18T22:19:37Z

+            if ext.isEmpty {
+                ext = OfficeMediaType.fileExtension(forMIME: OfficeMediaType.mimeType(forExtension: ext))
+                filename += ".\(ext)"


Use image MIME metadata when adding missing extensions

For an embedded image whose reference/name has no extension but whose .image section carries metadata["mimeType"] (for example image/png), this derives the fallback extension by first asking for the MIME type of an empty extension, which always becomes application/octet-stream, and then maps that to .bin. The resulting DOCX stores word/media/<name>.bin with an octet-stream content type, so Office cannot reliably recognize or render otherwise valid PNG/JPEG bytes; carry the section MIME through the image index and use it here when ext.isEmpty.

Useful? React with 👍 / 👎.

- XLSXExporter.rows: drop a leading heading that echoes the section's sheetName/title so SpreadsheetConverter-derived sheets (which prefix `## <name>`) don't push the worksheet name into A1 and shift data down a row. - ExportableFileType.isImplemented: gate `.rtf` behind canImport(AppKit)||canImport(UIKit) so it doesn't advertise an export the (Apple-only) RTF writer can't fulfill on Linux. - WordprocessingMLExporter image index: rebuild as a struct carrying a unique, correctly-typed media filename per carrier; key by full source path and by unambiguous basename so same-basename images from different paths stay distinct, and derive missing extensions from the carrier MIME (no more .bin/octet-stream). - PicoDocsEngine.withSynthesizedImageReferences: generate a stable image-<n>.<ext> name (ext from MIME) for unnamed carriers and assign it as sourcePath so the exporter index resolves it — image-only results with MIME-only carriers now embed. - Tests: XLSX sheet-title not a data row; same-basename images stay distinct; unnamed MIME-only image embeds as .png; .rtf availability matches platform.

ronaldmannak · 2026-06-19T06:05:00Z

Second review round (@chatgpt-codex-connector) addressed in 4676e9b — all 5 P2 items confirmed as real correctness gaps and fixed:

XLSX sheet-title becomes a data row (XLSXExporter.rows): SpreadsheetConverter prefixes each sheet with ## <sheetName> and carries the name in sheetName but sets no csv metadata, so the flatten path put the title in A1 and shifted data down. Now a leading heading that echoes sheetName/title is dropped (genuine in-body headings stay).
.rtf over-reports availability (ExportableFileType.isImplemented): the RTF writer is registered only under #if canImport(AppKit) || canImport(UIKit), so .rtf is now gated the same way and Linux callers no longer advertise an export that throws.
Same-basename images collapse (WordprocessingMLExporter image index): rebuilt as a struct assigning each carrier a unique, correctly-typed media filename; keyed by full source path and by unambiguous basename, so charts/logo.png and headers/logo.png write two distinct parts (logo.png, logo-2.png).
Wrong extension for MIME-only images: the missing extension is now taken from the carrier's MIME (image/png → .png) at index-build time, so valid bytes are no longer stored as .bin/octet-stream.
Unnamed image carriers dropped (withSynthesizedImageReferences): a carrier with only base64/mimeType now gets a stable image-<n>.<ext> name (ext from MIME), assigned as its sourcePath so the exporter index resolves the same key — image-only results with such carriers embed instead of producing a blank document.

Tests added: XLSX sheet title isn't a data cell; same-basename images stay distinct; an unnamed MIME-only image embeds as .png; .rtf availability matches the platform gate.

Build (macOS, Swift 6) is running on 4676e9b; I'll follow up if it's not green.

Generated by Claude Code

chatgpt-codex-connector · 2026-06-19T06:06:12Z

Summary

Verified the PR head is already at 4676e9b4e37e803ea9fa65784370e73a365bc795 and the working tree is clean, so I did not make additional code changes, commit, or create a new PR.
Confirmed XLSX row rebuilding drops only a leading heading that matches sheetName/title, preventing spreadsheet sheet titles from becoming cell A1 while preserving genuine in-body headings. Sources/PicoDocs/Exporters/XLSXExporter.swiftL53-L67
Confirmed image-only ConverterResults synthesize inline image references, including stable MIME-derived fallback filenames for unnamed carriers. Sources/PicoDocs/PicoDocs.swiftL114-L145
Confirmed .rtf availability is gated by canImport(AppKit) || canImport(UIKit), matching the default registry’s platform behavior. Sources/PicoDocs/Models/ExportableFileType.swiftL40-L56
Confirmed DOCX image indexing now preserves path-qualified identities, uses MIME metadata for missing extensions, creates unique media filenames, and only falls back to basename when unambiguous. Sources/PicoDocs/Exporters/WordprocessingMLExporter.swiftL47-L110
Confirmed regression tests cover sheet-title skipping, distinct same-basename DOCX images, and unnamed MIME-only image embedding. Tests/PicoDocsTests/ExporterTests.swiftL219-L239 Tests/PicoDocsTests/ExporterTests.swiftL242-L258 Tests/PicoDocsTests/ExporterTests.swiftL260-L268

Testing

⚠️ swift test — could not complete because SwiftPM dependency fetches were blocked by the environment proxy (CONNECT tunnel failed, response 403).
✅ git status --short — clean working tree; no new changes to commit.

View task →

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d8f9504801

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-19T06:18:14Z

+            var name = (section.sourcePath as NSString?)?.lastPathComponent ?? section.title
+            if name?.isEmpty ?? true {
+                generatedCount += 1
+                let ext = OfficeMediaType.fileExtension(forMIME: section.metadata["mimeType"] ?? "")
+                let generated = "image-\(generatedCount).\(ext)"
+                sections[index].sourcePath = generated
+                name = generated
+            }
+            guard let name, !name.isEmpty else { continue }
+            let alt = section.title ?? name
+            refs.append(DocumentSection(kind: .body, markdown: "![\(alt)](\(name))"))


Preserve paths in synthesized image refs

When an image-only result contains two carriers with the same basename under different sourcePaths, this synthesis strips both references down to that basename (for example charts/logo.png and headers/logo.png both become ![logo.png](logo.png)). The DOCX image index intentionally drops ambiguous basename lookups and only keeps the full paths, so these synthesized references no longer resolve and the exported DOCX falls back to text instead of embedding either image; the fresh evidence is this new synthesis path using lastPathComponent before creating the Markdown ref.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-19T06:18:14Z

+            case .table(let tableRows):
+                rows += tableRows


Strip inline Markdown from XLSX table cells

For table rows this appends the parsed cell text directly, unlike headings, paragraphs, and list items which go through plain(_). When transcoding a DOCX/Markdown table cell with inline formatting such as **Total** or a link label, the generated XLSX cell contains the Markdown syntax instead of the visible cell value, corrupting spreadsheet data; map table cells through the same plain-text projection before writing rows.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-19T06:18:14Z

+        if case .heading(_, let text)? = blocks.first,
+           let title = section.sheetName ?? section.title,
+           plain(text) == title {
+            blocks.removeFirst()


Limit sheet-title stripping to sheet sections

This removes any leading heading whose text matches section.sheetName ?? section.title, even for non-spreadsheet sections. A normal titled body/chapter such as DocumentSection(title: "Intro", kind: .body, markdown: "# Intro\n\n...") exported to XLSX loses the Intro row, while only .sheet sections from SpreadsheetConverter have the redundant sheet-title heading that needs dropping.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-19T06:18:14Z

+            if case .heading(let level, let text) = block, level <= 2 {
+                flush()
+                title = text


Normalize PPTX heading titles before emitting

For Markdown split into slides, the slide title is taken directly from the heading text and later XML-escaped only, while body lines go through plain(_). A heading like # **Q4** results or # [Spec](https://example.com) therefore produces a PPTX title showing literal Markdown syntax instead of the visible title text; run the heading through the same inline plain-text projection or emit rich text runs.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-19T06:18:14Z

+            case .footnoteReference:
+                break   // references carry no inline glyph in this projection


Preserve footnote markers in RTF exports

On Apple platforms the default RTF writer goes through this builder, and any Markdown/converted DOCX containing [^id] markers loses them here. The body reference is dropped entirely, and a footnote definition like [^fn1]: note becomes just : note, so the exported RTF no longer preserves where the note was referenced or which definition it belongs to; emit a literal marker as the DOCX writer does instead of skipping it.

Useful? React with 👍 / 👎.

gemini-code-assist Bot reviewed Jun 18, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Jun 18, 2026

View reviewed changes

ronaldmannak added 2 commits June 18, 2026 22:04

chatgpt-codex-connector Bot reviewed Jun 18, 2026

View reviewed changes

Re-run CI (flaky iWork reader fixture-count tests; no code change)

d8f9504

chatgpt-codex-connector Bot reviewed Jun 19, 2026

View reviewed changes

		private func imageRun(alt: String, source: String) -> String? {
		guard let data = images[source] else { return nil }

		guard let parenClose = firstIndex(of: ")", in: chars, from: destStart) else { return nil }
		dest = String(chars[destStart..<parenClose])

		if result.markdown().trimmingCharacters(in: .whitespacesAndNewlines).isEmpty,
		!result.sections.contains(where: { $0.kind == .image }) {

		let name = (section.sourcePath as NSString?)?.lastPathComponent ?? section.title
		guard let name, !name.isEmpty else { return nil }

		case .footnoteReference:
		break // references carry no inline glyph in this projection

Uh oh!

Conversation

ronaldmannak commented Jun 18, 2026

Summary

Key Changes

Notable Implementation Details

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

ronaldmannak commented Jun 18, 2026

Uh oh!

gemini-code-assist Bot commented Jun 18, 2026

Uh oh!

chatgpt-codex-connector Bot commented Jun 18, 2026

Summary

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

ronaldmannak commented Jun 19, 2026

Uh oh!

chatgpt-codex-connector Bot commented Jun 19, 2026

Summary

Uh oh!

chatgpt-codex-connector Bot left a comment