@ansonlai/docx-redline-js

Host-independent OOXML reconciliation engine for .docx manipulation with track changes (redlines).

Converts AI-generated or programmatic text/markdown edits into valid Office Open XML (OOXML) with w:ins/w:del revision markup that Microsoft Word renders as native tracked changes.
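To make the target markup concrete, here is a minimal sketch of the kind of revision runs Word displays as a tracked replacement. The helper names `escapeXml` and `buildReplacementRuns` and their parameters are hypothetical and not part of this package. The engine's real output may carry additional run properties and attributes.

```javascript
// Hypothetical helpers for illustration only; not part of @ansonlai/docx-redline-js.

// Escapes the characters that are unsafe in XML text and attribute values.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Builds a w:del run followed by a w:ins run, the pattern Word shows as
// "deleted text replaced by inserted text" for a given author and timestamp.
function buildReplacementRuns(deletedText, insertedText, author, isoDate) {
  const attrs = (id) =>
    `w:id="${id}" w:author="${escapeXml(author)}" w:date="${escapeXml(isoDate)}"`;
  return (
    `<w:del ${attrs(1)}><w:r><w:delText xml:space="preserve">` +
    `${escapeXml(deletedText)}</w:delText></w:r></w:del>` +
    `<w:ins ${attrs(2)}><w:r><w:t xml:space="preserve">` +
    `${escapeXml(insertedText)}</w:t></w:r></w:ins>`
  );
}

const xml = buildReplacementRuns('Original', 'Updated', 'Editor', '2024-01-01T00:00:00Z');
console.log(xml);
```

The fragment only makes sense inside a paragraph whose ancestors declare the `w` namespace prefix. Deleted text uses `w:delText` rather than `w:t`.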

Features

  • Text reconciliation with word-level diffing and native-looking redlines
  • Formatting updates (bold, italic, underline, strikethrough) via surgical w:rPrChange
  • Lists: generate and edit real Word lists (w:numPr) from markdown
  • Tables: virtual-grid diffing for cell-level edits with merge safety
  • Comments: inject OOXML comments anchored to text ranges
  • Revision management: accept/reject tracked changes by author or for all authors
  • Comment management: delete comments by author or for all authors
  • Highlights: apply highlight colors to runs
  • Markdown and OOXML conversion in both directions
  • Package plumbing helpers for numbering.xml, comments.xml, content types, and relationships
  • Zero host dependencies: works in Node.js, browsers, Deno, and similar JS runtimes with DOM parsing support
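As a rough illustration of what word-level diffing means, the sketch below compares two strings token by token using a longest-common-subsequence table. It is not the package's algorithm (the bundled build inlines diff-match-patch), and the names `tokenize` and `diffWords` are hypothetical.

```javascript
// Illustrative word-level diff; NOT the package's implementation.

// Splits text into words and whitespace runs so it can be rebuilt exactly.
function tokenize(text) {
  return text.match(/\s+|[^\s]+/g) || [];
}

function diffWords(original, modified) {
  const a = tokenize(original);
  const b = tokenize(modified);
  // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..].
  const lcs = Array.from({ length: a.length + 1 }, () =>
    new Array(b.length + 1).fill(0)
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  const ops = [];
  const push = (type, text) => {
    const last = ops[ops.length - 1];
    if (last && last.type === type) last.text += text; // merge adjacent ops
    else ops.push({ type, text });
  };
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) { push('equal', a[i]); i++; j++; }
    else if (lcs[i + 1][j] >= lcs[i][j + 1]) { push('delete', a[i]); i++; }
    else { push('insert', b[j]); j++; }
  }
  while (i < a.length) push('delete', a[i++]);
  while (j < b.length) push('insert', b[j++]);
  return ops;
}

const ops = diffWords('Original sentence.', 'Updated sentence.');
console.log(ops);
```

In a redline, each `delete` op would map to a `w:del` run and each `insert` op to a `w:ins` run, while `equal` text stays untouched.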

Install

npm / Node.js

npm install @ansonlai/docx-redline-js

CDN (browser <script type="module">)

<script type="module">
  import { applyRedlineToOxml } from 'https://esm.sh/@ansonlai/docx-redline-js';
</script>

Or use the pre-bundled file (no import map needed, diff-match-patch is inlined):

<script type="module">
  import { applyRedlineToOxml } from 'https://cdn.jsdelivr.net/npm/@ansonlai/docx-redline-js/dist/docx-redline-js.esm.min.js';
</script>

Local git clone

git clone https://github.com/AnsonLai/docx-redline-js.git

Then import from the cloned directory:

import { applyRedlineToOxml } from './docx-redline-js/index.js';

Quick Start

Node.js

import { DOMParser, XMLSerializer } from '@xmldom/xmldom';
import {
  configureXmlProvider,
  setDefaultAuthor,
  applyRedlineToOxml
} from '@ansonlai/docx-redline-js';

configureXmlProvider({ DOMParser, XMLSerializer });
setDefaultAuthor('My App');

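// Sample input: a single WordprocessingML paragraph containing the original text.
const paragraphOoxml =
  '<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">' +
  '<w:r><w:t>Original sentence.</w:t></w:r></w:p>';

const result = await applyRedlineToOxml(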
  paragraphOoxml,
  'Original sentence.',
  'Updated sentence.',
  { generateRedlines: true, author: 'Editor' }
);

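console.log(result.hasChanges); // whether the reconciliation changed anything
console.log(result.oxml);       // resulting OOXML; inspect its shape before packaging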

Browser

import {
  setDefaultAuthor,
  applyRedlineToOxml
} from '@ansonlai/docx-redline-js';

setDefaultAuthor('Browser Editor');

// oxml, original, and modified are strings supplied by your application.
const result = await applyRedlineToOxml(oxml, original, modified, {
  generateRedlines: true
});

API Reference

Configuration (call once at startup)

| Function | Purpose |
| --- | --- |
| configureXmlProvider({ DOMParser, XMLSerializer }) | Inject XML parser. Required in Node.js; browsers usually provide native support. |
| configureLogger({ log, warn, error }) | Replace default console logger. |
| setDefaultAuthor(name) | Set fallback track-change author (default: 'Author'). |
| setPlatform(label) | Set platform label for diagnostics (default: 'Unknown'). |
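A logger passed to configureLogger needs log, warn, and error functions. The sketch below builds a buffering logger with that shape, which can be handy in tests. The name `createBufferingLogger` is hypothetical, and it is an assumption that the library calls these functions like console methods, with any number of arguments.

```javascript
// Hypothetical helper; not part of the package.
// Assumption: log/warn/error are invoked like console methods (variadic args).
function createBufferingLogger() {
  const entries = [];
  // Returns a function that records its arguments under the given level.
  const record = (level) => (...args) => {
    entries.push({ level, message: args.map(String).join(' ') });
  };
  return {
    entries,
    log: record('log'),
    warn: record('warn'),
    error: record('error')
  };
}

const logger = createBufferingLogger();
logger.warn('numbering part missing', 42);
console.log(logger.entries);

// In an application you would then hand the three functions to the library:
// configureLogger({ log: logger.log, warn: logger.warn, error: logger.error });
```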

Engine (primary reconciliation APIs)

| Function | Purpose |
| --- | --- |
| applyRedlineToOxml(oxml, original, modified, options) | Core engine entry point for text/markdown reconciliation with optional redlines. |
| applyRedlineToOxmlWithListFallback(oxml, original, modified, options) | Core engine with automatic single-line list structural fallback. |
| reconcileMarkdownTableOoxml(oxml, original, markdownTable, options) | Table-specific reconciliation helper. |

Pipeline (lower-level access)

| Function | Purpose |
| --- | --- |
| ReconciliationPipeline | Direct pipeline access (ingest, diff, patch, serialize). |
| ingestWordOoxmlToPlainText(oxml) | Extract plain text from OOXML. |
| ingestWordOoxmlToMarkdown(oxml) | Convert OOXML to markdown. |
| ingestOoxml(oxml) | Flatten OOXML into an internal run model with offsets. |
| preprocessMarkdown(text) | Normalize markdown and extract format hints. |

Services

| Function | Purpose |
| --- | --- |
| injectCommentsIntoOoxml(oxml, comments, options) | Add comments anchored to text ranges. |
| acceptTrackedChangesInOoxml(oxml, { author?, allAuthors? }) | Accept w:ins / w:del / *PrChange revisions for one author or all authors. |
| rejectTrackedChangesInOoxml(oxml, { author?, allAuthors? }) | Reject w:ins / w:del / *PrChange revisions for one author or all authors. |
| deleteCommentsByAuthorInOoxml(oxml, { author?, allAuthors? }) | Delete comments and matching anchors/references for one author or all authors. |
| generateTableOoxml(headers, rows, options) | Generate a w:tbl from tabular data. |
| createDynamicNumberingIdState(numberingXml) | Allocate numbering IDs without collisions. |
| ensureNumberingArtifactsInZip(zip, numberingXml) | Merge numbering artifacts into a .docx package. |
| ensureCommentsArtifactsInZip(zip, commentsXml) | Merge comments artifacts into a .docx package. |
| validateDocxPackage(zip) | Validate .docx structural consistency. |
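If your tabular data starts life as a markdown table, a small parser can split it into a header row and data rows before table generation. The sketch below is a hypothetical helper, not part of the package. It assumes generateTableOoxml takes headers as an array of strings and rows as an array of string arrays, which the table above does not confirm. It does not handle escaped pipes inside cells.

```javascript
// Hypothetical helper; not part of the package.
function parseMarkdownTable(markdown) {
  const lines = markdown
    .trim()
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter(Boolean);

  // Removes the optional outer pipes, then splits on the inner ones.
  const splitRow = (line) =>
    line.replace(/^\|/, '').replace(/\|$/, '').split('|').map((cell) => cell.trim());

  // Matches separator rows such as "| --- | :---: | ---: |".
  const isSeparator = (line) =>
    /^\|?\s*:?-+:?\s*(\|\s*:?-+:?\s*)*\|?$/.test(line);

  if (lines.length < 2 || !isSeparator(lines[1])) {
    throw new Error('Expected a header row followed by a separator row');
  }
  return { headers: splitRow(lines[0]), rows: lines.slice(2).map(splitRow) };
}

const table = parseMarkdownTable(`
| Name  | Qty  |
| ---   | ---: |
| Apple | 3    |
| Pear  | 5    |
`);
console.log(table.headers, table.rows);

// Assumed call shape, for illustration only:
// const tableOoxml = generateTableOoxml(table.headers, table.rows, {});
```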

Deep Imports

For advanced usage, import specific submodules:

import { applyOperationToDocumentXml } from '@ansonlai/docx-redline-js/services/standalone-operation-runner.js';
import { getParagraphText } from '@ansonlai/docx-redline-js/core/paragraph-targeting.js';

Output Shape Matrix

Different APIs return different OOXML shapes. Use this as a packaging safety check.

| API | Typical input scope | Output field | Possible root/output shape | Safe to write directly into word/document.xml |
| --- | --- | --- | --- | --- |
| applyRedlineToOxml(...) | Paragraph, range, or table-scope OOXML | result.oxml | Fragment, <w:document>, or package payload (<pkg:package>) | No. Inspect first. |
| applyRedlineToOxmlWithListFallback(...) | Paragraph or range-scope OOXML | result.oxml | Fragment, <w:document>, or package payload (<pkg:package>) | No. Inspect first. |
| reconcileMarkdownTableOoxml(...) | Table or paragraph-scope OOXML | result.oxml | Same shapes as applyRedlineToOxml(...) for the supplied scope | No. Inspect first. |
| applyOperationToDocumentXml(...) | Full word/document.xml string | result.documentXml | <w:document> | Yes. This is the document-safe helper. |
| extractReplacementNodesFromOoxml(...) | Any OOXML payload | { replacementNodes, numberingXml, sourceType } | Normalized to fragment, document, or package | Yes. Use this when consuming result.oxml. |

Do / Don't for Packaging

  • Do use applyOperationToDocumentXml(...).documentXml when your intent is to replace word/document.xml.
  • Do use extractReplacementNodesFromOoxml(...) when you are consuming result.oxml from paragraph/range/table APIs.
  • Do merge numbering/comments artifacts with ensureNumberingArtifactsInZip(...) and ensureCommentsArtifactsInZip(...) when those parts are present.
  • Don't write payloads that start with <pkg:package directly into word/document.xml.
  • Don't assume every result.oxml payload is a raw paragraph fragment.
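The rules above can be enforced with a small pre-flight check that inspects the root of a payload before it is written anywhere. This is a sketch only: `classifyOoxmlPayload` and `assertSafeForDocumentXml` are hypothetical names, and the check assumes payloads use the literal `w:` and `pkg:` prefixes shown in the matrix. For real consumption of result.oxml, extractReplacementNodesFromOoxml(...) remains the documented route.

```javascript
// Hypothetical pre-flight check; not part of the package API.
// Assumption: roots use the literal prefixes "w:" and "pkg:".
function classifyOoxmlPayload(oxml) {
  // Drop a BOM, leading whitespace, and any <?...?> declarations or
  // processing instructions that precede the root element.
  const body = String(oxml)
    .replace(/^\uFEFF/, '')
    .replace(/^(\s*<\?[\s\S]*?\?>)*\s*/, '');
  if (body.startsWith('<pkg:package')) return 'package';
  if (body.startsWith('<w:document')) return 'document';
  return 'fragment';
}

// Throws unless the payload is a full <w:document>, the only shape that
// may be written directly into word/document.xml.
function assertSafeForDocumentXml(oxml) {
  const shape = classifyOoxmlPayload(oxml);
  if (shape !== 'document') {
    throw new Error(`Refusing to write a ${shape} payload into word/document.xml`);
  }
  return oxml;
}

console.log(classifyOoxmlPayload('<?xml version="1.0"?>\n<pkg:package>'));
console.log(classifyOoxmlPayload('<w:p><w:r/></w:p>'));
```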

Working With .docx Files

This package operates on OOXML strings (XML parts inside .docx zip archives), not raw .docx binaries.

Typical flow:

  1. Extract the .docx zip (for example with JSZip, fflate, or similar)
  2. Read word/document.xml
  3. Apply reconciliation APIs to XML strings
  4. Merge numbering/comments artifacts when needed
  5. Write the archive back to a .docx file

import JSZip from 'jszip';
import {
  applyRedlineToOxml,
  extractReplacementNodesFromOoxml,
  ensureNumberingArtifactsInZip,
  validateDocxPackage
} from '@ansonlai/docx-redline-js';
import { applyOperationToDocumentXml } from '@ansonlai/docx-redline-js/services/standalone-operation-runner.js';

const zip = await JSZip.loadAsync(docxBuffer);
const documentXml = await zip.file('word/document.xml').async('string');

const opResult = await applyOperationToDocumentXml(
  documentXml,
  { type: 'redline', target: 'old text', modified: 'new text' },
  'Editor'
);

// applyOperationToDocumentXml(...) returns a full w:document payload.
zip.file('word/document.xml', opResult.documentXml);

const fragmentResult = await applyRedlineToOxml(
  paragraphOoxml,
  'Item text',
  '1. Item text',
  { generateRedlines: true, author: 'Editor' }
);
const normalized = extractReplacementNodesFromOoxml(fragmentResult.oxml);

// If sourceType === 'package', merge extracted content/artifacts instead of
// writing the raw pkg:package payload into word/document.xml.
if (normalized.numberingXml) {
  await ensureNumberingArtifactsInZip(zip, normalized.numberingXml);
}

await validateDocxPackage(zip);
const output = await zip.generateAsync({ type: 'nodebuffer' });
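One detail the flow above glosses over: JSZip's `zip.file(path)` returns null when the entry does not exist, so reading a missing part fails with an unhelpful TypeError. A guarded reader, sketched below under the hypothetical name `readRequiredPart`, reports the missing part by name. The `fakeZip` object is a stand-in used only to exercise the helper without a real .docx.

```javascript
// Hypothetical helper; works with any JSZip-like object exposing file(path).
async function readRequiredPart(zip, path) {
  const entry = zip.file(path); // JSZip returns null when the entry is absent
  if (!entry) {
    throw new Error(`Missing required part: ${path}`);
  }
  return entry.async('string');
}

// Stand-in archive mimicking the two JSZip calls used above.
const fakeZip = {
  parts: { 'word/document.xml': '<w:document/>' },
  file(path) {
    const text = this.parts[path];
    return text === undefined ? null : { async: async () => text };
  }
};

readRequiredPart(fakeZip, 'word/document.xml')
  .then((xml) => console.log(xml));
readRequiredPart(fakeZip, 'word/numbering.xml')
  .catch((err) => console.log(err.message));
```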

Architecture

See ARCHITECTURE.md for module layout, data flow, and contributor guidance.

See AGENTS.md for a concise reference for AI coding agents.