@jonahschulte/rtf-toolkit

Modern TypeScript RTF parser with track changes support

License: MIT TypeScript Tests

A production-grade RTF parsing library for JavaScript/TypeScript with comprehensive track changes support. Perfect for government contracts, legal documents, and any application requiring RTF document analysis.

✨ Features

  • Full RTF Parsing - Handles the RTF 1.9.1 specification
  • Track Changes Support - Parse insertions, deletions, and author metadata
  • HTML Conversion - Clean, semantic HTML output
  • TypeScript First - Full type safety and IntelliSense
  • Zero Dependencies - Lightweight core library
  • 100% Tested - 94 comprehensive unit tests
  • Visual Track Changes - HTML rendering with color-coded changes

Installation

npm install @jonahschulte/rtf-toolkit

Quick Start

Basic RTF to HTML Conversion

import { parseRTF, toHTML } from '@jonahschulte/rtf-toolkit';

// Parse RTF
const rtf = '{\\rtf1\\b Bold\\b0 and \\i italic\\i0 text}';
const doc = parseRTF(rtf);

// Convert to HTML
const html = toHTML(doc);
console.log(html);
// Output: <div class="rtf-content">
//           <p><strong>Bold</strong> and <em>italic</em> text</p>
//         </div>

Track Changes (Revisions)

import { parseRTF, getTrackChanges, getTrackChangeMetadata } from '@jonahschulte/rtf-toolkit';

// Parse RTF with track changes
const rtf = `{\\rtf1
{\\*\\revtbl{Unknown;}{John Doe;}{Jane Smith;}}
Original text {\\revised\\revauth1 inserted by John} more text.
{\\deleted\\revauth2 removed by Jane} final text.}`;

const doc = parseRTF(rtf);

// Get all track changes
const changes = getTrackChanges(doc);
changes.forEach((change) => {
  console.log(`${change.type}: "${change.text}" by ${change.author}`);
});
// Output:
// insertion: "inserted by John" by John Doe
// deletion: "removed by Jane" by Jane Smith

// Get summary metadata
const metadata = getTrackChangeMetadata(doc);
console.log(`${metadata.totalChanges} changes by ${metadata.authors.length} authors`);
// Output: 2 changes by 2 authors

HTML with Track Changes Visualization

import { parseRTF, toHTML } from '@jonahschulte/rtf-toolkit';

const rtfWithChanges = `{\\rtf1
{\\*\\revtbl{Unknown;}{Editor;}}
Text with {\\revised\\revauth1 new content} here.}`;

const doc = parseRTF(rtfWithChanges);
const html = toHTML(doc);

console.log(html);
// Output includes:
// <span class="rtf-revision-inserted"
//       style="background-color: #d4edda; border-bottom: 2px solid #28a745;"
//       data-revision-type="insertion"
//       data-author="Editor">new content</span>
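
When post-processing the output, it can help to produce the same markup by hand. The sketch below is not the library's renderer: `escapeHtml` and `renderInsertion` are hypothetical helpers, and only the class name, inline style, and data attributes are taken from the sample output above.

```typescript
// Hypothetical helpers, not part of @jonahschulte/rtf-toolkit.

/** Escape characters that are unsafe in HTML text and attribute values. */
export function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

/** Build an insertion span shaped like the sample output above. */
export function renderInsertion(text: string, author: string): string {
  const style = "background-color: #d4edda; border-bottom: 2px solid #28a745;";
  return (
    `<span class="rtf-revision-inserted" style="${style}" ` +
    `data-revision-type="insertion" data-author="${escapeHtml(author)}">` +
    `${escapeHtml(text)}</span>`
  );
}
```

Escaping both the text and the author name matters because revision tables come from untrusted documents.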

API Reference

Parsing

parseRTF(rtf: string): RTFDocument

Parse an RTF string into an Abstract Syntax Tree (AST).

const doc = parseRTF(rtfString);
console.log(doc.rtfVersion); // 1
console.log(doc.charset); // 'ansi'
console.log(doc.fontTable); // Array of fonts
console.log(doc.colorTable); // Array of colors
console.log(doc.content); // Array of paragraphs

Rendering

toHTML(doc: RTFDocument, options?: HTMLOptions): string

Convert RTF document AST to HTML.

const html = toHTML(doc, {
  includeWrapper: true, // Wrap in <div class="rtf-content">
});

HTMLOptions:

  • includeWrapper?: boolean - Wrap output in container div (default: true)
  • useClasses?: boolean - Use CSS classes instead of inline styles
  • classPrefix?: string - Custom CSS class prefix

Track Changes

getTrackChanges(doc: RTFDocument): TrackChange[]

Extract all track changes from the document.

const changes = getTrackChanges(doc);

changes.forEach((change) => {
  console.log(change.id); // Unique identifier
  console.log(change.type); // 'insertion' | 'deletion' | 'formatting'
  console.log(change.author); // Author name
  console.log(change.authorIndex); // Author index in revision table
  console.log(change.text); // Change content
  console.log(change.timestamp); // Date object
  console.log(change.position); // { paragraphIndex, characterOffset }
});
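
A common next step is to group the extracted changes per author. The following is a minimal sketch under stated assumptions: the `TrackChange` interface is a local mirror of the fields listed above (the types of `id` and `timestamp` are guesses), and `groupByAuthor` is a hypothetical helper, not a library export.

```typescript
// Local mirror of the documented fields; the real type ships with the package.
// Assumption: `id` is a string and `timestamp` may be absent.
interface TrackChange {
  id: string;
  type: "insertion" | "deletion" | "formatting";
  author: string;
  authorIndex: number;
  text: string;
  timestamp?: Date;
  position: { paragraphIndex: number; characterOffset: number };
}

/** Group changes by author name, preserving document order within each group. */
export function groupByAuthor(
  changes: TrackChange[],
): Record<string, TrackChange[]> {
  const byAuthor: Record<string, TrackChange[]> = {};
  for (const change of changes) {
    const bucket = byAuthor[change.author] ?? [];
    bucket.push(change);
    byAuthor[change.author] = bucket;
  }
  return byAuthor;
}
```

In real code the result of `getTrackChanges(doc)` would be passed in directly.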

getTrackChangeMetadata(doc: RTFDocument): TrackChangeMetadata

Get summary statistics about track changes.

const metadata = getTrackChangeMetadata(doc);

console.log(metadata.totalChanges); // Total number of changes
console.log(metadata.insertions); // Number of insertions
console.log(metadata.deletions); // Number of deletions
console.log(metadata.authors); // Array of unique author names
console.log(metadata.hasRevisions); // Boolean flag
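
To make the meaning of these fields concrete, here is one way such a summary could be derived from a list of changes. It is an illustration only, not the library's implementation: `ChangeLike` and `summarizeChanges` are hypothetical, and only the field names come from the listing above.

```typescript
// Hypothetical minimal shape; only `type` and `author` are needed here.
interface ChangeLike {
  type: "insertion" | "deletion" | "formatting";
  author: string;
}

// Local mirror of the documented metadata fields.
interface TrackChangeMetadata {
  totalChanges: number;
  insertions: number;
  deletions: number;
  authors: string[];
  hasRevisions: boolean;
}

/** Count changes by type and collect unique authors in first-seen order. */
export function summarizeChanges(changes: ChangeLike[]): TrackChangeMetadata {
  const authors: string[] = [];
  let insertions = 0;
  let deletions = 0;
  for (const change of changes) {
    if (change.type === "insertion") insertions += 1;
    if (change.type === "deletion") deletions += 1;
    if (authors.indexOf(change.author) === -1) authors.push(change.author);
  }
  return {
    totalChanges: changes.length,
    insertions,
    deletions,
    authors,
    hasRevisions: changes.length > 0,
  };
}
```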

RTF Features Supported

Document Structure

  • ✅ RTF header (\rtf1, \ansi, \deff)
  • ✅ Font table (\fonttbl) with font families
  • ✅ Color table (\colortbl) with RGB values
  • ✅ Revision table (\revtbl) with author names
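
For readers unfamiliar with the color table, each entry is a run of `\red`, `\green`, and `\blue` control words terminated by a semicolon, and an empty first entry stands for the automatic color. The sketch below parses the body of a `\colortbl` group; `parseColorTable` and `Rgb` are hypothetical names, and the library's own parser may work differently.

```typescript
// Hypothetical types and helper, shown only to illustrate the table format.
interface Rgb {
  r: number;
  g: number;
  b: number;
}

/** Parse the body of a \colortbl group; `null` marks the "auto" color. */
export function parseColorTable(body: string): Array<Rgb | null> {
  const entries = body.split(";");
  // The text after the final ';' is not an entry; drop it when it is empty.
  if (entries[entries.length - 1]?.trim() === "") {
    entries.pop();
  }
  return entries.map((entry) => {
    // Read the numeric argument of \red, \green or \blue, if present.
    const channel = (name: string): number | null => {
      const match = new RegExp("\\\\" + name + "(\\d+)").exec(entry);
      return match ? Number(match[1]) : null;
    };
    const r = channel("red");
    const g = channel("green");
    const b = channel("blue");
    if (r === null && g === null && b === null) return null;
    return { r: r ?? 0, g: g ?? 0, b: b ?? 0 };
  });
}
```

With this layout, `\cf1` refers to index 1 of the returned array.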

Character Formatting

  • ✅ Bold (\b)
  • ✅ Italic (\i)
  • ✅ Underline (\ul)
  • ✅ Font size (\fs)
  • ✅ Font family (\f)
  • ✅ Text color (\cf)
  • ✅ Background color (\cb)

Paragraph Formatting

  • ✅ Alignment (\qc, \qr, \ql, \qj)
  • ✅ Spacing before/after (\sb, \sa)
  • ✅ Indentation (\li, \ri, \fi)
  • ✅ Paragraph breaks (\par)
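
RTF expresses spacing and indentation in twips (1/20 of a point), so mapping these control words to CSS is mostly unit conversion. The following sketch shows one possible mapping; `ParagraphControls` and `paragraphStyle` are hypothetical, and the CSS properties chosen are an assumption rather than the library's actual output.

```typescript
// Hypothetical input shape: numeric arguments of the paragraph control words.
type Alignment = "ql" | "qr" | "qc" | "qj";

interface ParagraphControls {
  align?: Alignment;
  sb?: number; // \sb, space before, in twips
  sa?: number; // \sa, space after, in twips
  li?: number; // \li, left indent, in twips
  ri?: number; // \ri, right indent, in twips
  fi?: number; // \fi, first-line indent, in twips (may be negative)
}

const TWIPS_PER_POINT = 20;

const ALIGNMENT: Record<Alignment, string> = {
  ql: "left",
  qr: "right",
  qc: "center",
  qj: "justify",
};

/** Convert paragraph control values into an inline CSS declaration list. */
export function paragraphStyle(p: ParagraphControls): string {
  const pt = (twips: number): string => `${twips / TWIPS_PER_POINT}pt`;
  const rules: string[] = [];
  if (p.align) rules.push(`text-align: ${ALIGNMENT[p.align]}`);
  if (p.sb !== undefined) rules.push(`margin-top: ${pt(p.sb)}`);
  if (p.sa !== undefined) rules.push(`margin-bottom: ${pt(p.sa)}`);
  if (p.li !== undefined) rules.push(`margin-left: ${pt(p.li)}`);
  if (p.ri !== undefined) rules.push(`margin-right: ${pt(p.ri)}`);
  if (p.fi !== undefined) rules.push(`text-indent: ${pt(p.fi)}`);
  return rules.join("; ");
}
```

For example, `\qc\sb240\li720` corresponds to `paragraphStyle({ align: "qc", sb: 240, li: 720 })`.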

Track Changes

  • ✅ Revision table parsing
  • ✅ Insertions (\revised)
  • ✅ Deletions (\deleted)
  • ✅ Author tracking (\revauth)
  • ✅ Timestamps (\revdttm)
  • ✅ Visual HTML rendering
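
The `\revdttm` argument is a packed 32-bit date/time value. As a sketch of what decoding involves, the function below assumes the DTTM bit layout used by Word formats (minutes in bits 0-5, hours in 6-10, day of month in 11-15, month in 16-19, years since 1900 in 20-28, weekday in 29-31). `decodeDttm` and `RevisionTime` are hypothetical names, not library exports.

```typescript
// Hypothetical result type; the library itself exposes a Date on `timestamp`.
interface RevisionTime {
  year: number;
  month: number; // 1-12
  day: number; // 1-31
  hour: number; // 0-23
  minute: number; // 0-59
}

/** Unpack a \revdttm argument, assuming the DTTM bit layout described above. */
export function decodeDttm(value: number): RevisionTime {
  return {
    minute: value & 0x3f,
    hour: (value >>> 6) & 0x1f,
    day: (value >>> 11) & 0x1f,
    month: (value >>> 16) & 0x0f,
    year: 1900 + ((value >>> 20) & 0x1ff),
  };
}
```

The unsigned shift (`>>>`) keeps the result correct when a writer emits the value as a negative signed integer.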

Special Characters

  • ✅ Hex escapes (\'XX)
  • ✅ Unicode characters (\u)
  • ✅ Control symbols (\~, \-, \_)
  • ✅ Literal escapes (\\, \{, \})
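
The two numeric escape forms can be illustrated with a small decoder. This is a simplified sketch, not the library's tokenizer: `decodeEscapes` is a hypothetical helper, it treats `\'XX` bytes as Latin-1 (real documents name a code page, and Windows-1252 differs in the 0x80-0x9F range), and it assumes the default of one fallback character after each `\uN`.

```typescript
/**
 * Decode \'XX and \uN escapes in a run of RTF text.
 * Assumptions: Latin-1 for hex escapes, one fallback character after \uN.
 */
export function decodeEscapes(input: string): string {
  // Alternative 1: \uN, optional delimiter space, optional fallback character.
  // Alternative 2: \'XX hex escape.
  const pattern =
    /\\u(-?\d+) ?(?:\\'[0-9a-fA-F]{2}|[^\\{}])?|\\'([0-9a-fA-F]{2})/g;
  return input.replace(pattern, (_match: string, unicode?: string, hex?: string) => {
    if (unicode !== undefined) {
      // \uN is a signed 16-bit decimal; negative values wrap around.
      const n = parseInt(unicode, 10);
      return String.fromCharCode(n < 0 ? n + 65536 : n);
    }
    return String.fromCharCode(parseInt(hex as string, 16));
  });
}
```

Characters outside the Basic Multilingual Plane arrive as two consecutive `\uN` escapes (a surrogate pair), which this approach passes through unchanged.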

Use Cases

Government Contracts & Legal Documents

// Parse contract with redlines
const contract = parseRTF(governmentContractRTF);
const changes = getTrackChanges(contract);

// Review all changes
changes.forEach((change) => {
  console.log(`${change.author} (${change.type}): "${change.text}"`);
});

// Generate HTML for review
const html = toHTML(contract);
// Insertions shown in green, deletions in red with strikethrough

Document Review Workflows

// Get summary for review dashboard
const metadata = getTrackChangeMetadata(doc);

console.log(`Pending Review: ${metadata.totalChanges} changes`);
console.log(`Contributors: ${metadata.authors.join(', ')}`);

Content Migration

// Convert legacy RTF to clean HTML
const doc = parseRTF(rtfString);
const html = toHTML(doc);

// Use in modern CMS or web application

Examples

See the examples/ directory for complete demonstrations:

  • basic-usage.ts - Basic parsing and HTML conversion
  • track-changes-demo.ts - Track changes extraction and visualization

Development

# Install dependencies
npm install

# Run tests
npm test

# Build
npm run build

# Run examples
npm run build && node examples/track-changes-demo.js

TypeScript Support

Full TypeScript definitions included:

import type {
  RTFDocument,
  ParagraphNode,
  TextNode,
  RevisionNode,
  TrackChange,
  TrackChangeMetadata,
} from '@jonahschulte/rtf-toolkit';

// Fully typed API with IntelliSense support

Browser Support

Works in all modern browsers and Node.js:

  • ✅ Chrome/Edge (latest)
  • ✅ Firefox (latest)
  • ✅ Safari (latest)
  • ✅ Node.js 18+

Performance

Optimized for real-world documents:

  • Typical documents (<100KB): parsed in under 100 ms
  • Large documents (1MB+): parsed in under 1 s
  • Efficient memory usage
  • Streaming support (planned, not yet available)

License

MIT © 2025 Jonah Schulte

Credits

Built to solve real-world government contract NDA management needs.

Open-sourced to help the community deal with legacy RTF systems.

Acknowledgments

  • RTF 1.9.1 Specification by Microsoft
  • Inspired by the limitations of existing RTF libraries

Contributing

Issues and PRs welcome! See GitHub Issues.

Star this repo if you find it useful! ⭐