Convert PDF, DOCX, and image files to Markdown using AI. This CLI tool extracts text and images and preserves table structure while converting documents to clean, well-formatted Markdown. It also supports OCR text extraction from images.
- PDF Support - Full text extraction, image extraction, and page screenshots for layout understanding
- DOCX Support - Text and image extraction with structure preservation
- Image OCR - Extract text from images (PNG, JPG, JPEG, GIF, WEBP) using AI-powered OCR
- AI-Powered Conversion - Uses Google's Gemini AI to intelligently convert content to Markdown
- Interactive CLI - Friendly prompts using clack.js
- Easy Setup - Built-in configuration wizard for API keys
Run it without installing:

```bash
npx f2md document.pdf
bunx f2md document.pdf
pnpm dlx f2md document.pdf
```

Or install it globally:

```bash
npm install -g f2md
# or
bun install -g f2md
```

Before using the tool, you need to configure your Google AI API key.
```bash
f2md setup
# or with npx
npx f2md setup
```

The setup wizard will:
- Show you where to get a Google AI API key (https://aistudio.google.com/apikey)
- Prompt you to enter your API key
- Ask where to save it (local project or global for all projects)
Alternatively, set the environment variable:
```bash
export GOOGLE_GENERATIVE_AI_API_KEY="your-api-key-here"
```

Or create a `.env` file in your project:

```
GOOGLE_GENERATIVE_AI_API_KEY=your-api-key-here
```
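If you call f2md from a script, it can help to check for the key up front rather than partway through a conversion. The sketch below is a hypothetical helper, not part of f2md. It only inspects the process environment, so a key stored in `.env` is visible only if your runtime loads that file (Bun does this automatically; Node needs `--env-file` or a loader such as dotenv), and a key saved globally by `f2md setup` may not be visible to it at all.

```typescript
// Hypothetical preflight helper; NOT part of f2md's API.
// Reads only the given environment object (process.env by default).
function requireApiKey(
  env: Record<string, string | undefined> = process.env,
): string {
  const key = env.GOOGLE_GENERATIVE_AI_API_KEY?.trim();
  if (!key) {
    // Fail early with a hint instead of partway through a conversion
    throw new Error(
      "GOOGLE_GENERATIVE_AI_API_KEY is not set. Run `f2md setup` or export the variable.",
    );
  }
  return key;
}
```

Call `requireApiKey()` once at startup: it returns the trimmed key, or throws with a hint to run `f2md setup` when the variable is missing or blank.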
Run `f2md` without arguments for interactive mode:

```bash
f2md
```

The tool will prompt you for:
- Input file path (PDF, DOCX, or image)
- Output file path
Or pass the paths as arguments:

```bash
# Convert with auto-generated output name
f2md document.pdf
# Convert with custom output path
f2md document.pdf output.md
# Extract text from an image (OCR)
f2md screenshot.png
# Extract text from an image with custom output
f2md image.jpg output.md
```

Supported formats:

- PDF (`.pdf`)
- Word documents (`.docx`)
- Images (`.png`, `.jpg`, `.jpeg`, `.gif`, `.webp`) - OCR text extraction
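The file extension decides which process a file goes through: document conversion for PDF and DOCX, OCR for images. A minimal sketch of that dispatch, plus a default output name, is shown below. Both helpers (`classifyInput`, `defaultOutputPath`) are hypothetical, and the naming rule (same directory and base name, `.md` extension) is an assumption; the README does not say how f2md builds its auto-generated name.

```typescript
import { basename, dirname, extname, join } from "node:path";

type InputKind = "pdf" | "docx" | "image";

// Extensions taken from the supported-formats list.
const KIND_BY_EXTENSION: Record<string, InputKind> = {
  ".pdf": "pdf",
  ".docx": "docx",
  ".png": "image",
  ".jpg": "image",
  ".jpeg": "image",
  ".gif": "image",
  ".webp": "image",
};

// Hypothetical helper: pick the kind of processing from the (case-insensitive) extension.
function classifyInput(inputPath: string): InputKind {
  const ext = extname(inputPath).toLowerCase();
  const kind = KIND_BY_EXTENSION[ext];
  if (!kind) {
    throw new Error(`Unsupported file type: ${ext || "(no extension)"}`);
  }
  return kind;
}

// Hypothetical helper. ASSUMPTION: the output sits next to the input,
// with the original extension replaced by ".md".
function defaultOutputPath(inputPath: string): string {
  const ext = extname(inputPath);
  return join(dirname(inputPath), basename(inputPath, ext) + ".md");
}
```

Under this assumption, `defaultOutputPath("document.pdf")` yields `document.md`, and unsupported extensions such as `.txt` or `.doc` are rejected before any AI call is made.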
Other commands:

```bash
f2md --help     # Show help
f2md --version  # Show version
f2md setup      # Configure API key
```

For PDF and DOCX files, conversion works as follows:

- Extraction - Reads the input file and extracts text, images, and layout information
- Processing - For PDFs, captures page screenshots to understand visual layout
- AI Conversion - Sends extracted content to Google's Gemini AI model
- Markdown Generation - Receives AI-generated Markdown with proper formatting
- Cleanup - Removes unused images and saves the final output
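The cleanup step can be pictured as a set difference: collect every image the final Markdown references, then flag extracted images that are never used. The sketch below assumes images are referenced by file name through Markdown `![alt](path)` syntax or HTML `<img src>` tags; `findUnusedImages` is a hypothetical name, and f2md's actual rule for deciding what is unused may differ.

```typescript
import { basename } from "node:path";

// Captures the target of Markdown images like ![alt](images/fig-1.png "title")
const MARKDOWN_IMAGE = /!\[[^\]]*\]\(\s*<?([^)\s>]+)>?/g;
// Captures the src of HTML tags like <img src="images/fig-1.png">
const HTML_IMAGE = /<img\b[^>]*\bsrc\s*=\s*["']([^"']+)["']/gi;

// Hypothetical helper: returns the extracted image paths whose file names
// never appear as an image target in the Markdown (candidates for deletion).
function findUnusedImages(markdown: string, imagePaths: string[]): string[] {
  const referenced = new Set<string>();
  for (const pattern of [MARKDOWN_IMAGE, HTML_IMAGE]) {
    for (const match of markdown.matchAll(pattern)) {
      referenced.add(basename(match[1]));
    }
  }
  return imagePaths.filter((imagePath) => !referenced.has(basename(imagePath)));
}
```

Comparing by file name keeps the check independent of whether the Markdown uses `./images/...` or `images/...`; a stricter version would resolve each target relative to the output file.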
For images, OCR works as follows:

- Image Processing - Reads the image file and encodes it for AI processing
- OCR Analysis - Sends the image to Google's Gemini AI with specialized prompts for text extraction
- Text Extraction - AI extracts all visible text while preserving structure (headings, lists, tables)
- Markdown Generation - Converts extracted content to well-formatted Markdown
- Output - Saves the final Markdown file
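The first OCR step, reading and encoding the image, typically means base64-encoding the bytes and attaching a MIME type, which is the form the Gemini API accepts for inline image data. The helpers below (`encodeImage`, `loadImageForOcr`) are hypothetical and only illustrate that step; they do not call Gemini and are not f2md's code.

```typescript
import { readFile } from "node:fs/promises";
import { extname } from "node:path";

// MIME types for the image formats listed as supported.
const IMAGE_MIME_TYPES: Record<string, string> = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".gif": "image/gif",
  ".webp": "image/webp",
};

interface InlineImage {
  mimeType: string;
  data: string; // base64-encoded file bytes
}

// Hypothetical helper: attach a MIME type (from the extension) to base64 data.
function encodeImage(bytes: Uint8Array, fileName: string): InlineImage {
  const mimeType = IMAGE_MIME_TYPES[extname(fileName).toLowerCase()];
  if (!mimeType) {
    throw new Error(`Not a supported image type: ${fileName}`);
  }
  return { mimeType, data: Buffer.from(bytes).toString("base64") };
}

// Hypothetical helper: read the file from disk, then encode it.
async function loadImageForOcr(filePath: string): Promise<InlineImage> {
  return encodeImage(await readFile(filePath), filePath);
}
```

Keeping `encodeImage` separate from the file read makes the encoding testable without touching the disk.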
For development, you need:

- Bun installed
```bash
# Clone the repository
git clone <repo-url>
cd f2md
# Install dependencies
bun install
# Run in development mode
bun run dev
```

To build:

```bash
bun run build
```

Project structure:

```
src/
  cli.ts      - CLI entry point with clack prompts
  convert.ts  - Core conversion logic
  index.ts    - Public API exports
dist/         - Built output (generated)
```
You can also use this as a library in your Node.js/Bun projects:
```typescript
import { convert } from "f2md";

// Uses top-level await, so run this as an ES module.
const result = await convert("input.pdf", "output.md", {
  // Receives progress messages during conversion
  onProgress: (message) => console.log(message),
  respectPages: false,
});

console.log(`Saved to: ${result.outputPath}`);
console.log(`Images saved: ${result.imagesSaved}`);
console.log(`Images cleaned: ${result.imagesDeleted}`);
```

License: MIT