Openize.MarkItDown for Python is a package that converts documents into Markdown format. It supports multiple file formats, provides flexible output handling, and integrates with LLMs for extended processing including OpenAI, Claude, Gemini, and Mistral.
- Convert
.docx,.pdf,.xlsx, and.pptxto Markdown. - Save Markdown files locally or send them to an LLM for processing (OpenAI, Claude, Gemini, Mistral).
- Structured with the Factory & Strategy Pattern for scalability.
- Works with Windows and Linux-compatible paths.
- Command-line interface for easy use.
This package depends on the Aspose libraries, which are commercial products:
You'll need to obtain valid licenses for these libraries separately. The package will install these dependencies, but you're responsible for complying with Aspose's licensing terms.
LLM support requires valid API keys and potentially the following dependencies:
openaifor OpenAIanthropicfor Clauderequestsfor Gemini and Mistral REST APIs
pip install openize-markitdown-pythongit clone https://github.com/openize-com/openize-markitdown-python.git
cd openize-markitdown-python\packages\markitdown
pip install -e . --verboseYou can use the markitdown CLI to convert a single document or a folder of documents into Markdown format. You can also optionally send the converted Markdown to a supported LLM for post-processing.
markitdown --input-file d:\test.docx -o d:\output_foldermarkitdown --input-dir d:\docs -o d:\output_folderThese commands require you to set the appropriate environment variables or provide API keys when prompted:
markitdown --input-file d:\test.docx -o d:\output_folder --llm openai
markitdown --input-file d:\test.docx -o d:\output_folder --llm claude
markitdown --input-file d:\test.docx -o d:\output_folder --llm gemini
markitdown --input-file d:\test.docx -o d:\output_folder --llm mistralfrom openize.markitdown.core import MarkItDown
# Define input file and output directory
input_file = "report.pdf"
output_dir = "output_markdown"
# Create MarkItDown instance with desired LLM
converter = MarkItDown(output_dir, llm_client_name="mistral")
# Convert document and send output to LLM
converter.convert_document(input_file)
print("Conversion completed and data sent to LLM.")| Variable | Description |
|---|---|
ASPOSE_LICENSE_PATH |
Path to Aspose license file (required if using paid features) |
OPENAI_API_KEY |
API key for OpenAI integration |
OPENAI_MODEL |
(Optional) Model name for OpenAI (default: gpt-4) |
CLAUDE_API_KEY |
API key for Claude integration |
CLAUDE_MODEL |
(Optional) Model name for Claude (default: claude-v1) |
GEMINI_API_KEY |
API key for Gemini integration |
GEMINI_MODEL |
(Optional) Model name for Gemini (default: gemini-pro) |
MISTRAL_API_KEY |
API key for Mistral integration |
MISTRAL_MODEL |
(Optional) Model name for Mistral (default: mistral-medium) |
Unix-based systems:
export ASPOSE_LICENSE_PATH="/path/to/license"
export OPENAI_API_KEY="your-openai-key"
export CLAUDE_API_KEY="your-claude-key"
export GEMINI_API_KEY="your-gemini-key"
export MISTRAL_API_KEY="your-mistral-key"Windows (PowerShell):
$env:ASPOSE_LICENSE_PATH = "C:\path\to\license"
$env:OPENAI_API_KEY = "your-openai-key"
$env:CLAUDE_API_KEY = "your-claude-key"
$env:GEMINI_API_KEY = "your-gemini-key"
$env:MISTRAL_API_KEY = "your-mistral-key"We appreciate your interest in contributing to this project! To ensure a smooth collaboration, please follow these steps when submitting a pull request:
- Fork & Clone – Fork the repository and clone it to your local machine.
- Create a Branch – Use a new branch for your contribution.
- Sign the Contributor License Agreement (CLA) – Before your first contribution can be accepted, you must sign our CLA via CLA Assistant. You will be prompted to sign it when submitting your first pull request. You can also review the CLA here: https://cla.openize.com/agreement.
- Submit a Pull Request (PR) – Once your changes are ready, open a PR with a clear description.
- Review & Feedback – Our maintainers will review your PR and provide feedback if needed.
By contributing, you agree to the terms of the CLA and confirm that your changes comply with the project's licensing policies.
This package is licensed under the MIT License. However, it depends on Aspose libraries, which are proprietary, closed-source libraries.