-Openize.MarkItDown for Python is a package that converts documents into Markdown format. It supports multiple file formats, provides flexible output handling, and integrates with LLMs for extended processing.
+Openize.MarkItDown for Python is a package that converts documents into Markdown format. It supports multiple file formats, provides flexible output handling, and integrates with LLMs for extended processing including OpenAI, Claude, Gemini, and Mistral.

## Features

- Convert `.docx`, `.pdf`, `.xlsx`, and `.pptx` to Markdown.
-- Save Markdown files locally or send them to an LLM for processing.
+- Save Markdown files locally or send them to an LLM for processing (OpenAI, Claude, Gemini, Mistral).
- Structured with the **Factory & Strategy Pattern** for scalability.
- Works with Windows and Linux-compatible paths.
- Command-line interface for easy use.
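
The Factory & Strategy structure named in the feature list can be pictured roughly as follows. This is a minimal sketch, not the package's actual code: `Converter`, `DocxConverter`, `PdfConverter`, and `ConverterFactory` are hypothetical names, and the conversion bodies are placeholders.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class Converter(ABC):
    """Strategy interface: one implementation per input format (hypothetical)."""

    @abstractmethod
    def to_markdown(self, path: Path) -> str:
        ...


class DocxConverter(Converter):
    def to_markdown(self, path: Path) -> str:
        # Placeholder: a real converter would parse the document here.
        return f"# {path.stem}\n\n(converted from Word)"


class PdfConverter(Converter):
    def to_markdown(self, path: Path) -> str:
        # Placeholder: a real converter would parse the document here.
        return f"# {path.stem}\n\n(converted from PDF)"


class ConverterFactory:
    """Factory: maps a file extension to the matching strategy (hypothetical)."""

    _registry = {".docx": DocxConverter, ".pdf": PdfConverter}

    @classmethod
    def for_path(cls, path: Path) -> Converter:
        try:
            # Lower-case the suffix so "REPORT.DOCX" and "report.docx" match.
            return cls._registry[path.suffix.lower()]()
        except KeyError:
            raise ValueError(f"Unsupported file type: {path.suffix}") from None


source = Path("report.DOCX")
markdown = ConverterFactory.for_path(source).to_markdown(source)
```

Under this layout, supporting a new format would mean adding one class and one registry entry, which is the scalability benefit the feature list alludes to.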
@@ -24,17 +24,23 @@ This package depends on the Aspose libraries, which are commercial products:
You'll need to obtain valid licenses for these libraries separately. The package will install these dependencies, but you're responsible for complying with Aspose's licensing terms.

+LLM support requires valid API keys and potentially the following dependencies:
| `ASPOSE_LICENSE_PATH` | Path to Aspose license file (required if using paid features) |
+| `OPENAI_API_KEY` | API key for OpenAI integration |
+| `OPENAI_MODEL` | (Optional) Model name for OpenAI (default: `gpt-4`) |
+| `CLAUDE_API_KEY` | API key for Claude integration |
+| `CLAUDE_MODEL` | (Optional) Model name for Claude (default: `claude-v1`) |
+| `GEMINI_API_KEY` | API key for Gemini integration |
+| `GEMINI_MODEL` | (Optional) Model name for Gemini (default: `gemini-pro`) |
+| `MISTRAL_API_KEY` | API key for Mistral integration |
+| `MISTRAL_MODEL` | (Optional) Model name for Mistral (default: `mistral-medium`) |

-To set these variables:
+### Setting Environment Variables

-For Unix-based systems:
+**Unix-based systems:**

```bash
export ASPOSE_LICENSE_PATH="/path/to/license"
-export OPENAI_API_KEY="your-api-key"
-export OPENAI_MODEL="gpt-4"
+export OPENAI_API_KEY="your-openai-key"
+export CLAUDE_API_KEY="your-claude-key"
+export GEMINI_API_KEY="your-gemini-key"
+export MISTRAL_API_KEY="your-mistral-key"
```

-For Windows (PowerShell):
+**Windows (PowerShell):**

```powershell
$env:ASPOSE_LICENSE_PATH = "C:\path\to\license"
-$env:OPENAI_API_KEY = "your-api-key"
-$env:OPENAI_MODEL = "gpt-4"
+$env:OPENAI_API_KEY = "your-openai-key"
+$env:CLAUDE_API_KEY = "your-claude-key"
+$env:GEMINI_API_KEY = "your-gemini-key"
+$env:MISTRAL_API_KEY = "your-mistral-key"
```
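
As an illustration of how the variables in the table might be consumed, the sketch below resolves an API key and a model name for one provider, falling back to the documented default when the optional `*_MODEL` variable is unset. The function `resolve_llm_settings` and the `DEFAULT_MODELS` dictionary are hypothetical and not part of the package; only the variable names and defaults come from the table.

```python
import os

# Defaults copied from the environment-variable table; the layout is illustrative.
DEFAULT_MODELS = {
    "OPENAI": "gpt-4",
    "CLAUDE": "claude-v1",
    "GEMINI": "gemini-pro",
    "MISTRAL": "mistral-medium",
}


def resolve_llm_settings(provider, env=None):
    """Return the API key and model for a provider (hypothetical helper)."""
    env = os.environ if env is None else env
    prefix = provider.upper()
    if prefix not in DEFAULT_MODELS:
        raise ValueError(f"Unknown provider: {provider}")

    api_key = env.get(f"{prefix}_API_KEY")
    if not api_key:
        raise RuntimeError(f"{prefix}_API_KEY is not set")

    # An unset or empty *_MODEL variable falls back to the documented default.
    model = env.get(f"{prefix}_MODEL") or DEFAULT_MODELS[prefix]
    return {"api_key": api_key, "model": model}


# Usage with an explicit mapping, so the example does not depend on the real environment.
settings = resolve_llm_settings("claude", {"CLAUDE_API_KEY": "your-claude-key"})
```

Passing the environment as a parameter keeps the helper easy to test; in normal use it would be called with only the provider name and read `os.environ`.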

-## Contributing
+## Contributing

-We appreciate your interest in contributing to this project! To ensure a smooth collaboration, please follow these steps when submitting a pull request:
+We appreciate your interest in contributing to this project! To ensure a smooth collaboration, please follow these steps when submitting a pull request:

1. **Fork & Clone** – Fork the repository and clone it to your local machine.
2. **Create a Branch** – Use a new branch for your contribution.
3. **Sign the Contributor License Agreement (CLA)** – Before your first contribution can be accepted, you must sign our CLA via [CLA Assistant](https://cla-assistant.io). You will be prompted to sign it when submitting your first pull request. You can also review the CLA here: [https://cla.openize.com/agreement](https://cla.openize.com/agreement).
4. **Submit a Pull Request (PR)** – Once your changes are ready, open a PR with a clear description.
5. **Review & Feedback** – Our maintainers will review your PR and provide feedback if needed.

-By contributing, you agree to the terms of the CLA and confirm that your changes comply with the project's licensing policies.
+By contributing, you agree to the terms of the CLA and confirm that your changes comply with the project's licensing policies.
-Openize.MarkItDown for Python converts documents into Markdown format. It supports multiple file formats, provides flexible output handling, and integrates with LLMs for extended processing.
+Openize.MarkItDown for Python converts documents into Markdown format. It supports multiple file formats, provides flexible output handling, and integrates with popular LLMs for post-processing, including OpenAI, Claude, Gemini, and Mistral.

## Features

- Convert `.docx`, `.pdf`, `.xlsx`, and `.pptx` to Markdown.
-- Save Markdown files locally or send them to an LLM for processing.
+- Save Markdown files locally or send them to an LLM (OpenAI, Claude, Gemini, Mistral).
- Structured with the **Factory & Strategy Pattern** for scalability.
- Works with Windows and Linux-compatible paths.
- Command-line interface for easy use.
@@ -24,73 +24,85 @@ This package depends on the Aspose libraries, which are commercial products:
You'll need to obtain valid licenses for these libraries separately. The package will install these dependencies, but you're responsible for complying with Aspose's licensing terms.

-## Installation
+LLM integration may require the following additional packages or valid API credentials:
+
+- `openai` (for OpenAI)
+- `anthropic` (for Claude)
+- `requests` (used for Gemini and Mistral REST APIs)

-### From TestPyPI
+## Installation

-```sh
```bash
pip install openize-markitdown-python
```
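
After installing, it may be useful to confirm which of the optional LLM client packages (`openai`, `anthropic`, `requests`) are importable. The sketch below is one way to check, using only the standard library; `missing_packages` and the `OPTIONAL_PACKAGES` mapping are hypothetical helpers, not part of the package.

```python
from importlib.util import find_spec

# Package names as listed for LLM integration; the mapping itself is illustrative.
OPTIONAL_PACKAGES = {
    "openai": "OpenAI",
    "anthropic": "Claude",
    "requests": "Gemini and Mistral",
}


def missing_packages(names):
    """Return the top-level package names that cannot be imported."""
    return [name for name in names if find_spec(name) is None]


for name in missing_packages(OPTIONAL_PACKAGES):
    print(f"{name} is not installed; it is needed for {OPTIONAL_PACKAGES[name]}")
```

Any package reported as missing can then be installed with `pip install` before the corresponding LLM integration is used.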

## Usage

### Command Line Interface

-```sh
```bash
# Convert a file and save locally
markitdown document.docx -o output_folder

-# Process with an LLM (requires OPENAI_API_KEY environment variable)
```