Problem
Local GGUF embedding works well for privacy/offline use, but has friction:
- Requires downloading a ~300 MB+ model on first run
- Slow on CPU; needs Metal/CUDA for reasonable throughput
- Limited to models that fit in local memory
There's no way to use cloud embedding APIs (OpenAI, Gemini, Ollama remote, etc.) without forking.
Proposed solution
Add three environment variables to activate API-based embedding as an alternative to local GGUF:
```
QMD_EMBED_API_URL=    # base URL (presence activates API mode)
QMD_EMBED_API_KEY=    # API key
QMD_EMBED_API_MODEL=  # model name
```
The API type is auto-detected from the URL, so no extra configuration is needed:
- URL containing `googleapis.com` → Gemini (batchEmbedContents)
- anything else → OpenAI-compatible (OpenAI, Ollama, LM Studio, etc.)
Local mode continues to work exactly as before when QMD_EMBED_API_URL is unset.
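The activation and auto-detection rule above could be as small as this sketch (Python for illustration only; the function name and return values are hypothetical, not from the PR):

```python
import os

def embed_backend() -> str:
    """Pick the embedding backend from the proposed environment variables.

    Rule from the proposal: presence of QMD_EMBED_API_URL activates API
    mode; "googleapis.com" in the URL selects the Gemini protocol, and
    any other URL is treated as OpenAI-compatible.
    """
    url = os.environ.get("QMD_EMBED_API_URL", "")
    if not url:
        return "local"   # GGUF path, behavior unchanged
    if "googleapis.com" in url:
        return "gemini"  # batchEmbedContents protocol
    return "openai"      # OpenAI-compatible /embeddings endpoint
```

Because detection keys off the URL alone, pointing `QMD_EMBED_API_URL` at a local Ollama or LM Studio server gets the OpenAI-compatible path with no further flags.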
Benchmark (63 Korean+English chunks, M3 Max)
| Model | Per chunk | Dims | Cost |
|---|---|---|---|
| embeddinggemma-300M (local/Metal) | 72ms | 768 | $0 |
| text-embedding-3-small | 16ms | 1536 | $0.020/1M tokens |
| text-embedding-3-large | 13ms | 3072 | $0.130/1M tokens |
| gemini-embedding-001 | 38ms | 3072 | free tier |
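The two auto-detected API types use different wire formats. A minimal sketch of the request shapes (function name and return structure are hypothetical; the endpoints and body layouts follow the public OpenAI and Gemini REST docs, not code from the PR):

```python
def build_embed_request(backend: str, base_url: str, api_key: str,
                        model: str, texts: list[str]) -> dict:
    """Build an HTTP request description for one batch of texts.

    Assumes the OpenAI-compatible path is POST {base_url}/embeddings and
    the Gemini path is POST .../models/{model}:batchEmbedContents, as
    documented by the respective providers.
    """
    if backend == "gemini":
        return {
            "url": f"{base_url}/v1beta/models/{model}:batchEmbedContents",
            "headers": {"x-goog-api-key": api_key},
            # Gemini wraps each text in its own per-item request object
            "body": {"requests": [
                {"model": f"models/{model}",
                 "content": {"parts": [{"text": t}]}} for t in texts]},
        }
    # OpenAI-compatible servers embed the whole batch in one call
    return {
        "url": f"{base_url}/embeddings",
        "headers": {"Authorization": f"Bearer {api_key}"},
        "body": {"model": model, "input": texts},
    }
```

Both formats accept a batch per request, which is what makes the per-chunk latencies in the table competitive with local inference despite the network round trip.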
Implementation
PR #427 has a working implementation with a benchmark script.