feat(engines): add vLLM as 6th UI-connected engine — RT, meta agent, …#296
yodakrkids wants to merge 1 commit
…settings
vLLM uses OpenAI-compatible API, so it routes through the existing
openai_compat.rs path alongside ollama / lmstudio.
Backend (5 files)
- agent_detect.rs: probe_vllm() — GET {endpoint}/v1/models
- openai_compat.rs: vllm_base_url(), discover_vllm(), engine_name +
per-engine API key (VLLM_API_KEY) routing
- agents.rs: start_openai_compat_stream + run_eval_agent vllm branches
- executor.rs: RT run_participant / stream_participant vllm cases
- model_discovery.rs: ENGINES list + dispatch + fallback_models entry
Frontend (11 files)
- engineConfig.ts / EngineSelector.tsx / AgentAvatar.tsx / index.css:
register vllm in shared engine registry + agent color token
- CreateRoundtableDialog.tsx: RT participant engine dropdown
- MetaAgentSelector.tsx: onboarding meta agent — vllm endpoint state,
detect_available_agents wiring, HTTP endpoint input, install hint
- AgentsSection.tsx: agent profile engine dropdown + endpoint override
(engineEndpoint:vllm setting)
- RuntimeSection.tsx: Insight Agent engine select
- types/index.ts: extend engine union comments
- buildSendInput.ts: forward customBaseUrl when engine === "vllm"
- initialSetupApply.ts: KNOWN_ENGINES set
- locales/{en,ko}/dialog.json: meta_agent.vllm_install_hint
Defaults
- endpoint: http://localhost:8000 (override via UI or VLLM_ENDPOINT env)
- optional VLLM_API_KEY (Bearer token if set)
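
The defaults above could be resolved roughly as follows. This is a minimal sketch, not the PR's actual `vllm_base_url()`: the helper names `resolve_vllm_base_url` and `models_url` are hypothetical, and trimming whitespace and trailing slashes is an assumption made for illustration. The `lookup` closure stands in for `std::env::var(..).ok()` so the logic can be exercised without touching the process environment.

```rust
/// Default vLLM endpoint named in the PR description.
const DEFAULT_VLLM_ENDPOINT: &str = "http://localhost:8000";

/// Hypothetical helper: resolve the base URL from an env-style lookup.
/// A missing or blank VLLM_ENDPOINT falls back to the default.
fn resolve_vllm_base_url(lookup: impl Fn(&str) -> Option<String>) -> String {
    lookup("VLLM_ENDPOINT")
        // Assumption: normalize whitespace and trailing slashes.
        .map(|v| v.trim().trim_end_matches('/').to_string())
        .filter(|v| !v.is_empty())
        .unwrap_or_else(|| DEFAULT_VLLM_ENDPOINT.to_string())
}

/// Hypothetical helper: the model-listing URL a probe would GET.
fn models_url(base_url: &str) -> String {
    format!("{}/v1/models", base_url.trim_end_matches('/'))
}
```

In real code the lookup would be `|k| std::env::var(k).ok()`, with any UI-supplied override taking precedence before this fallback is consulted.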
Verified
- tsc --noEmit: clean
- cargo check: clean
- Vitest streaming-flow + metaAgentSelector-modelDiscovery: 29 passed
- Manual: vLLM visible in meta agent selector
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Code Review
This pull request adds support for vLLM as an OpenAI-compatible engine, integrating it across backend detection, streaming, roundtable execution, and frontend settings. The review comments point out critical routing issues where vLLM non-streaming, streaming, and evaluation requests are incorrectly routed to the Ollama endpoint because they call functions hardcoded to Ollama's base URL. Additionally, a high-severity issue was identified in the vLLM detection probe, which fails to include the VLLM_API_KEY authorization header, potentially causing onboarding failures on authenticated vLLM instances.
      "gemini" => (gemini::run(run_input), "gemini"),
      "opencode" => (opencode::run(run_input), "opencode"),
      "ollama" => (openai_compat::run(run_input), "ollama"),
    + "vllm" => (openai_compat::run(run_input), "vllm"),
The fallback non-streaming run_participant execution for vllm currently routes through openai_compat::run(run_input), which is hardcoded to use ollama_base_url() (typically http://localhost:11434). This causes vLLM roundtable requests to be incorrectly routed to the Ollama endpoint. We should instead use openai_compat::stream_run_with_base with the correct vllm_base_url() by blocking on the current Tokio runtime handle.

    "vllm" => {
        let base_url = openai_compat::vllm_base_url();
        let res = tokio::runtime::Handle::try_current()
            .map_err(|_| AppError::Agent("No tokio runtime available for vllm".into()))
            .and_then(|rt| {
                rt.block_on(async {
                    openai_compat::stream_run_with_base(run_input, base_url, |_| {}, |_| {}).await
                })
            });
        (res, "vllm")
    }

    let label = if engine_key_owned == "vllm" { "vllm" } else { "ollama" };
    (openai_compat::stream_run(run_input, on_progress, on_chunk).await, label)
In stream_participant, the streaming execution for vllm currently calls openai_compat::stream_run(run_input, on_progress, on_chunk), which is hardcoded to use ollama_base_url(). This causes vLLM streaming roundtable requests to be incorrectly routed to the Ollama endpoint. We should instead use openai_compat::stream_run_with_base with the correct base URL for the selected engine.
    - let label = if engine_key_owned == "vllm" { "vllm" } else { "ollama" };
    - (openai_compat::stream_run(run_input, on_progress, on_chunk).await, label)
    + let base_url = if engine_key_owned == "vllm" {
    +     openai_compat::vllm_base_url()
    + } else {
    +     std::env::var("OLLAMA_HOST").unwrap_or_else(|_| "http://localhost:11434".into())
    + };
    + let label = if engine_key_owned == "vllm" { "vllm" } else { "ollama" };
    + (openai_compat::stream_run_with_base(run_input, base_url, on_progress, on_chunk).await, label)
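
One way to keep the label and the base URL from drifting apart is to derive both from a single engine value instead of two separate string comparisons. The following is only a sketch of that idea: `CompatEngine` and its methods are hypothetical names that do not appear in the PR, and the `lookup` closure stands in for `std::env::var(..).ok()`. The default URLs are the ones quoted in this thread.

```rust
/// Engines that share the OpenAI-compatible client in this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CompatEngine {
    Ollama,
    Vllm,
}

impl CompatEngine {
    /// Parse the engine key. Unknown keys are rejected rather than
    /// silently treated as ollama.
    fn from_key(key: &str) -> Option<Self> {
        match key {
            "ollama" => Some(Self::Ollama),
            "vllm" => Some(Self::Vllm),
            _ => None,
        }
    }

    /// Label reported alongside the run result.
    fn label(self) -> &'static str {
        match self {
            Self::Ollama => "ollama",
            Self::Vllm => "vllm",
        }
    }

    /// Base URL for this engine: env override first, then the default.
    fn base_url(self, lookup: impl Fn(&str) -> Option<String>) -> String {
        let (var, default) = match self {
            Self::Ollama => ("OLLAMA_HOST", "http://localhost:11434"),
            Self::Vllm => ("VLLM_ENDPOINT", "http://localhost:8000"),
        };
        lookup(var).unwrap_or_else(|| default.to_string())
    }
}
```

With this shape, a call site would pass `engine.base_url(|k| std::env::var(k).ok())` to `stream_run_with_base` and report `engine.label()`, so a new engine cannot be added to one match arm and forgotten in the other.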
      "gemini" => gemini::run(run_input),
      "opencode" => opencode::run(run_input),
    - "ollama" => openai_compat::run(run_input),
    + "ollama" | "vllm" => openai_compat::run(run_input),
In run_eval_agent, the evaluation path for vllm currently routes through openai_compat::run(run_input), which is hardcoded to use ollama_base_url(). This causes vLLM evaluation requests to be incorrectly routed to the Ollama endpoint. We should instead use openai_compat::stream_run_with_base with the correct vllm_base_url() by blocking on the current Tokio runtime handle.

    "ollama" => openai_compat::run(run_input),
    "vllm" => {
        let base_url = openai_compat::vllm_base_url();
        let rt = tokio::runtime::Handle::try_current()
            .map_err(|_| AppError::Agent("No tokio runtime available for vllm".into()))?;
        rt.block_on(async {
            openai_compat::stream_run_with_base(run_input, base_url, |_| {}, |_| {}).await
        })
    }

    eprintln!("[agent-detect] probe vllm: GET {}", url);
    match client.get(&url).send().await {
The probe_vllm function does not include the Authorization header with VLLM_API_KEY when sending the probe request. If the vLLM instance requires authentication (which is common for shared or cloud-hosted instances), the detection probe will fail with a 401 Unauthorized status, even if the key is configured in the environment. Adding the Authorization header ensures that authenticated vLLM instances are correctly detected during onboarding.
    eprintln!("[agent-detect] probe vllm: GET {}", url);
    let mut req = client.get(&url);
    if let Ok(token) = std::env::var("VLLM_API_KEY") {
        req = req.header("Authorization", format!("Bearer {}", token));
    }
    match req.send().await {
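
A small refinement worth considering: if `VLLM_API_KEY` is set but empty, the snippet above would send a bare `Bearer ` header. The sketch below treats a blank key as unset. The helper name `bearer_header` is hypothetical, and trimming the key is an assumption made for illustration.

```rust
/// Hypothetical helper: build the Authorization header value, if any.
/// Returns None when no key is configured or the key is blank, so the
/// probe falls back to a plain unauthenticated GET.
fn bearer_header(api_key: Option<&str>) -> Option<String> {
    let key = api_key?.trim();
    if key.is_empty() {
        None
    } else {
        Some(format!("Bearer {}", key))
    }
}
```

At the call site this would be fed `std::env::var("VLLM_API_KEY").ok().as_deref()`, and the header added only when the result is `Some`.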
@yodakrkids Thank you for the PR adding vLLM as the 6th engine. Reusing tunaFlow's openai_compat routing as-is (the same layer as ollama / lmstudio) is a sensible pattern. We will merge after reviewing the code, confirming that CI passes, and checking the Gemini review; it is slated for the next patch release (v0.1.8-beta-5) or minor release (v0.1.9-beta). CI is waiting on maintainer approval under the fork-PR policy, so I will trigger it shortly.
…x (supersedes #296) (#297)

* feat(engines): add vLLM as 6th UI-connected engine - RT, meta agent, settings

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(engines): vLLM RT routing - stream_run_with_base + vllm_base_url (gemini critical 1+2)

In executor.rs, both the `run_participant` (non-streaming) and `stream_participant` (streaming) branches called vllm through `openai_compat::run` / `stream_run`. Both functions hardcode `ollama_base_url()` (`OLLAMA_HOST`, default localhost:11434) internally, so vLLM requests were routed to the ollama server. This was a regression.

Fix: split the vllm branch out and call it with the `stream_run_with_base(input, vllm_base_url(), ...)` pattern. ollama behavior is unchanged.
- run_participant: runs in a spawn_blocking sync context, so the async wrapper is executed via `Handle::current().block_on(...)`
- stream_participant: already in an async context, so it `await`s directly

Regression guards:
- ollama branch left as-is, so existing behavior is identical
- claude / codex / gemini / opencode branches unchanged

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(engines): vLLM eval routing - run_eval_agent uses vllm_base_url (gemini critical 3)

The vllm branch of `run_eval_agent` called `openai_compat::run`, which hardcodes ollama internally, so vLLM evaluation requests were routed to localhost:11434 (ollama). This was a regression.

Fix: split the vllm branch from ollama and call it with the `stream_run_with_base(input, vllm_base_url(), ...)` pattern, using `Handle::try_current()` + `block_on(...)`. This is the same pattern as `openai_compat::run` (Tauri sync commands run inside the tokio runtime).

Regression guards:
- ollama / codex / gemini / opencode / claude branches unchanged
- clear error message when no runtime handle is available (prevents a silent call to ollama)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(engines): vLLM probe Authorization header - VLLM_API_KEY (gemini high 4)

`probe_vllm` only called `client.get(&url).send()`, so when the vLLM instance is protected with the `--api-key` option (the recommended configuration for real deployments), detection failed with a 401 rejection. The Authorization header was missing.

Fix: if the `VLLM_API_KEY` env var is set and non-empty, add a `Bearer <key>` header. This follows the same pattern as `openai_compat::discover_vllm`. Behavior without the header (local unprotected instance) is unchanged: a plain GET as before.

Regression guards:
- existing path is identical when the env var is not set
- other probe functions (ollama / lmstudio) unchanged

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: 임용식 <yoda@krkids.co.kr>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: dghong <d9ng@outlook.com>
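@yodakrkids Hello, and thank you very much. We received your PR adding vLLM as the 6th engine. We brought this PR's changes into main as-is and, after applying fixes for the 4 critical items flagged by the Gemini code review, created and merged follow-up PR #297:
Merged PR: #297 (squash commit
The 4 fixes applied
Verification of resolved conflicts
We will publish the v0.1.9-beta minor release shortly and then post the release URL and recovery instructions in a separate comment. Thank you for the great contribution.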
…odakrkids)

Minor version bump in 4 manifest locations + Cargo.lock. Added a CHANGELOG entry.

Highlights:
- vLLM 6th UI-connected engine (PR #297, supersedes #296 by yodakrkids)
- 4 Gemini critical fixes (executor.rs / agents.rs / agent_detect.rs)
- Reuses the OpenAI-compatible path; no behavior change for ollama/lmstudio

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@yodakrkids Thank you for the vLLM PR. The Gemini code review flagged 4 critical items (the vLLM branches in executor.rs / agents.rs were misrouted to the ollama base URL, and probe_vllm was missing the Authorization header). The maintainer added 4 follow-up fix commits on top of this PR's commits and merged them as PR #297, which supersedes this one.
Merge commit:
This PR has been superseded by PR #297, so we are closing it. PRs in areas other than vLLM are welcome too; the 6th-engine work in this PR is what made the minor bump (v0.1.8 → v0.1.9) possible. Thank you.
v0.1.9-beta has been published: https://github.com/hang-in/tunaFlow/releases/tag/v0.1.9-beta
@yodakrkids's vLLM 6th engine addition (merge
Summary
Related plan / issue
Changes
Test plan
- npx tsc --noEmit
- npx vite build
- cd src-tauri && cargo check
- npx vitest run
- cd src-tauri && cargo test --lib
Invariants touched
Screenshots / logs
Checklist
feat(scope): ...)