feat(engines): add vLLM as 6th UI-connected engine — RT, meta agent, … by yodakrkids · Pull Request #296 · hang-in/tunaFlow

yodakrkids · 2026-05-28T10:01:53Z

…settings

vLLM exposes an OpenAI-compatible API, so it routes through the existing openai_compat.rs path alongside ollama / lmstudio.

Backend (5 files)

- agent_detect.rs: probe_vllm(): GET {endpoint}/v1/models
- openai_compat.rs: vllm_base_url(), discover_vllm(), engine_name + per-engine API key (VLLM_API_KEY) routing
- agents.rs: start_openai_compat_stream + run_eval_agent vllm branches
- executor.rs: RT run_participant / stream_participant vllm cases
- model_discovery.rs: ENGINES list + dispatch + fallback_models entry

Frontend (11 files)

- engineConfig.ts / EngineSelector.tsx / AgentAvatar.tsx / index.css: register vllm in shared engine registry + agent color token
- CreateRoundtableDialog.tsx: RT participant engine dropdown
- MetaAgentSelector.tsx: onboarding meta agent (vllm endpoint state, detect_available_agents wiring, HTTP endpoint input, install hint)
- AgentsSection.tsx: agent profile engine dropdown + endpoint override (engineEndpoint:vllm setting)
- RuntimeSection.tsx: Insight Agent engine select
- types/index.ts: extend engine union comments
- buildSendInput.ts: forward customBaseUrl when engine === "vllm"
- initialSetupApply.ts: KNOWN_ENGINES set
- locales/{en,ko}/dialog.json: meta_agent.vllm_install_hint

Defaults

- endpoint: http://localhost:8000 (override via UI or VLLM_ENDPOINT env)
- optional VLLM_API_KEY (sent as a Bearer token if set)
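A minimal sketch of how the endpoint defaults above might resolve, assuming `vllm_base_url()` reads `VLLM_ENDPOINT`, falls back to `http://localhost:8000`, and that detection probes `{endpoint}/v1/models`. The helpers `resolve_vllm_base_url` and `models_url`, plus the blank-value and trailing-slash handling, are hypothetical and not taken from the diff; the override is passed in as an `Option` so the logic can be checked without touching the process environment.

```rust
/// Default vLLM endpoint, per the PR's "Defaults" section.
const DEFAULT_VLLM_ENDPOINT: &str = "http://localhost:8000";

/// Hypothetical helper: resolve the vLLM base URL from an optional
/// override (e.g. the VLLM_ENDPOINT value or a UI setting).
/// Blank overrides fall back to the default; a trailing slash is
/// trimmed so joining paths does not produce "//v1/models".
fn resolve_vllm_base_url(override_value: Option<&str>) -> String {
    match override_value.map(str::trim) {
        Some(v) if !v.is_empty() => v.trim_end_matches('/').to_string(),
        _ => DEFAULT_VLLM_ENDPOINT.to_string(),
    }
}

/// Hypothetical helper: the URL hit by the detection probe,
/// i.e. GET {endpoint}/v1/models.
fn models_url(base_url: &str) -> String {
    format!("{}/v1/models", base_url)
}

/// Reads the real environment. std::env::var returns Err when the
/// variable is unset (or not valid Unicode), which maps to the default.
fn vllm_base_url() -> String {
    let env_value = std::env::var("VLLM_ENDPOINT").ok();
    resolve_vllm_base_url(env_value.as_deref())
}
```

Keeping the pure resolver separate from the env read is a design choice of this sketch; the PR does not say how blank overrides are treated.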

Verified

- tsc --noEmit: clean
- cargo check: clean
- Vitest streaming-flow + metaAgentSelector-modelDiscovery: 29 passed
- Manual: vLLM visible in meta agent selector

Summary

Related plan / issue

Changes

Test plan

Invariants touched

Screenshots / logs

Checklist

- PR title follows Conventional Commits (feat(scope): ...)
- Tests added or updated
- Docs updated (plan, README, how-to) if behavior changed
- No secrets, tokens, or personal paths in the diff

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

gemini-code-assist

Code Review

This pull request adds support for vLLM as an OpenAI-compatible engine, integrating it across backend detection, streaming, roundtable execution, and frontend settings. The review comments point out critical routing issues where vLLM non-streaming, streaming, and evaluation requests are incorrectly routed to the Ollama endpoint because they call functions hardcoded to Ollama's base URL. Additionally, a high-severity issue was identified in the vLLM detection probe, which fails to include the VLLM_API_KEY authorization header, potentially causing onboarding failures on authenticated vLLM instances.

gemini-code-assist · 2026-05-28T10:03:33Z

            "gemini" => (gemini::run(run_input), "gemini"),
            "opencode" => (opencode::run(run_input), "opencode"),
            "ollama" => (openai_compat::run(run_input), "ollama"),
+            "vllm" => (openai_compat::run(run_input), "vllm"),


The fallback non-streaming run_participant execution for vllm currently routes through openai_compat::run(run_input), which is hardcoded to use ollama_base_url() (typically http://localhost:11434). This causes vLLM roundtable requests to be incorrectly routed to the Ollama endpoint. We should instead use openai_compat::stream_run_with_base with the correct vllm_base_url() by blocking on the current Tokio runtime handle.

"vllm" => {
    let base_url = openai_compat::vllm_base_url();
    let res = tokio::runtime::Handle::try_current()
        .map_err(|_| AppError::Agent("No tokio runtime available for vllm".into()))
        .and_then(|rt| {
            rt.block_on(async {
                openai_compat::stream_run_with_base(run_input, base_url, |_| {}, |_| {}).await
            })
        });
    (res, "vllm")
}

gemini-code-assist · 2026-05-28T10:03:33Z

+            let label = if engine_key_owned == "vllm" { "vllm" } else { "ollama" };
+            (openai_compat::stream_run(run_input, on_progress, on_chunk).await, label)


In stream_participant, the streaming execution for vllm currently calls openai_compat::stream_run(run_input, on_progress, on_chunk), which is hardcoded to use ollama_base_url(). This causes vLLM streaming roundtable requests to be incorrectly routed to the Ollama endpoint. We should instead use openai_compat::stream_run_with_base with the correct base URL for the selected engine.

Suggested change

-            let label = if engine_key_owned == "vllm" { "vllm" } else { "ollama" };
-            (openai_compat::stream_run(run_input, on_progress, on_chunk).await, label)
+            let base_url = if engine_key_owned == "vllm" {
+                openai_compat::vllm_base_url()
+            } else {
+                std::env::var("OLLAMA_HOST").unwrap_or_else(|_| "http://localhost:11434".into())
+            };
+            let label = if engine_key_owned == "vllm" { "vllm" } else { "ollama" };
+            (openai_compat::stream_run_with_base(run_input, base_url, on_progress, on_chunk).await, label)
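The common thread in the three routing findings is that a vllm engine key must never fall through to Ollama's base URL. A std-only sketch of per-engine base-URL selection, assuming Ollama resolves from `OLLAMA_HOST` with default `http://localhost:11434` (as in the suggestion) and vLLM from `VLLM_ENDPOINT` with default `http://localhost:8000` (the PR's stated default). `base_url_for` and the injected `lookup` closure are hypothetical; the real code calls `vllm_base_url()` and passes the result to `stream_run_with_base`.

```rust
/// Hypothetical: pick the OpenAI-compatible base URL for an engine key.
/// `lookup` stands in for std::env::var so the mapping can be tested.
/// Engine keys this sketch does not cover (including lmstudio, whose
/// endpoint handling is not shown in the PR) return None rather than
/// silently defaulting to Ollama.
fn base_url_for<F>(engine: &str, lookup: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    let (env_key, default) = match engine {
        "ollama" => ("OLLAMA_HOST", "http://localhost:11434"),
        "vllm" => ("VLLM_ENDPOINT", "http://localhost:8000"),
        _ => return None,
    };
    Some(lookup(env_key).unwrap_or_else(|| default.to_string()))
}
```

Returning `None` for unknown keys mirrors the follow-up fix's intent of failing loudly instead of making a silent Ollama call.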

gemini-code-assist · 2026-05-28T10:03:33Z

        "gemini" => gemini::run(run_input),
        "opencode" => opencode::run(run_input),
-        "ollama" => openai_compat::run(run_input),
+        "ollama" | "vllm" => openai_compat::run(run_input),


In run_eval_agent, the evaluation path for vllm currently routes through openai_compat::run(run_input), which is hardcoded to use ollama_base_url(). This causes vLLM evaluation requests to be incorrectly routed to the Ollama endpoint. We should instead use openai_compat::stream_run_with_base with the correct vllm_base_url() by blocking on the current Tokio runtime handle.

"ollama" => openai_compat::run(run_input),
"vllm" => {
    let base_url = openai_compat::vllm_base_url();
    let rt = tokio::runtime::Handle::try_current()
        .map_err(|_| AppError::Agent("No tokio runtime available for vllm".into()))?;
    rt.block_on(async {
        openai_compat::stream_run_with_base(run_input, base_url, |_| {}, |_| {}).await
    })
}

gemini-code-assist · 2026-05-28T10:03:33Z

+    eprintln!("[agent-detect] probe vllm: GET {}", url);
+    match client.get(&url).send().await {


The probe_vllm function does not include the Authorization header with VLLM_API_KEY when sending the probe request. If the vLLM instance requires authentication (which is common for shared or cloud-hosted instances), the detection probe will fail with a 401 Unauthorized status, even if the key is configured in the environment. Adding the Authorization header ensures that authenticated vLLM instances are correctly detected during onboarding.

eprintln!("[agent-detect] probe vllm: GET {}", url);
let mut req = client.get(&url);
if let Ok(token) = std::env::var("VLLM_API_KEY") {
    req = req.header("Authorization", format!("Bearer {}", token));
}
match req.send().await {
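This suggestion attaches the header whenever `VLLM_API_KEY` is set, even to an empty string, which would send a bare `Bearer ` value. A std-only sketch of the stricter rule the follow-up fix commit describes (add `Bearer <key>` only when the variable is set and non-empty). The function names and the tuple return type are hypothetical, and treating whitespace-only keys as unset is an extra assumption; in the real code the pair would feed reqwest's `RequestBuilder::header`.

```rust
/// Hypothetical: compute the optional Authorization header for the
/// vLLM probe from the raw VLLM_API_KEY value (None when unset).
/// Empty or whitespace-only keys are treated as unset (an assumption
/// of this sketch), so no bare "Bearer " header is ever produced.
fn vllm_auth_header(api_key: Option<&str>) -> Option<(&'static str, String)> {
    let key = api_key?.trim();
    if key.is_empty() {
        None
    } else {
        Some(("Authorization", format!("Bearer {}", key)))
    }
}

/// Reads the real environment. An unset variable yields no header,
/// so unprotected local instances still get a plain GET.
fn vllm_auth_header_from_env() -> Option<(&'static str, String)> {
    let raw = std::env::var("VLLM_API_KEY").ok();
    vllm_auth_header(raw.as_deref())
}
```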

hang-in · 2026-05-28T10:33:34Z

@yodakrkids Thank you for the PR adding vLLM as the 6th engine. Reusing tunaFlow's openai_compat routing as-is (the same layer as ollama / lmstudio) is a sensible approach. I'll merge after reviewing the code, getting CI to pass, and checking the Gemini review; it is planned for the next patch release (v0.1.8-beta-5) or the minor release (v0.1.9-beta). CI is waiting for maintainer approval under the fork-PR policy, so I'll trigger it shortly.

…x (supersedes #296) (#297)

* feat(engines): add vLLM as 6th UI-connected engine — RT, meta agent, settings

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(engines): vLLM RT routing — stream_run_with_base + vllm_base_url (gemini critical 1+2)

In executor.rs, both the `run_participant` (non-streaming) and `stream_participant` (streaming) branches called vllm through `openai_compat::run` / `stream_run`. Both functions hardcode `ollama_base_url()` internally (`OLLAMA_HOST`, default localhost:11434), so vLLM requests were routed to the ollama server: a regression.

Fix: split out the vllm branch and call it with the `stream_run_with_base(input, vllm_base_url(), ...)` pattern. ollama behavior is unchanged.
- run_participant: runs in a spawn_blocking sync context, so the async wrapper is executed via `Handle::current().block_on(...)`
- stream_participant: already in an async context, so it `await`s directly

Regression guards:
- the ollama branch is untouched, so existing behavior is identical
- claude / codex / gemini / opencode branches unchanged

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(engines): vLLM eval routing — run_eval_agent uses vllm_base_url (gemini critical 3)

The vllm branch of `run_eval_agent` called `openai_compat::run`, which hardcodes ollama internally, so vLLM evaluation requests were routed to localhost:11434 (ollama): a regression.

Fix: separate the vllm branch from ollama and call it with the `stream_run_with_base(input, vllm_base_url(), ...)` pattern, using `Handle::try_current()` + `block_on(...)`, the same pattern as `openai_compat::run` (Tauri sync commands execute inside the tokio runtime).

Regression guards:
- ollama / codex / gemini / opencode / claude branches unchanged
- a clear error message when no runtime handle is available (prevents a silent ollama call)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(engines): vLLM probe Authorization header — VLLM_API_KEY (gemini high 4)

`probe_vllm` only called `client.get(&url).send()`, so when the vLLM instance is protected with the `--api-key` option (the recommended setup for real deployments), detection failed with a 401 rejection because the Authorization header was missing.

Fix: add a `Bearer <key>` header when the `VLLM_API_KEY` env var is set and non-empty, following the same pattern as `openai_compat::discover_vllm`. Without the key (a local, unprotected instance), behavior is unchanged: a plain GET as before.

Regression guards:
- identical to the existing path when the env var is unset
- other probe functions (ollama / lmstudio) unchanged

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: 임용식 <yoda@krkids.co.kr>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: dghong <d9ng@outlook.com>

hang-in · 2026-05-28T11:08:04Z

@yodakrkids Hello, and thank you very much. I've received your PR adding vLLM as the 6th engine.

I took this PR's changes into main as-is, applied fixes for the 4 critical issues flagged by the Gemini code review, and merged the result as follow-up PR #297:

Merged PR: #297 (squash commit cf8d8a5)

- yodakrkids's vLLM commit was cherry-picked as 9fc1a13, with authorship preserved
- 3 fix commits addressing the 4 Gemini critical issues were stacked on top

The 4 fixes applied

- executor.rs:55: the run_participant (non-streaming) vllm branch called openai_compat::run, which hardcodes ollama_base_url() internally (regression). Fixed with stream_run_with_base(input, vllm_base_url(), ...) + Handle::current().block_on(...).
- executor.rs:184: the same regression in stream_participant (streaming). The ollama / vllm branches were split, and stream_run_with_base is now awaited directly.
- agents.rs:614: the same regression in run_eval_agent. Fixed with Handle::try_current() + block_on(stream_run_with_base(..., vllm_base_url(), ...)).
- agent_detect.rs:275: probe_vllm was missing the Authorization header. A Bearer <key> header is now added when the VLLM_API_KEY env var is set.

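Resolved conflicts
The MetaAgentSelector conflict with main's PR #295 (explicit endpoint detect trigger via Enter / refresh button) was resolved by keeping main's explicit-trigger pattern and adding the vllm option. Debounce-based auto-detect was removed to avoid a problem reported by an external user (detection fired every time a . was typed while entering 192.168.1.1).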

Verification

- cargo check: PASS
- cargo test --lib: 656 passed / 0 failed
- tsc --noEmit: clean
- vitest run: 478 passed / 0 failed
- All 3 CI jobs (rust-check / frontend-check / eval) reported SUCCESS before the merge

I'll publish the v0.1.9-beta minor release shortly and post a separate comment with the release URL and recovery instructions. Thank you again for a great contribution.

…odakrkids) Minor version bump in 4 manifests + Cargo.lock. Added a CHANGELOG entry. Highlights:
- vLLM 6th UI-connected engine (PR #297, supersedes #296 by yodakrkids)
- Gemini critical 4 fix (executor.rs / agents.rs / agent_detect.rs)
- Reuses the OpenAI-compatible path; no behavior changes to ollama/lmstudio

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

hang-in · 2026-05-28T11:08:57Z

@yodakrkids Thank you for the vLLM PR. The Gemini code review flagged 4 critical items (the vLLM branches in executor.rs / agents.rs being misrouted to the ollama base URL, plus the missing Authorization header in probe_vllm), so the maintainer added follow-up commits with the 4 fixes on top of this PR's commits and merged them as PR #297, which supersedes this one.

Merge commit: cf8d8a5 (PR #297)
Release: https://github.com/hang-in/tunaFlow/releases/tag/v0.1.9-beta (URL valid once published)

Since this PR has been superseded by PR #297, I'm closing it. PRs in areas beyond vLLM are welcome too. The 6th engine work in this PR is what made the minor bump (v0.1.8 → v0.1.9) possible. Thank you.

hang-in · 2026-05-28T13:40:01Z

v0.1.9-beta is published: https://github.com/hang-in/tunaFlow/releases/tag/v0.1.9-beta

@yodakrkids's addition of vLLM as the 6th engine (merge cf8d8a5 via PR #297) is the first feature release from an external contributor. The macOS DMG and Windows installer assets have both been built. Thanks again.

gemini-code-assist Bot reviewed May 28, 2026


hang-in mentioned this pull request May 28, 2026

feat(engines): vLLM as 6th UI-connected engine + Gemini critical 4 fix (supersedes #296) #297

Merged

hang-in closed this May 28, 2026

yodakrkids wants to merge 1 commit into hang-in:main from yodakrkids:feat/vllm-engine
