feat(engines): vLLM as 6th UI-connected engine + Gemini critical 4 fix (supersedes #296) by hang-in · Pull Request #297 · hang-in/tunaFlow

hang-in · 2026-05-28T10:54:23Z

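This PR stacks fixes for the 4 critical findings from the Gemini code review on top of yodakrkids's PR #296 (adding vLLM as the 6th engine). PR #296 lives on a fork branch, so pushing to it directly is not possible, and it conflicts with PR #295 (explicit endpoint detect trigger) on main. It was therefore rebuilt as a separate PR: cherry-pick + 4 fixes + conflict resolution.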

PR #296 will be closed after this merges (yodakrkids's authorship is preserved in commit 9fc1a13).

Change summary

Base: vLLM engine addition by yodakrkids (commit `9fc1a13`, cherry-pick from #296)

openai_compat.rs: vllm_base_url() / discover_vllm() + engine_name branch + VLLM_API_KEY handling
agent_detect.rs: probe_vllm() (OpenAI-compatible /v1/models)
agents.rs: add vllm to the base_url branch in start_openai_compat_stream
model_discovery.rs / executor.rs: add vllm to the dispatch branches
UI (MetaAgentSelector / AgentsSection / EngineSelector / RuntimeSection, etc.): expose vllm + endpoint input + ko/en i18n
Conflict resolution: keep the explicit-trigger pattern (Enter / refresh button) from main's PR #295, "fix(meta-agent): explicit endpoint detect trigger (Enter / button)", and add the vllm option

Fix 1+2 — vLLM RT routing (commit `3efcbb8`)

The vllm branches at executor.rs:55 (run_participant, non-streaming) and executor.rs:184 (stream_participant, streaming) called openai_compat::run / stream_run. Both functions hardcode ollama_base_url() internally (OLLAMA_HOST, default localhost:11434), so vLLM requests were misrouted to ollama: a regression.

Fix:

run_participant: Handle::current().block_on(stream_run_with_base(input, vllm_base_url(), ...)) (sync context inside spawn_blocking)
stream_participant: choose the base per engine_key, then await stream_run_with_base directly
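
The routing rule behind this fix (each engine key resolves to its own base URL instead of always falling back to ollama's) can be sketched as below. This is a minimal illustration, not the project's code: `base_url_for` and the `lookup` parameter are hypothetical names, the `http://` scheme on the ollama default is an assumption, and the defaults are the ones quoted elsewhere in this PR (`OLLAMA_HOST` at localhost:11434, `VLLM_ENDPOINT` at http://localhost:8000).

```rust
// Defaults as quoted in this PR; the "http://" scheme for ollama is assumed.
const OLLAMA_DEFAULT: &str = "http://localhost:11434";
const VLLM_DEFAULT: &str = "http://localhost:8000";

/// Hypothetical helper (not the project's actual function): resolve the base
/// URL for an OpenAI-compatible engine key. `lookup` stands in for
/// `std::env::var(name).ok()` so the routing can be exercised without
/// touching the real process environment.
fn base_url_for<F>(engine_key: &str, lookup: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    let (var, default) = match engine_key {
        "ollama" => ("OLLAMA_HOST", OLLAMA_DEFAULT),
        "vllm" => ("VLLM_ENDPOINT", VLLM_DEFAULT),
        // Engines that do not go through this path get no base URL here.
        _ => return None,
    };
    // Use the override when the lookup yields one, otherwise the default.
    Some(lookup(var).unwrap_or_else(|| default.to_string()))
}
```

With a lookup that returns nothing, `base_url_for("vllm", ...)` yields the vLLM default rather than the ollama one, which is the property the regression broke.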

Fix 3 — vLLM eval routing (commit `4bf9d79`)

Same regression in run_eval_agent at agents.rs:614. Fixed with Handle::try_current() + block_on(stream_run_with_base(..., vllm_base_url(), ...)), the same pattern openai_compat::run uses.

Fix 4 — vLLM probe Authorization header (commit `474bd5c`)

probe_vllm at agent_detect.rs:275 issued its GET without an Authorization header, so detection failed with a 401 on vLLM instances protected by the --api-key option. When the VLLM_API_KEY env var is set, a Bearer <key> header is now added (same pattern as openai_compat::discover_vllm).
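
A minimal sketch of the header rule, assuming the probe attaches `Authorization: Bearer <key>` only when a non-empty key is available. `probe_headers` is a hypothetical name and the header list is a plain `Vec` rather than the project's HTTP client types, so this shows the decision only, not the actual request code.

```rust
/// Hypothetical helper: extra headers for a vLLM probe request.
/// `api_key` stands in for the value read from `VLLM_API_KEY`.
fn probe_headers(api_key: Option<&str>) -> Vec<(String, String)> {
    match api_key.filter(|k| !k.is_empty()) {
        // Protected instance: send the key as a Bearer token.
        Some(key) => vec![("Authorization".to_string(), format!("Bearer {key}"))],
        // Unset or empty key: plain GET with no Authorization header.
        None => Vec::new(),
    }
}
```

Treating an empty key like a missing one keeps the unprotected local case identical to the behavior before the fix.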

Verification

| Item | Result |
| --- | --- |
| `cargo check` | PASS (5.33s after baseline) |
| `cargo test --lib` | 656 passed / 0 failed |
| `tsc --noEmit` | clean (errors 0) |
| `vitest run` | 478 passed / 0 failed (46 files) |

Regression guard grep

rg -n 'openai_compat::run\b' src-tauri/src/
# → only the ollama branches remain (commands/agents.rs:614 + commands/roundtable_helpers/executor.rs:54);
#   every vllm call is routed through stream_run_with_base + vllm_base_url

rg -n 'stream_run_with_base.*vllm_base_url|vllm_base_url' src-tauri/src/
# → used consistently in 6 places (openai_compat definition + start_openai_compat_stream + 2 in executor + 1 in agents)

rg -n 'VLLM_API_KEY' src-tauri/src/
# → 3 places: openai_compat::discover_vllm, agent_detect::probe_vllm,
#   openai_compat::stream_run_with_base (engine_name == "vLLM" branch)

Zero changes to the other engine files (claude/codex/gemini/ollama/lmstudio/claude_sdk_session), confirmed with git diff main...HEAD --name-only.

Scope note

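The ollama branches at http_api/agents.rs:188 and :400 do not include vllm; this was outside the scope of PR #296. Because there is no vllm dispatch there at all, it is not a silent-regression area (vllm falls through to the _ => claude::run fallback). Recommended for a separate PR.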

CI policy

Cross-platform area (Rust agent transport change): watch CI, confirm every check is SUCCESS, then do a regular squash merge. Avoid admin merge.

Refs

Supersedes: feat(engines): add vLLM as 6th UI-connected engine — RT, meta agent, … #296 (yodakrkids)
Gemini review: PR feat(engines): add vLLM as 6th UI-connected engine — RT, meta agent, … #296 inline comments

🤖 Generated with Claude Opus 4.7 (1M context)

…settings

vLLM uses OpenAI-compatible API, so it routes through the existing openai_compat.rs path alongside ollama / lmstudio.

Backend (5 files)
- agent_detect.rs: probe_vllm(), GET {endpoint}/v1/models
- openai_compat.rs: vllm_base_url(), discover_vllm(), engine_name + per-engine API key (VLLM_API_KEY) routing
- agents.rs: start_openai_compat_stream + run_eval_agent vllm branches
- executor.rs: RT run_participant / stream_participant vllm cases
- model_discovery.rs: ENGINES list + dispatch + fallback_models entry

Frontend (11 files)
- engineConfig.ts / EngineSelector.tsx / AgentAvatar.tsx / index.css: register vllm in shared engine registry + agent color token
- CreateRoundtableDialog.tsx: RT participant engine dropdown
- MetaAgentSelector.tsx: onboarding meta agent (vllm endpoint state, detect_available_agents wiring, HTTP endpoint input, install hint)
- AgentsSection.tsx: agent profile engine dropdown + endpoint override (engineEndpoint:vllm setting)
- RuntimeSection.tsx: Insight Agent engine select
- types/index.ts: extend engine union comments
- buildSendInput.ts: forward customBaseUrl when engine === "vllm"
- initialSetupApply.ts: KNOWN_ENGINES set
- locales/{en,ko}/dialog.json: meta_agent.vllm_install_hint

Defaults
- endpoint: http://localhost:8000 (override via UI or VLLM_ENDPOINT env)
- optional VLLM_API_KEY (Bearer token if set)

Verified
- tsc --noEmit: clean
- cargo check: clean
- Vitest streaming-flow + metaAgentSelector-modelDiscovery: 29 passed
- Manual: vLLM visible in meta agent selector

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…(gemini critical 1+2)

In executor.rs, both the `run_participant` (non-streaming) and `stream_participant` (streaming) branches were calling vllm through `openai_compat::run` / `stream_run`. Both functions hardcode `ollama_base_url()` (`OLLAMA_HOST`, default localhost:11434) internally, so vLLM requests were routed to the ollama server: a regression.

Fix: split the vllm branch out and call it with the `stream_run_with_base(input, vllm_base_url(), ...)` pattern. ollama behavior is unchanged.
- run_participant: runs in a spawn_blocking sync context, so the async wrapper is driven with `Handle::current().block_on(...)`
- stream_participant: already in an async context, so it `await`s directly

Regression guard:
- ollama branch left as-is → existing behavior identical
- claude / codex / gemini / opencode branches unchanged

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…gemini critical 3)

The vllm branch of `run_eval_agent` called `openai_compat::run`, which hardcodes ollama internally, so vLLM eval requests were routed to localhost:11434 (ollama): a regression.

Fix: split the vllm branch from ollama and call it with the `stream_run_with_base(input, vllm_base_url(), ...)` pattern, using `Handle::try_current()` + `block_on(...)`, the same pattern as `openai_compat::run` (Tauri sync commands run inside the tokio runtime).

Regression guard:
- ollama / codex / gemini / opencode / claude branches unchanged
- clear error message when no runtime handle is available (blocks a silent ollama call)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

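…high 4)

`probe_vllm` only called `client.get(&url).send()`, so when a vLLM instance is protected with the `--api-key` option (the recommended setup for real deployments), detection failed with a 401. The Authorization header was missing.

Fix: when the `VLLM_API_KEY` env var is set and non-empty, add a `Bearer <key>` header, following the same pattern as `openai_compat::discover_vllm`. Behavior without the header (local unprotected instance) is unchanged: a plain GET as before.

Regression guard:
- same path as before when the env var is unset
- other probe functions (ollama / lmstudio) unchanged

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>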

gemini-code-assist

Code Review

This pull request introduces support for vLLM as an OpenAI-compatible engine, implementing model discovery, endpoint detection, streaming execution, and roundtable evaluation routing, alongside corresponding frontend UI updates. The code review feedback is highly constructive and identifies several robustness improvements: normalizing the vLLM base URL to prevent duplicate /v1 suffixes, filtering out empty API key environment variables to avoid sending empty Bearer tokens that trigger authorization errors, and using Handle::try_current() instead of Handle::current() to prevent potential panics in non-Tokio contexts.

gemini-code-assist · 2026-05-28T10:55:35Z

+pub fn discover_vllm() -> Option<Vec<String>> {
+    let endpoint = vllm_base_url();
+    let url = format!("{}/v1/models", endpoint.trim_end_matches('/'));


If VLLM_ENDPOINT is set with a /v1 suffix (e.g. http://localhost:8000/v1), the url is built with a duplicated segment, http://localhost:8000/v1/v1/models, which is a wrong path. Model discovery can fail as a result, so it is better to add URL normalization that handles the /v1 suffix safely, the same way probe_vllm does.

pub fn discover_vllm() -> Option<Vec<String>> {
    let endpoint = vllm_base_url();
    let base_raw = endpoint.trim_end_matches('/');
    let base = if base_raw.ends_with("/v1") {
        base_raw.to_string()
    } else {
        format!("{}/v1", base_raw)
    };
    let url = format!("{}/models", base);
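
The suggested normalization can be pulled out into a standalone, testable form. The sketch below assumes a free function taking the raw endpoint string. The follow-up commit quoted further down names a `normalize_vllm_base()` helper, but its real signature is not shown on this page, so the signature here and the `models_url` wrapper are illustrative assumptions.

```rust
/// Sketch of the `/v1` policy from the review: strip trailing slashes,
/// keep the value if it already ends in `/v1`, otherwise append `/v1`.
/// (Signature is an assumption; only the helper's name appears in the PR.)
fn normalize_vllm_base(endpoint: &str) -> String {
    let trimmed = endpoint.trim_end_matches('/');
    if trimmed.ends_with("/v1") {
        trimmed.to_string()
    } else {
        format!("{trimmed}/v1")
    }
}

/// Hypothetical wrapper: build the model-list URL from a raw endpoint.
fn models_url(endpoint: &str) -> String {
    format!("{}/models", normalize_vllm_base(endpoint))
}
```

Under this policy `http://localhost:8000`, `http://localhost:8000/`, `http://localhost:8000/v1`, and `http://localhost:8000/v1/` all resolve to the same `/v1/models` URL, so the endpoint setting tolerates either form.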

gemini-code-assist · 2026-05-28T10:55:35Z

+    if let Ok(token) = std::env::var("VLLM_API_KEY") {
+        req = req.header("Authorization", format!("Bearer {}", token));
+    }


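Even when the VLLM_API_KEY environment variable is set to an empty string (""), an Authorization: Bearer header is still added to the request, which can cause authentication errors on some vLLM instances. As in probe_vllm, it is safer to add the header only after checking that the token is non-empty (!token.is_empty()).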

if let Ok(token) = std::env::var("VLLM_API_KEY") {
    if !token.is_empty() {
        req = req.header("Authorization", format!("Bearer {}", token));
    }
}

gemini-code-assist · 2026-05-28T10:55:35Z

+    let api_key = match engine_name {
+        "LM Studio" => std::env::var("LMSTUDIO_API_KEY").ok(),
+        "vLLM" => std::env::var("VLLM_API_KEY").ok(),
+        _ => None,
+    };


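If the API key environment variable exists but is an empty string, an empty Bearer token is added to the request headers, which can trigger 401/403 errors on local instances that need no authentication. It is better to use .filter(|t| !t.is_empty()) so an empty string is treated as None.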

Suggested change

-    let api_key = match engine_name {
-        "LM Studio" => std::env::var("LMSTUDIO_API_KEY").ok(),
-        "vLLM" => std::env::var("VLLM_API_KEY").ok(),
-        _ => None,
-    };
+    let api_key = match engine_name {
+        "LM Studio" => std::env::var("LMSTUDIO_API_KEY").ok().filter(|t| !t.is_empty()),
+        "vLLM" => std::env::var("VLLM_API_KEY").ok().filter(|t| !t.is_empty()),
+        _ => None,
+    };
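
To see the effect of the suggested `.filter(|t| !t.is_empty())` in isolation, the per-engine key selection can be written as a small function. This is a sketch under assumptions: `api_key_for` is a hypothetical name, and `lookup` replaces `std::env::var(..).ok()` so the empty-string case can be checked without mutating the process environment.

```rust
/// Hypothetical helper mirroring the suggested match: pick the API key
/// variable for an engine and treat an empty value as "no key".
fn api_key_for<F>(engine_name: &str, lookup: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    let var = match engine_name {
        "LM Studio" => "LMSTUDIO_API_KEY",
        "vLLM" => "VLLM_API_KEY",
        // Other engines send no API key on this path.
        _ => return None,
    };
    // `Option::filter` turns Some("") into None, so no empty Bearer token
    // is ever produced downstream.
    lookup(var).filter(|t| !t.is_empty())
}
```

Folding the check into the `Option` itself means the code that builds the header does not need its own emptiness test.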

gemini-code-assist · 2026-05-28T10:55:35Z

+    let api_key = match engine_name {
+        "LM Studio" => std::env::var("LMSTUDIO_API_KEY").ok(),
+        "vLLM" => std::env::var("VLLM_API_KEY").ok(),
+        _ => None,
+    };


As above, it is safer to add .filter(|t| !t.is_empty()) in the stream_run_no_tools_with_base function as well, so an empty API key is not sent as an empty Bearer token in the request headers.

Suggested change

-    let api_key = match engine_name {
-        "LM Studio" => std::env::var("LMSTUDIO_API_KEY").ok(),
-        "vLLM" => std::env::var("VLLM_API_KEY").ok(),
-        _ => None,
-    };
+    let api_key = match engine_name {
+        "LM Studio" => std::env::var("LMSTUDIO_API_KEY").ok().filter(|t| !t.is_empty()),
+        "vLLM" => std::env::var("VLLM_API_KEY").ok().filter(|t| !t.is_empty()),
+        _ => None,
+    };

gemini-code-assist · 2026-05-28T10:55:35Z

+            "vllm" => {
+                let rt = tokio::runtime::Handle::current();
+                let res = rt.block_on(async {
+                    openai_compat::stream_run_with_base(
+                        run_input,
+                        openai_compat::vllm_base_url(),
+                        |_: String| {},
+                        |_: String| {},
+                    ).await
+                });
+                (res, "vllm")
+            }


tokio::runtime::Handle::current() panics if the current thread is running outside a Tokio runtime context. Even if run_participant is always called inside a Tokio context, calls from test code or other synchronous contexts are possible, so it is better to use Handle::try_current() and handle the error safely, as agents.rs does.

"vllm" => {
    let res = tokio::runtime::Handle::try_current()
        .map_err(|_| AppError::Agent("No tokio runtime available for vllm".into()))
        .and_then(|rt| {
            rt.block_on(async {
                openai_compat::stream_run_with_base(
                    run_input,
                    openai_compat::vllm_base_url(),
                    |_: String| {},
                    |_: String| {},
                ).await
            })
        });
    (res, "vllm")
}

…odakrkids)

4 manifest locations + Cargo.lock minor bump. CHANGELOG entry added.

Key points:
- vLLM 6th UI-connected engine (PR #297, supersedes #296 by yodakrkids)
- Gemini critical 4 fix (executor.rs / agents.rs / agent_detect.rs)
- Reuses the OpenAI-compatible path; zero behavior change for ollama/lmstudio

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…handle + test mock (#299)

* fix(openai-compat): vllm base url /v1 suffix normalize (PR #297 review, T1 high)

With `VLLM_ENDPOINT=http://host:8000/v1` (the form shown in the official vLLM docs), `format!("{}/v1/models", ...)` produced `http://host:8000/v1/v1/models` and discover_vllm() failed: a regression. The policy already used by agent_detect.rs::probe_vllm (keep the value if it ends in `/v1`, otherwise append it) is extracted into a `normalize_vllm_base()` helper so both call sites share the same policy. URL construction is unified as `format!("{}/models", base)`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(openai-compat): filter empty api key tokens (3 sites, PR #297 review, T2+T3+T4)

With `VLLM_API_KEY=""` or `LMSTUDIO_API_KEY=""`, the empty string was added to the header as-is, so `Authorization: Bearer ` (no token) was sent and some vLLM / LM Studio setups returned 401: a regression. The same `.ok().filter(|t| !t.is_empty())` pattern is applied at 3 call sites:
- discover_vllm: model list GET (T2)
- stream_run_with_base: chat completion POST (T3)
- stream_run_no_tools_with_base: fallback POST (T4)

An empty token is omitted from the headers entirely (compatible with local unprotected instances). A token for a protected instance is sent unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(executor): Handle::try_current for vllm participant (PR #297 review, T5)

`tokio::runtime::Handle::current()` panics in a non-Tokio context. It is called inside spawn_blocking, so a runtime handle normally exists, but a graceful fallback is needed in other host environments (e.g. tests or direct calls). After `try_current()`, `Err(_)` is converted to an explicit AppError, consistent with the same pattern at commands/agents.rs:621.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(plan-file-path): mock helpers to prevent fire-and-forget flakiness (PR #298 review, T6)

Inside `approveAndStartImplementation`, `createArchitectDecisionArtifact(plan)` is called fire-and-forget, which can produce a non-deterministic mock call order in tests. Using the `vi.mock("./helpers", async (importOriginal) => ...)` pattern, only `createArchitectDecisionArtifact` is replaced with `vi.fn(async () => undefined)`. The remaining helpers (getPlanSlug / slugifyPlanTitle / buildPlanContext / createAndLinkBranch …) keep their actual implementations, preserving the point of the cross-generator consistency check.

All 13 tests pass (Test Files 1 passed, Tests 13 passed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: dghong <d9ng@outlook.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

yodakrkids and others added 4 commits May 28, 2026 19:40

gemini-code-assist Bot reviewed May 28, 2026


hang-in merged commit cf8d8a5 into main May 28, 2026
3 checks passed

hang-in mentioned this pull request May 28, 2026

feat(engines): add vLLM as 6th UI-connected engine — RT, meta agent, … #296

Closed

10 tasks

hang-in mentioned this pull request May 28, 2026

fix: gemini review final batch — vllm base url + empty token + tokio handle + test mock #299

Merged

hang-in merged 4 commits into main from feat/vllm-engine-with-critical-fixes
