feat(engines): add vLLM as 6th UI-connected engine — RT, meta agent, …#296

Closed
yodakrkids wants to merge 1 commit into hang-in:main from yodakrkids:feat/vllm-engine

@yodakrkids
Contributor

…settings

vLLM exposes an OpenAI-compatible API, so it routes through the existing openai_compat.rs path alongside ollama / lmstudio.

Backend (5 files)

  • agent_detect.rs: probe_vllm() — GET {endpoint}/v1/models
  • openai_compat.rs: vllm_base_url(), discover_vllm(), engine_name + per-engine API key (VLLM_API_KEY) routing
  • agents.rs: start_openai_compat_stream + run_eval_agent vllm branches
  • executor.rs: RT run_participant / stream_participant vllm cases
  • model_discovery.rs: ENGINES list + dispatch + fallback_models entry

Frontend (11 files)

  • engineConfig.ts / EngineSelector.tsx / AgentAvatar.tsx / index.css: register vllm in shared engine registry + agent color token
  • CreateRoundtableDialog.tsx: RT participant engine dropdown
  • MetaAgentSelector.tsx: onboarding meta agent — vllm endpoint state, detect_available_agents wiring, HTTP endpoint input, install hint
  • AgentsSection.tsx: agent profile engine dropdown + endpoint override (engineEndpoint:vllm setting)
  • RuntimeSection.tsx: Insight Agent engine select
  • types/index.ts: extend engine union comments
  • buildSendInput.ts: forward customBaseUrl when engine === "vllm"
  • initialSetupApply.ts: KNOWN_ENGINES set
  • locales/{en,ko}/dialog.json: meta_agent.vllm_install_hint

Defaults

  • endpoint: http://localhost:8000 (override via UI or VLLM_ENDPOINT env)
  • optional VLLM_API_KEY (Bearer token if set)
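
The defaults above imply a precedence order for the endpoint. Here is a minimal sketch of one possible order, assuming the UI override wins over `VLLM_ENDPOINT`, which in turn wins over the built-in default. The names `resolve_vllm_endpoint`, `non_empty`, and `models_url` are hypothetical, and this is not the PR's actual `vllm_base_url()` implementation:

```rust
/// Default vLLM endpoint stated in this PR.
const DEFAULT_VLLM_ENDPOINT: &str = "http://localhost:8000";

/// Hypothetical helper: treat a missing or blank value as "not set".
fn non_empty(value: Option<&str>) -> Option<&str> {
    value.map(str::trim).filter(|s| !s.is_empty())
}

/// Hypothetical helper: UI override first, then the `VLLM_ENDPOINT` value,
/// then the default. This precedence is an assumption for illustration.
fn resolve_vllm_endpoint(ui_override: Option<&str>, env_value: Option<&str>) -> String {
    non_empty(ui_override)
        .or_else(|| non_empty(env_value))
        .unwrap_or(DEFAULT_VLLM_ENDPOINT)
        .trim_end_matches('/')
        .to_string()
}

/// Build the probe URL described above: GET {endpoint}/v1/models.
fn models_url(endpoint: &str) -> String {
    format!("{}/v1/models", endpoint.trim_end_matches('/'))
}
```

A caller would likely pass `std::env::var("VLLM_ENDPOINT").ok().as_deref()` as the second argument. Keeping the function free of direct environment reads makes it easy to unit-test.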

Verified

  • tsc --noEmit: clean
  • cargo check: clean
  • Vitest streaming-flow + metaAgentSelector-modelDiscovery: 29 passed
  • Manual: vLLM visible in meta agent selector

Summary

Related plan / issue

Changes

Test plan

  • npx tsc --noEmit
  • npx vite build
  • cd src-tauri && cargo check
  • npx vitest run
  • cd src-tauri && cargo test --lib
  • Manual smoke test of the affected feature

Invariants touched

Screenshots / logs

Checklist

  • PR title follows Conventional Commits (feat(scope): ...)
  • Tests added or updated
  • Docs updated (plan, README, how-to) if behavior changed
  • No secrets, tokens, or personal paths in the diff

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request adds support for vLLM as an OpenAI-compatible engine, integrating it across backend detection, streaming, roundtable execution, and frontend settings. The review comments point out critical routing issues where vLLM non-streaming, streaming, and evaluation requests are incorrectly routed to the Ollama endpoint because they call functions hardcoded to Ollama's base URL. Additionally, a high-severity issue was identified in the vLLM detection probe, which fails to include the VLLM_API_KEY authorization header, potentially causing onboarding failures on authenticated vLLM instances.

  "gemini" => (gemini::run(run_input), "gemini"),
  "opencode" => (opencode::run(run_input), "opencode"),
  "ollama" => (openai_compat::run(run_input), "ollama"),
+ "vllm" => (openai_compat::run(run_input), "vllm"),

critical

The fallback non-streaming run_participant execution for vllm currently routes through openai_compat::run(run_input), which is hardcoded to use ollama_base_url() (typically http://localhost:11434). This causes vLLM roundtable requests to be incorrectly routed to the Ollama endpoint. We should instead use openai_compat::stream_run_with_base with the correct vllm_base_url() by blocking on the current Tokio runtime handle.

            "vllm" => {
                let base_url = openai_compat::vllm_base_url();
                let res = tokio::runtime::Handle::try_current()
                    .map_err(|_| AppError::Agent("No tokio runtime available for vllm".into()))
                    .and_then(|rt| {
                        rt.block_on(async {
                            openai_compat::stream_run_with_base(run_input, base_url, |_| {}, |_| {}).await
                        })
                    });
                (res, "vllm")
            }

Comment on lines +183 to +184
let label = if engine_key_owned == "vllm" { "vllm" } else { "ollama" };
(openai_compat::stream_run(run_input, on_progress, on_chunk).await, label)

critical

In stream_participant, the streaming execution for vllm currently calls openai_compat::stream_run(run_input, on_progress, on_chunk), which is hardcoded to use ollama_base_url(). This causes vLLM streaming roundtable requests to be incorrectly routed to the Ollama endpoint. We should instead use openai_compat::stream_run_with_base with the correct base URL for the selected engine.

Suggested change
- let label = if engine_key_owned == "vllm" { "vllm" } else { "ollama" };
- (openai_compat::stream_run(run_input, on_progress, on_chunk).await, label)
+ let base_url = if engine_key_owned == "vllm" {
+     openai_compat::vllm_base_url()
+ } else {
+     std::env::var("OLLAMA_HOST").unwrap_or_else(|_| "http://localhost:11434".into())
+ };
+ let label = if engine_key_owned == "vllm" { "vllm" } else { "ollama" };
+ (openai_compat::stream_run_with_base(run_input, base_url, on_progress, on_chunk).await, label)

  "gemini" => gemini::run(run_input),
  "opencode" => opencode::run(run_input),
- "ollama" => openai_compat::run(run_input),
+ "ollama" | "vllm" => openai_compat::run(run_input),

critical

In run_eval_agent, the evaluation path for vllm currently routes through openai_compat::run(run_input), which is hardcoded to use ollama_base_url(). This causes vLLM evaluation requests to be incorrectly routed to the Ollama endpoint. We should instead use openai_compat::stream_run_with_base with the correct vllm_base_url() by blocking on the current Tokio runtime handle.

        "ollama" => openai_compat::run(run_input),
        "vllm" => {
            let base_url = openai_compat::vllm_base_url();
            let rt = tokio::runtime::Handle::try_current()
                .map_err(|_| AppError::Agent("No tokio runtime available for vllm".into()))?;
            rt.block_on(async {
                openai_compat::stream_run_with_base(run_input, base_url, |_| {}, |_| {}).await
            })
        }
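
The three routing comments above share one root cause: the base URL was baked into the call rather than derived from the engine key. Here is a minimal sketch of that selection as a pure function. The name `base_url_for_engine` and the choice to pass env values in as arguments are assumptions for illustration; the default URLs are the ones quoted in this thread:

```rust
/// Hypothetical selector: map an engine key to its base URL.
/// `ollama_host` and `vllm_endpoint` stand in for the `OLLAMA_HOST` and
/// `VLLM_ENDPOINT` environment values so the function stays testable.
fn base_url_for_engine(
    engine: &str,
    ollama_host: Option<&str>,
    vllm_endpoint: Option<&str>,
) -> Option<String> {
    match engine {
        "ollama" => Some(ollama_host.unwrap_or("http://localhost:11434").to_string()),
        "vllm" => Some(vllm_endpoint.unwrap_or("http://localhost:8000").to_string()),
        // Any other key gets no URL, so a new engine cannot silently
        // fall through to the ollama endpoint.
        _ => None,
    }
}
```

Returning `None` for unknown keys forces each call site to handle the missing case explicitly, which is the failure mode the review flagged.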

Comment on lines +274 to +275
eprintln!("[agent-detect] probe vllm: GET {}", url);
match client.get(&url).send().await {

high

The probe_vllm function does not include the Authorization header with VLLM_API_KEY when sending the probe request. If the vLLM instance requires authentication (which is common for shared or cloud-hosted instances), the detection probe will fail with a 401 Unauthorized status, even if the key is configured in the environment. Adding the Authorization header ensures that authenticated vLLM instances are correctly detected during onboarding.

    eprintln!("[agent-detect] probe vllm: GET {}", url);
    let mut req = client.get(&url);
    if let Ok(token) = std::env::var("VLLM_API_KEY") {
        req = req.header("Authorization", format!("Bearer {}", token));
    }
    match req.send().await {
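
As written, the suggested snippet would also send a header when `VLLM_API_KEY` is set but empty. Here is a small sketch of a pure helper that skips blank keys, so an unprotected local instance still gets a plain GET. The name `bearer_header` is hypothetical and not part of the PR:

```rust
/// Hypothetical helper: turn an optional API key into an Authorization
/// header value. Returns `None` for a missing or blank key, in which case
/// the caller should send the request without the header.
fn bearer_header(api_key: Option<&str>) -> Option<String> {
    api_key
        .map(str::trim)
        .filter(|key| !key.is_empty())
        .map(|key| format!("Bearer {}", key))
}
```

A call site could pass `std::env::var("VLLM_API_KEY").ok().as_deref()` and attach the header only when the result is `Some`.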

@hang-in
Owner

hang-in commented May 28, 2026

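@yodakrkids Thank you for the PR adding vLLM as the 6th engine. Reusing tunaFlow's openai_compat routing (the same layer as ollama / lmstudio) as-is is a reasonable pattern. I will proceed with the merge after evaluating the code, confirming CI passes, and checking the Gemini review; it is planned for the next patch release (v0.1.8-beta-5) or minor release (v0.1.9-beta). CI is waiting on maintainer approval because of the fork PR policy, so I will trigger it shortly.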

hang-in added a commit that referenced this pull request May 28, 2026
…x (supersedes #296) (#297)

* feat(engines): add vLLM as 6th UI-connected engine — RT, meta agent, settings

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(engines): vLLM RT routing — stream_run_with_base + vllm_base_url (gemini critical 1+2)

In executor.rs, both the `run_participant` (non-streaming) and
`stream_participant` (streaming) branches called `openai_compat::run` /
`stream_run` for vllm. Both functions hardcode `ollama_base_url()`
(`OLLAMA_HOST`, default localhost:11434) internally, so vLLM requests
were routed to the ollama server. This was a regression.

Fix: split out the vllm branch and call it with the
`stream_run_with_base(input, vllm_base_url(), ...)` pattern. ollama
behavior is unchanged.

- run_participant: runs in a spawn_blocking sync context, so the async
  wrapper is executed via `Handle::current().block_on(...)`
- stream_participant: already in an async context, so it uses `await`
  directly

Regression guards:
- the ollama branch is untouched → existing behavior is identical
- no changes to the claude / codex / gemini / opencode branches

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(engines): vLLM eval routing — run_eval_agent uses vllm_base_url (gemini critical 3)

The vllm branch of `run_eval_agent` called `openai_compat::run`, which
hardcodes ollama internally, so vLLM evaluation requests were routed to
localhost:11434 (ollama). This was a regression.

Fix: split the vllm branch from ollama and call it with the
`stream_run_with_base(input, vllm_base_url(), ...)` pattern, using
`Handle::try_current()` + `block_on(...)`. This is the same pattern as
`openai_compat::run` (Tauri sync commands run inside the tokio runtime).

Regression guards:
- no changes to the ollama / codex / gemini / opencode / claude branches
- a clear error message when no runtime handle is available (prevents a
  silent ollama call)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(engines): vLLM probe Authorization header — VLLM_API_KEY (gemini high 4)

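`probe_vllm` only called `client.get(&url).send()`, so when the vLLM
instance is protected with the `--api-key` option (the recommended setup
for real deployments), detection failed with a 401 rejection. The
Authorization header was missing.

Fix: add a `Bearer <key>` header when the `VLLM_API_KEY` env var is set
and non-empty. This follows the same pattern as
`openai_compat::discover_vllm`. Behavior without the header (a local
unprotected instance) is unchanged: a plain GET as before.

Regression guards:
- when the env var is unset, the existing path is identical
- no changes to the other probe functions (ollama / lmstudio)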

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: 임용식 <yoda@krkids.co.kr>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: dghong <d9ng@outlook.com>
@hang-in
Owner

hang-in commented May 28, 2026

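@yodakrkids Hello, and thank you very much. We received your PR adding vLLM as the 6th engine.

I brought this PR's changes into main as-is and, together with fixes for the four issues flagged by the Gemini code review (three critical, one high), created and merged follow-up PR #297:

Merged PR: #297 (squash commit cf8d8a5)

  • yodakrkids's vLLM commit was cherry-picked as 9fc1a13, with authorship preserved
  • Three fix commits covering the four Gemini findings were stacked on top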

The 4 fixes applied

  1. executor.rs:55 run_participant (non-streaming): the vllm branch called openai_compat::run, which hardcodes ollama_base_url() internally. Fixed with stream_run_with_base(input, vllm_base_url(), ...) + Handle::current().block_on(...).
  2. executor.rs:184 stream_participant (streaming): same regression. The ollama / vllm branches are now split and stream_run_with_base is awaited directly.
  3. agents.rs:614 run_eval_agent: same regression. Fixed with Handle::try_current() + block_on(stream_run_with_base(..., vllm_base_url(), ...)).
  4. agent_detect.rs:275 probe_vllm: the Authorization header was missing. A Bearer <key> header is now added when the VLLM_API_KEY env var is set.

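Resolved conflict
The MetaAgentSelector conflict with main's PR #295 (explicit endpoint detect trigger: Enter / refresh button) was resolved by keeping main's explicit-trigger pattern and adding the vllm option. Debounce-based automatic detection was removed to avoid a problem reported by an external user (detection fired on every "." typed while entering 192.168.1.1).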

Verification

  • cargo check: PASS
  • cargo test --lib: 656 passed / 0 failed
  • tsc --noEmit: clean
  • vitest run: 478 passed / 0 failed
  • All 3 CI jobs (rust-check / frontend-check / eval) reported SUCCESS before the merge

I will publish the v0.1.9-beta minor release shortly and then follow up in a separate comment with the release URL and recovery guidance. Thank you for the great contribution.

@hang-in hang-in closed this May 28, 2026
hang-in pushed a commit that referenced this pull request May 28, 2026
…odakrkids)

Minor version bump in 4 manifest locations + Cargo.lock. Added a CHANGELOG entry.

Highlights:
- vLLM 6th UI-connected engine (PR #297, supersedes #296 by yodakrkids)
- 4 fixes from the Gemini review (executor.rs / agents.rs / agent_detect.rs)
- Reuses the OpenAI-compatible path; zero behavior changes for ollama/lmstudio

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@hang-in
Owner

hang-in commented May 28, 2026

@yodakrkids Thank you for the vLLM PR. The Gemini code review flagged four issues (the vLLM branches in executor.rs / agents.rs were wrongly routed to the ollama base URL, and probe_vllm was missing the Authorization header). The maintainer added follow-up fix commits on top of this PR's commits and merged them as PR #297, which supersedes this one.

Merge commit: cf8d8a5 (PR #297)
Release: https://github.com/hang-in/tunaFlow/releases/tag/v0.1.9-beta (URL goes live after publish)

This PR has been superseded by PR #297, so I am closing it. PRs in areas other than vLLM are welcome too. The 6th-engine work in this PR is what made the minor bump (v0.1.8 → v0.1.9) possible. Thank you.

@hang-in
Owner

hang-in commented May 28, 2026

v0.1.9-beta has been published: https://github.com/hang-in/tunaFlow/releases/tag/v0.1.9-beta

@yodakrkids's vLLM 6th engine addition (merge cf8d8a5 via PR #297) is the first feature release from an external contributor. Both the macOS DMG and Windows installer assets have been built. Thank you once again.
