Vehicle Intelligence ingests a 360° walkaround video of a vehicle and returns a structured inspection report:
- Identity — vehicle type, brand, model, category, year, trim, evidence sources
- Odometer — reading from the dashboard (OCR + VLM chain)
- Damage — scratches, dents, rust, cracks, paint damage, broken lights, wheel damage, panel misalignment — each grounded to a body panel and tagged with an estimated repair cost and a per-detection rationale
- Exhaust — stock vs modified classifier
- PDF report — generated client-side from the inspection JSON
It also ships an active-learning loop: every damage detection can be confirmed or rejected from the UI; reviewer feedback exports to a YOLO-format training set with a single script.
Three services, each runnable on its own:
| Service | Stack | Port | Owns |
|---|---|---|---|
| `frontend/` | Next.js 16, React 19, Tailwind 4 | 3000 | UI, uploads, polling, PDF rendering, reviewer queue |
| `backend/` | Node, Express, TypeScript, SQLite (better-sqlite3) | 3001 | Persistence, upload pipeline, job orchestration, static file gating |
| `ml-service/` | Python, FastAPI | 8000 | All model inference: YOLOv8, CLIP, PaddleOCR, Gemini, OpenAI vision |
Shared TypeScript types live in shared/types.ts.
- Frontend optionally calls `POST /api/upload/preflight` to gate on blur, brightness, and vehicle presence before the user commits to a full upload.
- `POST /api/upload` writes the video to `backend/uploads/videos/`, inserts a `files` row and a `jobs` row (status `pending`), and returns `jobId`.
- `services/job_processor.ts` runs the job in-process (no queue) and POSTs the absolute video path to ML `/api/process` with retry/backoff.
- ML pipeline (`src/api/process.py`): `FrameExtractor` → `VehicleIdentifier` (CLIP) → `DashboardDetector` + `OdometerReader` (YOLO + PaddleOCR + VLM) → `DamageDetector` → `panel_inference.attach_parts_to_locations` → `repair_costs.estimate_repair_costs` → `damage_rationale.attach_rationales` (best-effort, batched VLM call) → `ExhaustClassifier` → `ReportGenerator`. Models load once at startup via `ModelRegistry`.
- Backend persists the result into `inspections` and flips the job to `completed`. Frontend polls `GET /api/jobs/:id`, then fetches `GET /api/inspections/:id`.
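The stage chain above can be pictured as an ordered list of functions that fill in a shared report, with one stage marked best-effort. What follows is a minimal sketch under that assumption, not the code in `src/api/process.py`: `Stage`, `run_pipeline`, and the toy stage functions are hypothetical names, and only the rationale stage is treated as skippable because it is the only one described as best-effort.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Report = Dict[str, object]


@dataclass
class Stage:
    name: str
    run: Callable[[Report], None]  # mutates the report in place
    best_effort: bool = False      # if True, a failure is recorded, not raised


def run_pipeline(stages: List[Stage], report: Report) -> Report:
    """Run stages in order; only best-effort stages may fail without aborting."""
    warnings: List[str] = []
    for stage in stages:
        try:
            stage.run(report)
        except Exception as exc:
            if not stage.best_effort:
                raise
            warnings.append(f"{stage.name}: {exc}")
    report["warnings"] = warnings
    return report


# Toy stand-ins for the real stages (hypothetical).
def detect_damage(report: Report) -> None:
    report["damage"] = [{"type": "scratch", "confidence": 0.81}]


def attach_rationales(report: Report) -> None:
    raise TimeoutError("VLM batch call timed out")


def classify_exhaust(report: Report) -> None:
    report["exhaust"] = "stock"


result = run_pipeline(
    [
        Stage("damage", detect_damage),
        Stage("rationale", attach_rationales, best_effort=True),
        Stage("exhaust", classify_exhaust),
    ],
    {},
)
print(result["warnings"])
```

In this sketch the rationale failure ends up in `warnings` and the exhaust stage still runs, which is the behaviour "best-effort" implies for the report as a whole.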
A startup reaper plus a 5-minute interval reaper marks stuck jobs as failed so
the UI doesn't hang on rows nobody is processing. A 6-hour sweeper deletes raw
videos for completed jobs older than VIDEO_RETENTION_DAYS.
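As an illustration of the reaper idea, the sketch below marks unfinished jobs as failed once they have gone untouched for too long. It is written in Python with the standard `sqlite3` module for brevity; the real reaper lives in the TypeScript backend. The table layout, the `processing` status, and the 15-minute threshold are all assumptions made for the example.

```python
import sqlite3
import time

STUCK_AFTER_SECONDS = 15 * 60  # assumption: the real threshold is not documented here


def reap_stuck_jobs(db: sqlite3.Connection, now: float) -> int:
    """Mark unfinished jobs that have not been touched recently as failed."""
    cutoff = now - STUCK_AFTER_SECONDS
    cur = db.execute(
        "UPDATE jobs SET status = 'failed', updated_at = ? "
        "WHERE status IN ('pending', 'processing') AND updated_at < ?",
        (now, cutoff),
    )
    db.commit()
    return cur.rowcount


# Demo against an in-memory database with a hypothetical schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE jobs (id TEXT PRIMARY KEY, status TEXT, updated_at REAL)")
now = time.time()
db.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?)",
    [
        ("old-processing", "processing", now - 3600),  # untouched for an hour
        ("fresh-pending", "pending", now - 30),        # recently created
        ("done", "completed", now - 7200),             # finished, must be left alone
    ],
)
reaped = reap_stuck_jobs(db, now)
statuses = dict(db.execute("SELECT id, status FROM jobs"))
print(reaped, statuses)
```

Running the same function once at startup and then on a timer would match the two reaper triggers described above.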
The fast path runs all three services with port-clearing, venv setup, and dependency install handled:

    ./START_SERVICES.sh

Logs land in `/tmp/vi-{backend,ml-service,frontend}.log`. Ctrl+C kills all three. Per-service commands are below if you'd rather drive them yourself.

    cd backend
    npm install
    npm run dev          # tsx watch, port 3001
    npm run build        # tsc → dist/
    npm run type-check
    npm run lint

    cd ml-service
    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt
    # Vision LLM keys (both optional; the pipeline degrades gracefully).
    # Either ml-service/.env or the repo-root .env is read.
    export GEMINI_API_KEY=...
    export OPENAI_API_KEY=...
    export OPENAI_BASE_URL=http://localhost:11434/v1   # optional, OpenAI-compatible
    python src/main.py   # uvicorn, port 8000
    pytest tests/

    cd frontend
    npm install
    npm run dev          # next dev, port 3000
    npm run build
    npm run lint
    npm test             # jest (jsdom)

Copy `.env.example` to `.env` at the repo root for Docker Compose.
| Service | Key | Default | Purpose |
|---|---|---|---|
| backend | `PORT` | `3001` | Server port |
| backend | `ML_SERVICE_URL` | `http://localhost:8000` | ML service base URL |
| backend | `DATABASE_PATH` | `./data/vehicle_intelligence.db` | SQLite file path |
| backend | `UPLOAD_MAX_SIZE` | `500MB` | Per-file upload limit |
| backend | `CORS_ALLOWED_ORIGINS` | `http://localhost:3000,http://localhost:3001` | Comma-separated list of allowed origins |
| backend | `RATE_LIMIT_WINDOW_MS` / `RATE_LIMIT_MAX_REQUESTS` | 15 min / 100 | Express rate limit |
| backend | `ML_SERVICE_TIMEOUT_MS` | `600000` | Outer ML axios timeout |
| backend | `VIDEO_RETENTION_DAYS` | `7` | Sweeper threshold |
| backend | `LOG_LEVEL` | `info` | pino level |
| ml-service | `GEMINI_API_KEY` | (none) | Primary VLM |
| ml-service | `OPENAI_API_KEY` | (none) | VLM fallback |
| ml-service | `OPENAI_BASE_URL` | OpenAI | OpenAI-compatible endpoint |
| ml-service | `ML_DEVICE` | auto | `cuda` / `mps` / `cpu` override |
| ml-service | `ML_STAGE_TIMEOUT_VEHICLE` / `_ODOMETER` / `_DAMAGE` / `_EXHAUST` / `_GEMINI` | (none) | Per-stage soft timeouts (seconds) |
| ml-service | `ML_DAMAGE_RATIONALE_TIMEOUT` | (none) | Cap on rationale VLM batch |
| frontend | `NEXT_PUBLIC_API_URL` | `http://localhost:3001/api` | Backend API |
| frontend | `BACKEND_URL` | `http://localhost:3001` | Where `/uploads/*` is proxied to |
Inspection lifecycle
- `POST /api/upload/preflight`: multipart video, returns blur/brightness/vehicle-presence diagnostics. Fails open on ML errors.
- `POST /api/upload`: multipart video (+ optional odometer image, identity fields). Returns `{ jobId, fileId }`.
- `GET /api/jobs/:id`: job status.
- `GET /api/inspections`: paginated list.
- `GET /api/inspections/:id`: full inspection.
- `PUT /api/inspections/:id/identity`: merge trusted identity evidence (VIN, registration, brand, model, year, variant).
- `PUT /api/inspections/:id/vlm`: merge externally generated VLM evidence.
- `POST /api/inspections/:id/retry-vlm`: rerun VLM from saved organized frames.
Active-learning feedback
- `POST /api/inspections/:id/feedback`: confirm / reject / wrong-type a detection.
- `GET /api/inspections/:id/feedback`: list feedback for one inspection.
- `DELETE /api/inspections/:id/feedback/:fid`: remove one feedback row.
- `POST /api/inspections/:id/missing-damage`: reviewer-drawn bbox for a damage the model missed.
- `GET /api/inspections/:id/missing-damage` / `DELETE …/:mid`: list / remove.
- `GET /api/feedback/export?since=ISO`: joined export of all feedback + missing-damage rows.
- `GET /api/feedback/review?limit=N`: uncertain detections (confidence closest to 0.5) for the reviewer queue.
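"Confidence closest to 0.5" amounts to sorting detections by their distance from 0.5 and keeping the first `N`. A minimal sketch of that ordering, assuming each detection is a dict with a `confidence` field (the function name and record shape are hypothetical, not the backend's actual query):

```python
from typing import Dict, List


def most_uncertain(detections: List[Dict], limit: int) -> List[Dict]:
    """Return up to `limit` detections, ordered by how close confidence is to 0.5."""
    ranked = sorted(detections, key=lambda d: abs(d["confidence"] - 0.5))
    return ranked[:limit]


detections = [
    {"id": "a", "confidence": 0.93},
    {"id": "b", "confidence": 0.52},
    {"id": "c", "confidence": 0.20},
    {"id": "d", "confidence": 0.41},
]
queue = most_uncertain(detections, limit=2)
print([d["id"] for d in queue])  # ['b', 'd']
```

Detections the model is already sure about (near 0 or 1) sink to the bottom, so reviewer time goes to the cases where a label is most informative.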
Other
- `GET /api/metrics`: dashboard aggregates.
- `GET /health`: liveness.
- `/uploads/frames/*` and `/uploads/odometer_images/*`: guarded static serving. Raw `/uploads/videos/*` is 403 by design.
ML service endpoints
- `POST /api/preflight`: sample 12 frames, return blur + brightness + vehicle-presence scores.
- `POST /api/process`: full inspection pipeline.
- `POST /api/retry-vlm`: VLM-only rerun from saved organized frames.
- `GET /health`: liveness.
- `GET /ready`: dependency readiness. Pass `?live_gemini=true&live_openai=true` to verify VLM quota/keys.
Frontend routes
- `/`: inspection dashboard (volumes, confidence, recent inspections).
- `/inspect`: upload form. Runs pre-flight, then `POST /api/upload`.
- `/capture`: guided 8-stage walkaround recorder using `MediaRecorder` + `getUserMedia`. Samples brightness and blur every 500 ms; requires the `Permissions-Policy: camera=(self)` header in `next.config.js`.
- `/job/[id]`: job status polling with exponential backoff.
- `/inspection/[id]`: full report with part-grouped damage accordion, repair cost totals, per-snapshot 👍/👎 feedback, JSON + PDF download.
- `/review`: reviewer queue of the most-uncertain detections across all inspections.
- `/history`: paginated list of past inspections.
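Polling with exponential backoff, as the job status page does, can be sketched as a loop that waits a little longer after each non-terminal response. The example is a client-side illustration in Python, not the frontend's code: the base delay, growth factor, cap, attempt budget, and the `processing` status are assumptions, and `fetch_status` stands in for a call to `GET /api/jobs/:id`.

```python
import time
from typing import Callable, Dict, Iterator


def backoff_delays(base: float = 1.0, factor: float = 2.0, cap: float = 30.0) -> Iterator[float]:
    """Yield base, base*factor, ... with each delay capped at `cap` seconds."""
    delay = base
    while True:
        yield min(delay, cap)
        delay *= factor


def poll_job(
    fetch_status: Callable[[], Dict],
    sleep: Callable[[float], None] = time.sleep,
    max_attempts: int = 20,
) -> Dict:
    """Poll until the job reaches a terminal status or attempts run out."""
    delays = backoff_delays()
    for _ in range(max_attempts):
        job = fetch_status()
        if job["status"] in ("completed", "failed"):
            return job
        sleep(next(delays))
    raise TimeoutError("job did not finish within the polling budget")


# Demo with a fake status source and a recording sleep, so nothing blocks.
responses = iter(
    [{"status": "pending"}, {"status": "processing"}, {"status": "completed"}]
)
slept = []
final = poll_job(lambda: next(responses), sleep=slept.append)
print(final["status"], slept)
```

Treating `failed` as terminal matters here: jobs marked failed by the reaper would otherwise be polled until the attempt budget runs out.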

    # Backend: export reviewer feedback as a YOLO-format training set.
    cd backend
    npx tsx scripts/export-training-set.ts --out ./training-set [--since 2026-01-01]
    # Produces images/, labels/, classes.txt, manifest.json. Idempotent.
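For readers unfamiliar with the YOLO label format: each file under `labels/` holds one line per box, written as `class_id x_center y_center width height`, with all four geometry values normalized to the image size. The sketch below shows that conversion for a single box. It assumes boxes are stored as pixel corner coordinates and that `class_id` indexes into `classes.txt`; `to_yolo_line` is a hypothetical helper, not part of `export-training-set.ts`.

```python
from typing import Tuple


def to_yolo_line(
    class_id: int,
    box_xyxy: Tuple[float, float, float, float],
    image_size: Tuple[int, int],
) -> str:
    """Convert a pixel box (x1, y1, x2, y2) to 'class cx cy w h', normalized to [0, 1]."""
    x1, y1, x2, y2 = box_xyxy
    img_w, img_h = image_size
    cx = (x1 + x2) / 2 / img_w
    cy = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# A box covering the central quarter of a 1280x720 frame.
line = to_yolo_line(2, (320, 180, 960, 540), (1280, 720))
print(line)  # 2 0.500000 0.500000 0.500000 0.500000
```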

    # ML: pipeline readiness and per-video completion audit.
    cd ml-service
    python scripts/check_pipeline_readiness.py --live-gemini --live-openai --json > /tmp/vip-readiness.json
    python scripts/evaluate_video_understanding.py ../360.mov --with-models --read-odometer \
      --output-dir /tmp/vip-video-eval
    python scripts/audit_pipeline_completion.py \
      --manifest /tmp/vip-video-eval/frame_analysis_manifest.json \
      --inspection-json /path/to/process_response.json \
      --readiness-json /tmp/vip-readiness.json
    # ML: retry VLM step from saved organized frames after fixing quota.
    python scripts/retry_vlm_analysis.py \
      --inspection-json /path/to/process_response_or_backend_inspection.json \
      --output-json /tmp/vip-vlm-retry.json \
      --merged-output-json /tmp/vip-process-response-with-vlm.json

- The backend uses synchronous `better-sqlite3`: no `await` on DB calls.
- Job processing is in-process. Don't move ML work into the backend; don't rely on the backend keeping ML state across restarts.
- `ModelRegistry.initialize_all_models()` runs at FastAPI startup. If startup fails, the service refuses to start; don't catch and continue.
- The frontend talks to the backend via `NEXT_PUBLIC_API_URL`. The frontend never calls the ML service directly. `/uploads/*` is proxied through Next's rewrites to keep `next/image` happy without remote whitelisting.
- `JobStatus` and other enums in `shared/types.ts` must match the strings used in `backend/src/db/schema.sql` and `models/inspection.ts`.
- Static file allow-list is in `backend/src/index.ts`. Adding a new output directory? Add its prefix to `allowedPrefixes` or it will be blocked.
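The allow-list rule in the last bullet comes down to a prefix check on the requested path. The sketch below illustrates the idea in Python; the real check is TypeScript in `backend/src/index.ts` and its exact matching logic is not shown here, so `is_allowed`, the normalization step, and the `/uploads/new_output/` directory are assumptions made for the example.

```python
import posixpath
from typing import Sequence

# The two served directories named in this README; /uploads/videos/ is deliberately absent.
ALLOWED_PREFIXES = ("/uploads/frames/", "/uploads/odometer_images/")


def is_allowed(request_path: str, allowed: Sequence[str] = ALLOWED_PREFIXES) -> bool:
    """Allow a path only if, after normalization, it sits under an allowed prefix."""
    normalized = posixpath.normpath(request_path)  # resolves '..' segments
    return any(normalized.startswith(prefix) for prefix in allowed)


print(is_allowed("/uploads/frames/job1/0001.jpg"))      # True
print(is_allowed("/uploads/videos/raw.mp4"))            # False
print(is_allowed("/uploads/frames/../videos/raw.mp4"))  # False

# A new output directory stays blocked until its prefix is added.
extended = ALLOWED_PREFIXES + ("/uploads/new_output/",)
print(is_allowed("/uploads/new_output/x.png"))            # False
print(is_allowed("/uploads/new_output/x.png", extended))  # True
```

Normalizing before matching is the design choice worth noting: without it, a path that starts with an allowed prefix could still climb out of it with `..` segments.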
Honest list of production gaps:
- No authentication. Anyone with the URL can upload, read, and export.
- In-process job orchestration — no queue, no horizontal scale, restarts abandon running jobs (the reaper marks them as failed).
- SQLite as the database — single-writer, no replication, no PITR. Fine for an MVP, not for production.
- No CI. Test suites exist; nothing runs them on push.
- Local-filesystem uploads — won't survive a container restart or scale across replicas. Move to S3/GCS for production.
- Multer MIME check only — no magic-byte verification, no AV scan.
Frontend — Next.js 16 (App Router, Turbopack), React 19, Tailwind 4,
shadcn/radix primitives, @react-pdf/renderer.
Backend — Node, Express, TypeScript, better-sqlite3, Zod, multer,
helmet, pino.
ML service — Python 3.10+, FastAPI, OpenCV, YOLOv8 (ultralytics), CLIP, PaddleOCR, Tesseract, Google Gemini, OpenAI vision.
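This system ingests a 360° walkaround video of a vehicle and returns a structured inspection report.
- Identity: vehicle type, brand, model, category, year, trim, evidence sources
- Odometer: reading from the dashboard (OCR + VLM chain)
- Damage: scratches, dents, rust, cracks, paint damage, broken lights, wheel damage, panel misalignment. Each detection is tied to a body panel and annotated with an estimated repair cost and a rationale comment
- Exhaust: stock / modified classification
- PDF report: generated client-side from the inspection JSON
It also includes an active-learning loop. Each damage detection can be confirmed or rejected in the UI, and the review results can be exported as YOLO-format training data with a single command.
Three services, each of which can be started on its own.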
| Service | Stack | Port | Owns |
|---|---|---|---|
| `frontend/` | Next.js 16, React 19, Tailwind 4 | 3000 | UI, uploads, polling, PDF generation, review screen |
| `backend/` | Node, Express, TypeScript, SQLite | 3001 | Persistence, upload control, job management, static file control |
| `ml-service/` | Python, FastAPI | 8000 | Inference for YOLOv8, CLIP, PaddleOCR, Gemini, OpenAI vision |
Shared TypeScript types are in `shared/types.ts`.
- The frontend optionally calls `POST /api/upload/preflight` to pre-check blur, brightness, and vehicle-presence quality.
- `POST /api/upload` saves the video to `backend/uploads/videos/`, creates the `files` / `jobs` records, and returns `jobId`.
- `services/job_processor.ts` runs the job in-process and POSTs the absolute path to ML `/api/process` (with retry/backoff).
- ML pipeline (`src/api/process.py`): `FrameExtractor` → `VehicleIdentifier` (CLIP) → `DashboardDetector` + `OdometerReader` (YOLO + PaddleOCR + VLM) → `DamageDetector` → `panel_inference.attach_parts_to_locations` → `repair_costs.estimate_repair_costs` → `damage_rationale.attach_rationales` (batched VLM, best-effort) → `ExhaustClassifier` → `ReportGenerator`. Models are loaded only once, at startup, by `ModelRegistry`.
- The backend writes the result to the `inspections` table and updates the job to `completed`. The frontend polls `GET /api/jobs/:id` and then fetches `GET /api/inspections/:id`.
A reaper that runs at startup and every 5 minutes marks stuck jobs as `failed`. A sweeper that runs every 6 hours deletes videos older than `VIDEO_RETENTION_DAYS`.
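Start all three services at once (port clearing, venv creation, and dependency installation are automatic):

    ./START_SERVICES.sh

Logs go to `/tmp/vi-{backend,ml-service,frontend}.log`. Ctrl+C stops everything.
For the individual start commands, see the per-service commands earlier in this README (`npm run dev`, `python src/main.py`, and so on).
For the main configuration values, see the table earlier in this README. With Docker Compose, the `.env` at the repo root (copied from `.env.example`) is read.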
Inspection lifecycle
- `POST /api/upload/preflight`, `POST /api/upload`, `GET /api/jobs/:id`, `GET /api/inspections`, `GET /api/inspections/:id`, `PUT /api/inspections/:id/identity`, `PUT /api/inspections/:id/vlm`, `POST /api/inspections/:id/retry-vlm`
Active learning
- `POST/GET/DELETE /api/inspections/:id/feedback`
- `POST/GET/DELETE /api/inspections/:id/missing-damage`
- `GET /api/feedback/export?since=ISO`, `GET /api/feedback/review?limit=N`
Other
- `GET /api/metrics`, `GET /health`, `/uploads/frames/*` and `/uploads/odometer_images/*` (`/uploads/videos/*` returns 403).
ML service endpoints
- `POST /api/preflight`, `POST /api/process`, `POST /api/retry-vlm`, `GET /health`, `GET /ready`
Frontend routes
- `/`: dashboard
- `/inspect`: upload (via pre-flight)
- `/capture`: guided 8-stage recording (`MediaRecorder` + `getUserMedia`)
- `/job/[id]`: job status (polling with exponential backoff)
- `/inspection/[id]`: report view (damage by panel, estimated repair cost, 👍/👎, JSON / PDF download)
- `/review`: review screen for unconfirmed detections
- `/history`: list of past inspections

    # Export reviewer feedback as YOLO training data (idempotent)
    cd backend
    npx tsx scripts/export-training-set.ts --out ./training-set [--since 2026-01-01]
    # Pipeline readiness and per-video completion audit
    cd ml-service
    python scripts/check_pipeline_readiness.py --live-gemini --live-openai --json > /tmp/vip-readiness.json
    python scripts/evaluate_video_understanding.py ../360.mov --with-models --read-odometer \
      --output-dir /tmp/vip-video-eval
    python scripts/audit_pipeline_completion.py \
      --manifest /tmp/vip-video-eval/frame_analysis_manifest.json \
      --inspection-json /path/to/process_response.json \
      --readiness-json /tmp/vip-readiness.json

- No authentication. Anyone who knows the URL can upload, view, and export.
- Jobs run inside the backend process. After a restart, in-flight jobs are marked as failed by the reaper.
- SQLite is the database: single writer, no replication, no PITR.
- No CI. Tests exist but are not run on push.
- Uploads are stored on the local filesystem. They do not survive container restarts or horizontal scaling (S3/GCS recommended).
- Uploaded files are validated only by multer's MIME check. There is no magic-byte verification or virus scan.