Commit 92dfd20

Feat vision provider routing codex validation (#1)
## Summary by CodeRabbit

## Release Notes

* **New Features**
  * Added support for multiple metadata analysis providers: Claude, Codex, and HTTP-based Vision APIs
  * Introduced `--analysis-context` and `--analysis-context-file` options for enriching image metadata analysis with custom guidance
  * Implemented scene-aware analysis that intelligently processes UI and illustration content
* **Improvements**
  * Enhanced metadata validation with detailed diagnostic feedback for recognition failures
  * Added retry logic for recoverable validation issues to improve success rates
  * Improved error messages with context-specific guidance

Co-authored-by: Hagicode <noreply@hagicode.com>
1 parent fa7a6d6 commit 92dfd20

30 files changed

Lines changed: 2193 additions & 153 deletions

.env.example

Lines changed: 24 additions & 4 deletions
````diff
@@ -4,16 +4,36 @@ IMGBIN_IMAGE_API_KEY=
 IMGBIN_IMAGE_API_MODEL=
 IMGBIN_IMAGE_API_TIMEOUT_MS=60000
 
-# Local Claude CLI metadata analysis.
+# Metadata analysis provider routing.
+# Supported values: claude, codex, http
+IMGBIN_ANALYSIS_PROVIDER=claude
+IMGBIN_ANALYSIS_TIMEOUT_MS=60000
+# Optional override for the bundled prompt file in prompts/default-analysis-prompt.txt
+IMGBIN_ANALYSIS_PROMPT_PATH=
+
+# Claude analysis provider.
 # Optional if `claude` is already available in PATH.
 IMGBIN_ANALYSIS_CLI_PATH=claude
+IMGBIN_CLAUDE_CLI_PATH=claude
 # Preferred: ImgBin-specific model override.
 IMGBIN_ANALYSIS_API_MODEL=
+IMGBIN_CLAUDE_MODEL=
 # Fallback: shared Claude/Anthropic model setting.
 ANTHROPIC_MODEL=
-IMGBIN_ANALYSIS_TIMEOUT_MS=60000
-# Optional override for the bundled prompt file in prompts/default-analysis-prompt.txt
-IMGBIN_ANALYSIS_PROMPT_PATH=
+IMGBIN_CLAUDE_TIMEOUT_MS=60000
+
+# Codex analysis provider.
+IMGBIN_CODEX_CLI_PATH=codex
+IMGBIN_CODEX_MODEL=
+IMGBIN_CODEX_TIMEOUT_MS=60000
+IMGBIN_CODEX_BASE_URL=
+IMGBIN_CODEX_API_KEY=
+
+# HTTP analysis provider.
+IMGBIN_VISION_API_URL=
+IMGBIN_VISION_API_KEY=
+IMGBIN_VISION_API_MODEL=
+IMGBIN_VISION_API_TIMEOUT_MS=60000
 
 IMGBIN_DEFAULT_OUTPUT_DIR=./library
 IMGBIN_THUMBNAIL_SIZE=512
````
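The `.env.example` changes put a shared analysis block (provider, timeout, prompt path) in front of per-provider settings. The following is a minimal sketch of how such variables could be resolved into one config object; it is not ImgBin's code. `resolveAnalysisConfig` and its types are hypothetical, and two rules are assumptions: that `IMGBIN_ANALYSIS_API_MODEL` is checked before its alias `IMGBIN_CLAUDE_MODEL`, and that a provider-specific timeout overrides `IMGBIN_ANALYSIS_TIMEOUT_MS`. The `claude` default, the `60000` ms default, the `ANTHROPIC_MODEL` fallback, and the required `IMGBIN_VISION_API_URL` for `http` come from the README changes in this commit.

```typescript
// Hypothetical resolver for the analysis variables in .env.example.
// Env var names come from the commit; function and type names are illustrative.

type AnalysisProvider = "claude" | "codex" | "http";
type Env = Record<string, string | undefined>;

interface AnalysisConfig {
  provider: AnalysisProvider;
  model?: string;
  timeoutMs: number;
}

const PROVIDERS: readonly string[] = ["claude", "codex", "http"];

// Return the first non-empty value; an empty `IMGBIN_CODEX_MODEL=` counts as unset.
function readVar(env: Env, ...names: string[]): string | undefined {
  for (const name of names) {
    const value = env[name]?.trim();
    if (value) return value;
  }
  return undefined;
}

function parseTimeout(raw: string | undefined, fallback: number): number {
  if (raw === undefined) return fallback;
  const ms = Number(raw);
  if (!Number.isInteger(ms) || ms <= 0) throw new Error(`Invalid timeout: ${raw}`);
  return ms;
}

function resolveAnalysisConfig(env: Env): AnalysisConfig {
  // Documented default: omitting IMGBIN_ANALYSIS_PROVIDER selects claude.
  const name = (readVar(env, "IMGBIN_ANALYSIS_PROVIDER") ?? "claude").toLowerCase();
  if (!PROVIDERS.includes(name)) {
    throw new Error(`Unsupported IMGBIN_ANALYSIS_PROVIDER: ${name}`);
  }
  const provider = name as AnalysisProvider;

  // Documented shared fallback, defaulting to 60000 ms.
  const shared = parseTimeout(readVar(env, "IMGBIN_ANALYSIS_TIMEOUT_MS"), 60_000);

  if (provider === "claude") {
    return {
      provider,
      // Assumed order: preferred ImgBin name, then its alias, then ANTHROPIC_MODEL.
      model: readVar(env, "IMGBIN_ANALYSIS_API_MODEL", "IMGBIN_CLAUDE_MODEL", "ANTHROPIC_MODEL"),
      timeoutMs: parseTimeout(readVar(env, "IMGBIN_CLAUDE_TIMEOUT_MS"), shared),
    };
  }
  if (provider === "codex") {
    return {
      provider,
      model: readVar(env, "IMGBIN_CODEX_MODEL"),
      timeoutMs: parseTimeout(readVar(env, "IMGBIN_CODEX_TIMEOUT_MS"), shared),
    };
  }
  // provider === "http": the README marks IMGBIN_VISION_API_URL as required.
  if (!readVar(env, "IMGBIN_VISION_API_URL")) {
    throw new Error("IMGBIN_VISION_API_URL is required when IMGBIN_ANALYSIS_PROVIDER=http");
  }
  return {
    provider,
    model: readVar(env, "IMGBIN_VISION_API_MODEL"),
    timeoutMs: parseTimeout(readVar(env, "IMGBIN_VISION_API_TIMEOUT_MS"), shared),
  };
}
```

Calling `resolveAnalysisConfig(process.env)` once at startup would fail fast on a misconfigured provider instead of partway through a batch.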

README.md

Lines changed: 129 additions & 25 deletions
````diff
@@ -1,12 +1,12 @@
 # ImgBin
 
-ImgBin is a TypeScript CLI for generating image assets, importing existing images into a managed library, writing searchable metadata, searching managed libraries, creating thumbnails, and running local Claude CLI image metadata analysis.
+ImgBin is a TypeScript CLI for generating image assets, importing existing images into a managed library, writing searchable metadata, searching managed libraries, creating thumbnails, and running provider-routed multimodal image metadata analysis.
 
 ## Requirements
 
 - Node.js 20+
 - Access to an image generation HTTP API for `generate`
-- A local `claude` CLI installation for `annotate` / `--annotate`
+- One configured analysis backend for `annotate` / `--annotate`: `claude`, `codex`, or a compatible HTTP vision API
 
 ## Installation
 
````
````diff
@@ -95,8 +95,8 @@ ImgBin now matches the Azure image-generation request format that was previously
 That means the current recommended setup is:
 
 1. configure Azure image generation for output,
-2. configure a local metadata-analysis model for the Claude-compatible CLI step, and
-3. run `imgbin generate` directly or call it through the site wrapper.
+2. configure a non-interactive multimodal analysis provider (`claude`, `codex`, or `http`), and
+3. run `imgbin generate` directly or call it through the site wrapper or CI automation.
 
 ### Minimal `.env` example
 
````
````diff
@@ -111,20 +111,41 @@ IMGBIN_IMAGE_API_KEY="<azure-api-key>"
 # AZURE_ENDPOINT="https://<resource>.openai.azure.com/openai/deployments/<deployment>/images/generations?api-version=<version>"
 # AZURE_API_KEY="<azure-api-key>"
 
-# Metadata analysis model for the local Claude-compatible CLI
-IMGBIN_ANALYSIS_API_MODEL="glm-5"
+# Select one metadata analysis backend
+IMGBIN_ANALYSIS_PROVIDER="codex"
+
+# Codex multimodal analysis
+IMGBIN_CODEX_CLI_PATH="codex"
+IMGBIN_CODEX_MODEL="gpt-5-codex"
+# Optional if Codex is already configured globally
+# IMGBIN_CODEX_BASE_URL="https://api.openai.com/v1"
+# IMGBIN_CODEX_API_KEY="<codex-api-key>"
+
+# Claude-compatible analysis remains available
+# IMGBIN_ANALYSIS_PROVIDER="claude"
+# IMGBIN_ANALYSIS_CLI_PATH="claude"
+# IMGBIN_ANALYSIS_API_MODEL="glm-5"
+# ANTHROPIC_MODEL="glm-5"
+
+# Or route analysis through a compatible HTTP vision endpoint
+# IMGBIN_ANALYSIS_PROVIDER="http"
+# IMGBIN_VISION_API_URL="https://example.com/vision"
+# IMGBIN_VISION_API_KEY="<vision-api-key>"
+# IMGBIN_VISION_API_MODEL="vision-model"
 
 # Optional runtime tuning
 IMGBIN_DEFAULT_OUTPUT_DIR="./library"
 IMGBIN_IMAGE_API_TIMEOUT_MS="60000"
+IMGBIN_ANALYSIS_TIMEOUT_MS="60000"
 ```
 
 Notes:
 
 - `IMGBIN_IMAGE_API_URL` / `IMGBIN_IMAGE_API_KEY` are the preferred names.
 - `AZURE_ENDPOINT` / `AZURE_API_KEY` are accepted as compatibility fallbacks.
 - GPT Image is used only for image generation.
-- Metadata still comes from the local Claude-compatible analysis step.
+- `IMGBIN_ANALYSIS_PROVIDER` defaults to `claude` for backward compatibility when omitted.
+- All three analysis backends share the same scene-aware prompt builder, validation rules, and metadata provenance fields.
 
 ## Environment Variables
 
````
````diff
@@ -137,18 +158,40 @@ Notes:
 - `AZURE_ENDPOINT`: compatibility fallback for `IMGBIN_IMAGE_API_URL`
 - `AZURE_API_KEY`: compatibility fallback for `IMGBIN_IMAGE_API_KEY`
 
-### Local Claude CLI analysis
+### Multimodal analysis routing
+
+- `IMGBIN_ANALYSIS_PROVIDER`: selects `claude`, `codex`, or `http`; defaults to `claude`
+- `IMGBIN_ANALYSIS_PROMPT_PATH`: optional override for the bundled default prompt file
+- `IMGBIN_ANALYSIS_TIMEOUT_MS`: shared timeout fallback for analysis providers, defaults to `60000`
+
+### Claude CLI analysis
 
 - `IMGBIN_ANALYSIS_CLI_PATH`: optional local Claude executable path; defaults to `claude`
+- `IMGBIN_CLAUDE_CLI_PATH`: explicit alias for the Claude executable path
 - `IMGBIN_ANALYSIS_API_MODEL`: preferred model identifier for ImgBin's local Claude analysis
+- `IMGBIN_CLAUDE_MODEL`: explicit alias for the Claude model identifier
 - `ANTHROPIC_MODEL`: fallback shared Claude model identifier used when `IMGBIN_ANALYSIS_API_MODEL` is not set
-- `IMGBIN_ANALYSIS_TIMEOUT_MS`: optional timeout override for the local Claude process, defaults to `60000`
-- `IMGBIN_ANALYSIS_PROMPT_PATH`: optional override for the bundled default prompt file
+- `IMGBIN_CLAUDE_TIMEOUT_MS`: optional timeout override for the local Claude process
 
 If `IMGBIN_ANALYSIS_PROMPT_PATH` is not set, ImgBin falls back to `prompts/default-analysis-prompt.txt`.
 If `IMGBIN_ANALYSIS_API_MODEL` is empty, ImgBin falls back to `ANTHROPIC_MODEL`.
 
-ImgBin also adds a runtime filename-guidance block to every Claude analysis request. Imported assets prefer the original source filename, generated assets fall back to the managed slug, and placeholder names such as `original.png` or `asset` are ignored. The filename is treated as a soft scene hint only; visible image evidence still wins when they disagree.
+### Codex CLI analysis
+
+- `IMGBIN_CODEX_CLI_PATH`: optional Codex executable path; defaults to `codex`
+- `IMGBIN_CODEX_MODEL`: optional Codex model identifier
+- `IMGBIN_CODEX_TIMEOUT_MS`: optional timeout override for the Codex process
+- `IMGBIN_CODEX_BASE_URL`: optional base URL override forwarded as `OPENAI_BASE_URL`
+- `IMGBIN_CODEX_API_KEY`: optional API key override forwarded as `CODEX_API_KEY`
+
+### HTTP vision analysis
+
+- `IMGBIN_VISION_API_URL`: required when `IMGBIN_ANALYSIS_PROVIDER=http`
+- `IMGBIN_VISION_API_KEY`: optional bearer token for the HTTP vision API
+- `IMGBIN_VISION_API_MODEL`: optional model identifier stored in metadata
+- `IMGBIN_VISION_API_TIMEOUT_MS`: optional timeout override for the HTTP vision API
+
+ImgBin appends runtime scene profiles and filename guidance to every CLI-based analysis request. Imported assets prefer the original source filename, generated assets fall back to the managed slug, and placeholder names such as `original.png` or `asset` are ignored. The filename remains a soft hint only; visible image evidence still wins when they disagree.
 
 ### General runtime
 
````
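The Codex section says `IMGBIN_CODEX_BASE_URL` and `IMGBIN_CODEX_API_KEY` are forwarded to the Codex process as `OPENAI_BASE_URL` and `CODEX_API_KEY`. A minimal sketch of that forwarding, assuming the overrides apply only when non-empty so a globally configured Codex install keeps its own settings; `buildCodexEnv` is a hypothetical helper, not part of ImgBin's documented API.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: derive the environment for a spawned Codex process
// from the parent environment (e.g. process.env) without mutating it.
function buildCodexEnv(parent: Env): Env {
  const child: Env = { ...parent };
  const baseUrl = parent.IMGBIN_CODEX_BASE_URL?.trim();
  const apiKey = parent.IMGBIN_CODEX_API_KEY?.trim();
  // Only override when set, so a globally configured Codex keeps its defaults.
  if (baseUrl) child.OPENAI_BASE_URL = baseUrl;
  if (apiKey) child.CODEX_API_KEY = apiKey;
  return child;
}
```

The result would be handed to the child through the `env` option of Node's `child_process.spawn`, for example `spawn(cliPath, args, { env: buildCodexEnv(process.env) })`; the actual Codex arguments are outside this sketch.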
````diff
@@ -159,21 +202,23 @@ ImgBin also adds a runtime filename-guidance block to every Claude analysis requ
 
 ## Unified workflows
 
-### Generate one image with Azure + metadata analysis
+### Generate one image with Azure + multimodal metadata analysis
 
 ```bash
 imgbin generate \
 --prompt "A cheerful hand-drawn hero illustration of an AI coding assistant helping a developer at a desk." \
 --output ./library \
+--analysis-context "This is a documentation hero illustration with a desk scene and AI assistant visual motif." \
 --annotate
 ```
 
 What happens:
 
 1. ImgBin sends an Azure-style image request,
 2. writes the generated file into a managed asset directory,
-3. runs the local Claude-compatible metadata analysis step, and
-4. stores structured metadata in `metadata.json`.
+3. routes multimodal analysis through the configured provider,
+4. validates the returned JSON before accepting it, and
+5. stores structured metadata plus provider provenance in `metadata.json`.
 
 ### Generate from raw prompt text
 
````
````diff
@@ -183,6 +228,7 @@ imgbin generate \
 --output ./library \
 --tag dashboard \
 --tag hero \
+--analysis-context "This is a docs hero image that mixes product-dashboard cues with illustration styling." \
 --annotate \
 --thumbnail
 ```
````
````diff
@@ -193,6 +239,7 @@ imgbin generate \
 imgbin generate \
 --prompt-file ../docs/src/content/docs/img/product-overview/value-proposition-ai-assisted-coding/prompt.json \
 --output ./library \
+--analysis-context "This prompt file generates a documentation hero asset with interface-inspired card layout." \
 --annotate
 ```
 
````
````diff
@@ -203,32 +250,60 @@ ImgBin reads the docs prompt file, extracts `userPrompt`, carries over generatio
 If image generation already succeeded and you only want to refresh title/tags/description:
 
 ```bash
-imgbin annotate ./library/2026-03/orange-dashboard-hero --overwrite
+imgbin annotate ./library/2026-03/orange-dashboard-hero \
+--analysis-context "This is a product dashboard screenshot used in docs." \
+--overwrite
 ```
 
-This is useful after changing `IMGBIN_ANALYSIS_API_MODEL` or updating the analysis prompt.
+This is useful after changing the configured analysis provider, model, or prompt.
 
 ### Filename-guided analysis
 
-ImgBin now enriches Claude metadata analysis with a lightweight filename hint:
+ImgBin now enriches multimodal metadata analysis with a lightweight filename hint:
 
 - imported assets prefer the source filename from `source.originalPath`,
 - generated assets fall back to the managed asset slug or directory name, and
 - placeholder names such as `original.jpg` or `asset` are skipped automatically.
 
 This guidance is appended at runtime, so it applies to both the bundled default prompt and any `--analysis-prompt` override. Treat it as a soft hint: if the filename conflicts with the image itself, the visible image content should take precedence.
 
+### Non-interactive provider examples
+
+Codex in CI:
+
+```bash
+IMGBIN_ANALYSIS_PROVIDER=codex \
+IMGBIN_CODEX_CLI_PATH=codex \
+IMGBIN_CODEX_MODEL=gpt-5-codex \
+imgbin annotate ./library/2026-03/orange-dashboard-hero \
+--analysis-context "This is a product dashboard screenshot used in CI validation." \
+--overwrite
+```
+
+HTTP provider in automation:
+
+```bash
+IMGBIN_ANALYSIS_PROVIDER=http \
+IMGBIN_VISION_API_URL=https://example.com/vision \
+IMGBIN_VISION_API_KEY=token \
+imgbin annotate ./library/2026-03/orange-dashboard-hero \
+--analysis-context "This is a product dashboard screenshot used in automation." \
+--overwrite
+```
+
 ### Annotate an existing managed asset
 
 ```bash
-imgbin annotate ./library/2026-03/orange-dashboard-hero
+imgbin annotate ./library/2026-03/orange-dashboard-hero \
+--analysis-context "This is a product dashboard screenshot with KPI cards and navigation."
 ```
 
 ### Import a standalone image into the library, then analyze it
 
 ```bash
 imgbin annotate ./incoming/launch-hero.png \
 --import-to ./library \
+--analysis-context "This is a launch hero visual combining marketing illustration and interface framing." \
 --tag imported \
 --thumbnail
 ```
````
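The filename-guidance rules (source filename first, then slug or directory name, placeholders skipped, appended at runtime to whichever prompt is in use) can be sketched as two small functions. This is an illustration under assumptions: `AssetInfo`, `pickFilenameHint`, `appendFilenameGuidance`, and the wording of the appended line are hypothetical, and the placeholder set contains only the two names the README mentions.

```typescript
import * as path from "node:path";

// Hypothetical asset shape; `source.originalPath` is the field named in the README.
interface AssetInfo {
  assetDir: string;
  slug?: string;
  source?: { originalPath?: string };
}

// The README names `original.*` and `asset` as placeholders; the real list may be longer.
const PLACEHOLDER_STEMS = new Set(["original", "asset"]);

function isPlaceholder(fileName: string): boolean {
  const stem = path.basename(fileName, path.extname(fileName)).toLowerCase();
  return PLACEHOLDER_STEMS.has(stem);
}

// Imported assets prefer the source filename; generated assets fall back to
// the managed slug, then the asset directory name. Placeholders are skipped.
function pickFilenameHint(asset: AssetInfo): string | undefined {
  const original = asset.source?.originalPath;
  const candidates = [
    original ? path.basename(original) : undefined,
    asset.slug,
    path.basename(asset.assetDir),
  ];
  return candidates.find(
    (name): name is string => name !== undefined && name !== "" && !isPlaceholder(name),
  );
}

// Append the hint after whichever prompt is in use (bundled or --analysis-prompt).
function appendFilenameGuidance(prompt: string, hint: string | undefined): string {
  if (!hint) return prompt;
  return `${prompt.trimEnd()}\n\nFilename hint: "${hint}". Treat it as a soft scene clue; if it conflicts with the visible image, trust the image.`;
}
```

Because the hint is appended rather than templated into the prompt file, a custom `--analysis-prompt` needs no placeholder of its own.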
````diff
@@ -240,9 +315,30 @@ This copies the source image into a new managed asset directory before writing `
 ```bash
 imgbin annotate ./library/2026-03/orange-dashboard-hero \
 --analysis-prompt ./prompts/custom-analysis-prompt.txt \
+--analysis-context "This is a product dashboard screenshot used for launch documentation." \
+--overwrite
+```
+
+### Add custom analysis context for tricky screenshots
+
+Image recognition now requires analysis context. Pass a short project-aware hint so ImgBin can classify tricky screenshots more accurately while still prioritizing visible image evidence.
+
+```bash
+imgbin annotate ./library/2026-03/adventure-squad \
+--analysis-context "这是冒险团副本管理页面,重点识别副本配置、队伍编成、已分配英雄和右侧编辑器面板。" \
+--overwrite
+```
+
+You can also store that context in a file:
+
+```bash
+imgbin annotate ./library/2026-03/adventure-squad \
+--analysis-context-file ./prompts/adventure-squad-context.txt \
 --overwrite
 ```
 
+`annotate`, `generate --annotate`, and any batch job that performs recognition must provide either `--analysis-context` or `--analysis-context-file` (or the manifest equivalents `analysisContext` / `analysisContextFile`).
+
 ### Generate or refresh a thumbnail
 
 ```bash
````

The Chinese `--analysis-context` value in the `adventure-squad` example translates to: "This is the adventure squad dungeon management page; focus on identifying the dungeon configuration, party composition, assigned heroes, and the editor panel on the right."
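The new rule that every recognition run needs `--analysis-context` or `--analysis-context-file` amounts to a small resolution step. A minimal sketch, assuming a hypothetical `resolveAnalysisContext` that trims input, rejects empty files, and (as an assumption) rejects supplying both sources; the returned `origin` field stands in for the "custom context provenance" the metadata model mentions, but the field names are illustrative. The file reader is injectable so the logic can be tested without touching disk.

```typescript
import { readFileSync } from "node:fs";

// Hypothetical option bag filled from CLI flags or manifest fields.
interface ContextOptions {
  analysisContext?: string; // --analysis-context / analysisContext
  analysisContextFile?: string; // --analysis-context-file / analysisContextFile
}

// Records where the context came from, e.g. for metadata provenance.
interface ResolvedContext {
  text: string;
  origin: "inline" | "file";
  filePath?: string;
}

function resolveAnalysisContext(
  opts: ContextOptions,
  readText: (file: string) => string = (file) => readFileSync(file, "utf8"),
): ResolvedContext {
  const inline = opts.analysisContext?.trim();
  const file = opts.analysisContextFile?.trim();
  // Assumption: supplying both is treated as an error rather than merged.
  if (inline && file) {
    throw new Error("Pass either --analysis-context or --analysis-context-file, not both.");
  }
  if (inline) return { text: inline, origin: "inline" };
  if (file) {
    const text = readText(file).trim();
    if (!text) throw new Error(`Analysis context file is empty: ${file}`);
    return { text, origin: "file", filePath: file };
  }
  throw new Error(
    "Recognition requires --analysis-context or --analysis-context-file " +
      "(manifest: analysisContext / analysisContextFile).",
  );
}
```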
````diff
@@ -295,10 +391,14 @@ ImgBin stores the reusable search index at `.imgbin/search-index.json` under the
 imgbin batch --manifest ./jobs/launch.yaml --output ./library
 ```
 
+Every manifest job that performs recognition must include `analysisContext` or `analysisContextFile`.
+
 ### Batch-process assets whose analysis is still pending or failed
 
 ```bash
-imgbin batch --pending-library ./library
+imgbin batch \
+--pending-library ./library \
+--analysis-context-file ./prompts/pending-library-context.txt
 ```
 
 ## Batch manifest examples
````
````diff
@@ -309,18 +409,22 @@ jobs:
     slug: docs-ai-assisted-coding
     tags: [docs, hero]
     annotate: true
+    analysisContext: This is a product hero illustration used in documentation.
     thumbnail: true
 
   - assetPath: ./incoming/marketing-card.png
     importTo: ./library
+    analysisContext: This is a marketing image with product-card framing and interface accents.
     tags: [marketing, imported]
     thumbnail: true
 
   - assetPath: ./library/2026-03/existing-card
     overwriteRecognition: true
     analysisPromptPath: ./prompts/custom-analysis-prompt.txt
+    analysisContextFile: ./prompts/existing-card-context.txt
 
   - pendingLibrary: ./library
+    analysisContextFile: ./prompts/pending-library-context.txt
 ```
 
 ## Metadata model
````
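Batch manifests must carry `analysisContext` or `analysisContextFile` on every job that performs recognition. The README does not spell out which jobs count, so this sketch assumes that a job with `annotate: true`, an `assetPath`, or a `pendingLibrary` performs recognition; `ManifestJob` and `findJobsMissingContext` are hypothetical names. Checking the whole manifest up front reports every offending job at once rather than failing midway through a batch.

```typescript
// Only the fields this check needs; other manifest keys are ignored.
interface ManifestJob {
  annotate?: boolean;
  assetPath?: string;
  pendingLibrary?: string;
  analysisContext?: string;
  analysisContextFile?: string;
}

// Assumption: a job performs recognition when it annotates a generated
// image, targets an existing or imported asset, or sweeps a pending library.
function performsRecognition(job: ManifestJob): boolean {
  return job.annotate === true || job.assetPath !== undefined || job.pendingLibrary !== undefined;
}

function hasContext(job: ManifestJob): boolean {
  return Boolean(job.analysisContext?.trim() || job.analysisContextFile?.trim());
}

// Return the zero-based indexes of jobs that would fail the context requirement.
function findJobsMissingContext(jobs: ManifestJob[]): number[] {
  return jobs.flatMap((job, index) => (performsRecognition(job) && !hasContext(job) ? [index] : []));
}
```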
````diff
@@ -329,7 +433,7 @@ Each asset directory stores a `metadata.json` file with these high-level section
 
 - `source`: whether the asset was generated by ImgBin or imported from an external file
 - `generated`: prompt text, provider context, docs prompt provenance, and generation params
-- `recognized`: local Claude analysis suggestions plus prompt provenance
+- `recognized`: multimodal analysis suggestions, provider provenance, validator diagnostics, retry history, and optional custom context provenance
 - `manual`: human-maintained title, tags, or description that take precedence by default
 - `status`: per-step status for generation, recognition, and thumbnail creation
 - `paths`: relative file paths for original and thumbnail assets
````
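The `recognized` section now records validator diagnostics and retry history, and the release notes mention retries for recoverable validation issues. One way to structure that loop is sketched below; the `recoverable` flag, the default limit of 3 attempts, and all type and function names are assumptions for illustration, not ImgBin's schema.

```typescript
// Hypothetical shapes for the retry history kept under `recognized`.
interface Diagnostic {
  code: string;
  message: string;
  recoverable: boolean;
}

interface AttemptRecord {
  attempt: number;
  diagnostics: Diagnostic[];
}

interface RecognitionOutcome<T> {
  provider: string;
  value?: T;
  attempts: AttemptRecord[];
}

// Run the provider, validate, and retry only while every problem is recoverable.
async function analyzeWithRetry<T>(
  provider: string,
  runOnce: (attempt: number) => Promise<unknown>,
  validate: (raw: unknown) => { value?: T; diagnostics: Diagnostic[] },
  maxAttempts = 3,
): Promise<RecognitionOutcome<T>> {
  const attempts: AttemptRecord[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const { value, diagnostics } = validate(await runOnce(attempt));
    // Every attempt is recorded, so failures stay visible in the history.
    attempts.push({ attempt, diagnostics });
    if (value !== undefined && diagnostics.length === 0) {
      return { provider, value, attempts };
    }
    // A non-recoverable diagnostic is not worth re-asking the provider.
    if (diagnostics.some((d) => !d.recoverable)) break;
  }
  return { provider, attempts };
}
```

Returning the attempt list even on failure is what would let a `status` entry show why recognition failed, rather than only that it did.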
````diff
@@ -344,14 +448,14 @@ The actual generated image is stored as the asset file on disk, while metadata k
 
 ## Notes on analysis behavior
 
-ImgBin does not call a remote Claude URL for metadata analysis. Instead, it:
+ImgBin routes metadata analysis through the configured provider. For CLI-based providers, it:
 
 1. loads the bundled or overridden analysis prompt,
-2. passes the selected model to the local `claude` CLI,
-3. asks Claude to inspect the local image file directly from disk, and
-4. parses the returned JSON into metadata fields.
+2. appends scene-aware guidance and filename hints at runtime,
+3. asks the selected provider to inspect the local image directly (`claude` by path, `codex` by `--image`, or HTTP by base64 payload), and
+4. validates the returned JSON before merging it into metadata.
 
-That means the only required Claude-side runtime setting is a usable model name plus a working local `claude` command.
+That means non-interactive runs only need a deterministic provider selection plus the corresponding CLI/API configuration.
 
 ## Notes on image provider requests
 
````
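For `http`, the analysis notes say the image travels as a base64 payload, and the README describes `IMGBIN_VISION_API_KEY` as an optional bearer token. A sketch of building such a request for Node 20's global `fetch`, assuming a made-up JSON body (`model`, `prompt`, `image.mimeType`, `image.data`); a real vision endpoint defines its own schema, and `buildVisionRequest` is hypothetical.

```typescript
// Hypothetical request builder for IMGBIN_ANALYSIS_PROVIDER=http.
interface VisionRequestOptions {
  url: string; // IMGBIN_VISION_API_URL
  apiKey?: string; // IMGBIN_VISION_API_KEY, sent as a bearer token
  model?: string; // IMGBIN_VISION_API_MODEL
  timeoutMs: number; // IMGBIN_VISION_API_TIMEOUT_MS
}

function buildVisionRequest(
  image: Uint8Array,
  mimeType: string,
  prompt: string,
  opts: VisionRequestOptions,
): { url: string; init: RequestInit } {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (opts.apiKey) headers.Authorization = `Bearer ${opts.apiKey}`;
  // Assumed body shape; JSON.stringify drops `model` when it is undefined.
  const body = JSON.stringify({
    model: opts.model,
    prompt,
    image: { mimeType, data: Buffer.from(image).toString("base64") },
  });
  return {
    url: opts.url,
    // AbortSignal.timeout aborts the request once the timeout elapses.
    init: { method: "POST", headers, body, signal: AbortSignal.timeout(opts.timeoutMs) },
  };
}
```

The result plugs directly into `fetch(url, init)`; keeping construction separate from sending makes the payload easy to test without a live endpoint.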

prompts/default-analysis-prompt.txt

Lines changed: 6 additions & 2 deletions
````diff
@@ -1,4 +1,4 @@
-You are a Claude-compatible image metadata analyst.
+You are a multimodal image metadata analyst.
 Analyze the provided image and return strict JSON with this shape:
 {
 "title": "short human-friendly title",
````
````diff
@@ -11,5 +11,9 @@ Rules:
 - Use 2 to 8 lowercase kebab-case tags.
 - Keep the title under 80 characters.
 - Keep the description under 200 characters.
-- Filenames may be provided as auxiliary scene clues, but visible image evidence always takes precedence.
+- Ground every claim in visible image evidence first. Filenames, paths, slugs, prompts, and scene hints are only auxiliary clues.
+- If visible image evidence conflicts with filenames or prompt hints, trust the image.
+- For UI, admin dashboards, wireframes, product screenshots, game editors, and other interface-heavy images, prioritize layout, panels, controls, charts, forms, menus, and page purpose.
+- Do not hallucinate specific characters, skins, franchises, or IP names for UI-first screenshots unless that text or branding is directly visible in the image itself.
+- For mixed illustration plus interface scenes, preserve both semantics in the title, tags, and description.
 - Do not include markdown fences or extra commentary.
````
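The prompt's rules are concrete enough to check mechanically before the JSON is merged into metadata. A minimal validator sketch, assuming the response should be an object with `title`, `tags`, and `description` (the fields the README's `annotate --overwrite` example refreshes); `validateAnalysis` and the diagnostic codes are hypothetical, and "under 80/200 characters" is read as a strict limit on string length.

```typescript
// Hypothetical validator for the JSON shape the prompt asks for.
interface AnalysisMetadata {
  title: string;
  tags: string[];
  description: string;
}

interface Diagnostic {
  code: string;
  message: string;
}

const KEBAB_CASE = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

function validateAnalysis(raw: string): { value?: AnalysisMetadata; diagnostics: Diagnostic[] } {
  const text = raw.trim();
  // "Do not include markdown fences or extra commentary."
  if (text.startsWith("```")) {
    return { diagnostics: [{ code: "markdown-fence", message: "Response is wrapped in a markdown fence." }] };
  }
  let data: unknown;
  try {
    data = JSON.parse(text);
  } catch {
    return { diagnostics: [{ code: "invalid-json", message: "Response is not valid JSON." }] };
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return { diagnostics: [{ code: "not-object", message: "Response must be a JSON object." }] };
  }
  const { title, tags, description } = data as Record<string, unknown>;
  const diagnostics: Diagnostic[] = [];
  // "Keep the title under 80 characters."
  if (typeof title !== "string" || title.trim() === "" || title.length >= 80) {
    diagnostics.push({ code: "title", message: "title must be a non-empty string under 80 characters." });
  }
  // "Keep the description under 200 characters."
  if (typeof description !== "string" || description.trim() === "" || description.length >= 200) {
    diagnostics.push({ code: "description", message: "description must be a non-empty string under 200 characters." });
  }
  // "Use 2 to 8 lowercase kebab-case tags."
  const tagsOk =
    Array.isArray(tags) &&
    tags.length >= 2 &&
    tags.length <= 8 &&
    tags.every((t) => typeof t === "string" && KEBAB_CASE.test(t));
  if (!tagsOk) {
    diagnostics.push({ code: "tags", message: "tags must be 2 to 8 lowercase kebab-case strings." });
  }
  if (diagnostics.length > 0) return { diagnostics };
  return { value: { title, tags, description } as AnalysisMetadata, diagnostics };
}
```

Reporting every failed rule at once, instead of stopping at the first, gives a retry prompt or a human reviewer the full list of problems.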
