
feat: Multimodal Studio — unified image/video/character generation layer#12

Draft: frankxai wants to merge 5 commits into main from claude/vibrant-gates-8pp1t

@frankxai (Owner)

Multimodal Studio for ACOS

Adds ACOS's unified generation layer: image + video + consistent characters across 30+ frontier models through a single connector (Higgsfield MCP). This closes the multimodal gap vs. agent-first platforms (Google Antigravity 2.0 / Gemini Omni) while staying model-agnostic and brand-locked.

Why

ACOS already had agents, skills, commands, hooks, and six-platform adapters — but visual generation was image-only, hardcoded to one tool/path, with no video and no character consistency. This makes generation a first-class, portable part of the agent loop.

What's in it

| Layer | File(s) |
|---|---|
| Skill (the brain) | skills/multimodal-studio/SKILL.md + resources/model-matrix.md |
| Agent | .claude/agents/multimodal-director.md |
| Commands | .claude/commands/studio.md, generate-video.md |
| Connector | .mcp.json: higgsfield (hosted HTTP, OAuth) |
| Activation | .claude/skill-rules.json, skills/registry.json |
| Docs | CONNECTORS.md, CLAUDE.md, docs/multimodal-studio.md (incl. ACOS-vs-Antigravity positioning) |

Differentiators vs. single-vendor platforms

  • Model-agnostic — Soul/Flux/Seedream (image), Kling/Hailuo/Veo/Sora/DoP (video); swap any MCP at the ~~category seam.
  • Character consistency across a whole set: create_character once, then reference its ID everywhere.
  • Brand-locked & auditable — every asset inherits Frank DNA + brand tokens; model+prompt+job-id logged per keeper.
  • Runs inside the agent host you already use (Claude Code, Cursor, Windsurf, OpenCode, Gemini CLI).
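Reading "keeper" as an asset that was kept after review, the audit bullet implies one log entry per kept asset. A minimal sketch of such an entry as a JSON-lines ledger; the `KeeperRecord` fields and the `toLedgerLine` / `parseLedger` helpers are hypothetical, not names from the PR.

```typescript
// Hypothetical shape of one audit entry per kept asset.
interface KeeperRecord {
  model: string; // model that produced the asset, e.g. "flux" or "kling"
  prompt: string; // full prompt sent to the model
  jobId: string; // job id returned by the generation call
  characterId?: string; // set when a consistent character was referenced
  createdAt: string; // ISO-8601 timestamp
}

// Serialize one record as a single JSON line. JSON.stringify escapes any
// newlines inside the prompt, so each record occupies exactly one line.
function toLedgerLine(record: KeeperRecord): string {
  return JSON.stringify(record) + "\n";
}

// Parse a ledger back into records, skipping blank lines.
function parseLedger(text: string): KeeperRecord[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as KeeperRecord);
}
```

An append-only line format keeps writes cheap and would let a later per-project cost ledger reuse the same file layout.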

Setup

```bash
claude mcp add --transport http --scope user higgsfield https://mcp.higgsfield.ai/mcp
```

Validation

  • All JSON validated (.mcp.json, skill-rules.json, registry.json, plugin.json).
  • Skill/agent/command frontmatter matches existing ACOS conventions.
  • Skill auto-activation wired in skill-rules.json + registry; higgsfield referenced across all surfaces.
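Syntactic validation of the touched JSON files can be sketched as one pass over their contents. This assumes the file text is already loaded (how it is read is left out); `invalidJsonFiles` is a hypothetical helper.

```typescript
// Return "path: error" for every file whose text is not valid JSON.
function invalidJsonFiles(files: Record<string, string>): string[] {
  const bad: string[] = [];
  for (const [path, text] of Object.entries(files)) {
    try {
      JSON.parse(text);
    } catch (err) {
      bad.push(`${path}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  return bad;
}
```

A parse check alone does not catch dangling references, such as an MCP server that a skill lists but the registry never defines; that needs a separate cross-reference check.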

Notes / open questions

  • Default connector is Higgsfield's hosted server (OAuth, no keys). Self-hosted stdio variant documented in model-matrix.md.
  • Roadmap (in docs): wire assets into /factory and /publish, persistent character registry in ACOS memory, per-project cost ledger, storyboard mode.

Draft — open for review before merge.

🤖 Generated with Claude Code

https://claude.ai/code/session_01HWwgQVAGWETuQ9t5jGR3j8


Generated by Claude Code

Add ACOS Multimodal Studio — one connector (Higgsfield MCP) delivering
image, video, and consistent-character generation across 30+ models
(Soul, Flux, Seedream, Kling, Hailuo, Veo, Sora). Closes the multimodal
gap vs agent-first platforms while staying model-agnostic and brand-locked.

- skills/multimodal-studio: SKILL.md + resources/model-matrix.md
  (model routing, visual prompt structure, character consistency,
   async lifecycle, AI-slop checklist, brand-lock)
- .claude/agents/multimodal-director.md: creative-director persona
- .claude/commands/studio.md + generate-video.md: operator entry points
- .mcp.json: register higgsfield (hosted HTTP, OAuth)
- skill-rules.json + skills/registry.json: auto-activation + registration
- CONNECTORS.md: Higgsfield as unified multimodal default (vendor-agnostic)
- CLAUDE.md: Multimodal Studio section + new commands
- plugin.json: multimodal/image/video/higgsfield keywords
- docs/multimodal-studio.md: full guide + ACOS-vs-Antigravity positioning

https://claude.ai/code/session_01HWwgQVAGWETuQ9t5jGR3j8
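The "async lifecycle" in the skill presumably covers generation jobs (video especially) that finish well after the request that starts them. A minimal polling sketch with capped exponential backoff, assuming a status lookup that reports queued/running/completed/failed; the status names, `JobState`, and `waitForJob` are hypothetical and not the Higgsfield MCP's actual tool surface.

```typescript
// Hypothetical job states; the real connector's status values may differ.
type JobStatus = "queued" | "running" | "completed" | "failed";

interface JobState {
  status: JobStatus;
  resultUrl?: string;
  error?: string;
}

interface PollOptions {
  initialDelayMs: number;
  maxDelayMs: number;
  maxAttempts: number;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(() => resolve(), ms));

// Poll until the job reaches a terminal state. The wait between polls doubles
// each time up to maxDelayMs; after maxAttempts non-terminal polls, give up.
async function waitForJob(
  jobId: string,
  getStatus: (id: string) => Promise<JobState>,
  opts: PollOptions = { initialDelayMs: 1000, maxDelayMs: 15000, maxAttempts: 40 },
): Promise<JobState> {
  let delay = opts.initialDelayMs;
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    const state = await getStatus(jobId);
    if (state.status === "completed" || state.status === "failed") {
      return state;
    }
    if (attempt < opts.maxAttempts) {
      await sleep(delay);
      delay = Math.min(delay * 2, opts.maxDelayMs);
    }
  }
  throw new Error(`Job ${jobId} not finished after ${opts.maxAttempts} polls`);
}
```

A "failed" job is returned rather than thrown, so the caller can log the job id and decide whether to retry with another model.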
@coderabbitai
coderabbitai Bot commented May 29, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 17faddee-fe74-43b0-8608-e055f32392b7

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces 'Multimodal Studio (v11)', a unified generation layer for Agentic Creator OS (ACOS) that integrates image, video, and consistent character generation across 30+ frontier models via the Higgsfield MCP connector. It adds a new creative director agent, commands (/studio and /generate-video), a dedicated skill, and extensive documentation. The review feedback is highly constructive, pointing out a missing higgsfield definition in the master mcpServers registry, a conflict regarding image cropping capabilities, the need to explicitly document the parameter name for character IDs in tool calls, and potential issues with environment variable expansion in .mcp.json configurations.

Comment thread skills/registry.json
"downloads": 0,
"rating": 0,
"dependencies": ["frankx-brand"],
"mcp": ["higgsfield"],

medium

The MCP server higgsfield is declared as a dependency for the multimodal-studio skill here, but it is not defined in the master mcpServers registry at the bottom of this file (lines 390-421). To maintain registry completeness and ensure proper dependency resolution, please add a definition for higgsfield under mcpServers.
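This kind of gap is easy to check mechanically. A sketch of a cross-reference check, assuming registry.json has a `skills` array whose entries may carry an `mcp` list, plus a top-level `mcpServers` map; the exact file shape and the `findUndefinedMcpDeps` name are assumptions.

```typescript
// Assumed (hypothetical) shape of skills/registry.json.
interface SkillEntry {
  name: string;
  mcp?: string[];
}

interface Registry {
  skills: SkillEntry[];
  mcpServers: Record<string, unknown>;
}

// Return "skill -> server" for every MCP dependency with no mcpServers entry.
function findUndefinedMcpDeps(registry: Registry): string[] {
  const defined = new Set(Object.keys(registry.mcpServers));
  const missing: string[] = [];
  for (const skill of registry.skills) {
    for (const server of skill.mcp ?? []) {
      if (!defined.has(server)) {
        missing.push(`${skill.name} -> ${server}`);
      }
    }
  }
  return missing;
}
```

Run against the parsed registry in CI, a non-empty result would have flagged the missing higgsfield definition before review.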


### 5. ASSEMBLE — deliver a coherent set
- Verify the set reads as one campaign (same character ID, palette, lighting language).
- Produce required derivatives (e.g. crop hero → OG 1200×630, square 1080×1080).

medium

The instruction to 'Produce required derivatives (e.g. crop hero...)' conflicts with the roadmap in docs/multimodal-studio.md (line 113), which lists 'Auto-derivatives' as a pending feature. Since the agent currently lacks image-editing/cropping tools and is read-only on code, it cannot perform physical crops. Please clarify if the agent should instead generate separate assets for each aspect ratio, or defer this instruction until the auto-derivatives tool is implemented.
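If a cropping step is added later (the roadmap's auto-derivatives), the geometry for a centered crop is small. A sketch, assuming a plain center crop with no subject detection; `centerCrop` is hypothetical. The alternative the review raises, generating each aspect ratio as its own asset, avoids this step entirely.

```typescript
interface CropRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Largest centered rectangle with the target aspect ratio that fits inside the
// source. The caller then scales this region to targetW x targetH.
function centerCrop(srcW: number, srcH: number, targetW: number, targetH: number): CropRect {
  const targetRatio = targetW / targetH;
  const srcRatio = srcW / srcH;
  let width: number;
  let height: number;
  if (srcRatio > targetRatio) {
    // Source is wider than the target: keep full height, trim the sides.
    height = srcH;
    width = Math.round(srcH * targetRatio);
  } else {
    // Source is taller than (or equal to) the target: keep full width, trim top and bottom.
    width = srcW;
    height = Math.round(srcW / targetRatio);
  }
  return {
    x: Math.floor((srcW - width) / 2),
    y: Math.floor((srcH - height) / 2),
    width,
    height,
  };
}
```

For a 1920×1080 hero, the OG crop keeps a 1920×1008 band starting 36 px from the top, and the 1080×1080 square starts 420 px from the left.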


For any recurring subject (a brand mascot, a course instructor avatar, a series protagonist):
1. `create_character` once from a reference image or description → get a character ID.
2. Reference that ID in every subsequent `generate_image` / `generate_video` call.

medium

Please clarify how the character ID should be passed to generate_image and generate_video. If the Higgsfield MCP tools expect a specific parameter (e.g., character_id), explicitly document this parameter name so that the agent can correctly populate the tool call instead of just embedding the ID in the text prompt.
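One way to make the intent explicit is to keep the character ID in a structured argument rather than in the prompt text. A sketch, assuming the generation tools accept a field named `character_id`; that name is the reviewer's example and is not confirmed by the PR, and `GenerateArgs`, `withCharacter`, and `buildShotList` are hypothetical.

```typescript
// Hypothetical argument shape. "character_id" is an assumed parameter name,
// not a confirmed Higgsfield MCP field.
interface GenerateArgs {
  prompt: string;
  model: string;
  character_id?: string;
}

// Attach a character as a structured field, leaving the prompt text untouched.
function withCharacter(base: GenerateArgs, characterId?: string): GenerateArgs {
  return characterId ? { ...base, character_id: characterId } : { ...base };
}

// Build one call per shot, all referencing the same character ID.
function buildShotList(characterId: string, model: string, prompts: string[]): GenerateArgs[] {
  return prompts.map((prompt) => withCharacter({ prompt, model }, characterId));
}
```

Keeping the ID out of the prompt also keeps the logged prompt readable and makes it easy to audit which assets in a set share a character.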

"command": "python",
"args": ["-m", "higgsfield_mcp.server"],
"cwd": "/absolute/path/to/higgsfield_ai_mcp",
"env": { "HF_API_KEY": "${HF_API_KEY}", "HF_SECRET": "${HF_SECRET}" }

medium

Standard MCP configuration files (.mcp.json) do not consistently support environment variable expansion (like ${HF_API_KEY}) across all clients (e.g., Claude Code). It is safer to use standard placeholders like <YOUR_HF_API_KEY> or note that these can be omitted if inherited from the shell environment.

Suggested change:

```diff
- "env": { "HF_API_KEY": "${HF_API_KEY}", "HF_SECRET": "${HF_SECRET}" }
+ "env": { "HF_API_KEY": "<YOUR_HF_API_KEY>", "HF_SECRET": "<YOUR_HF_SECRET>" }
```

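Whether a client expands `${VAR}` placeholders in its MCP config is client-specific. Where it does not, or to check a config before use, the expansion can be done up front. A minimal sketch, assuming shell-like `${VAR}` and `${VAR:-default}` syntax (the default applies when the variable is unset or empty); `expandEnv` and `expandEnvBlock` are hypothetical helpers, not part of any MCP SDK.

```typescript
// Expand ${VAR} and ${VAR:-default} placeholders using the given environment map.
// Throws if a variable is unset or empty and no default is given.
function expandEnv(value: string, env: Record<string, string | undefined>): string {
  return value.replace(
    /\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}/g,
    (_match: string, name: string, fallback: string | undefined) => {
      const resolved = env[name];
      if (resolved !== undefined && resolved !== "") return resolved;
      if (fallback !== undefined) return fallback;
      throw new Error(`Environment variable ${name} is not set`);
    },
  );
}

// Expand every value of an "env" block from a server entry.
function expandEnvBlock(
  block: Record<string, string>,
  env: Record<string, string | undefined>,
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, raw] of Object.entries(block)) {
    out[key] = expandEnv(raw, env);
  }
  return out;
}
```

Failing loudly on an unset variable avoids silently passing the literal string `${HF_API_KEY}` to the server as a credential.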
frankxai added 4 commits May 29, 2026 02:03
tailwind-merge and lucide-react were emitted as unquoted object keys in
the generated package.json template, which is invalid TypeScript (the
hyphen parses as subtraction) and broke `npm run build:all` (TS1005).
Pre-existing build break, unrelated to the multimodal-studio feature.

https://claude.ai/code/session_01HWwgQVAGWETuQ9t5jGR3j8
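The failure in this commit is that a hyphenated package name is not a valid bare identifier in an object literal, so the parser rejects it (the commit reports TS1005). A sketch of a generator that sidesteps the problem by building the object with quoted keys and serializing with `JSON.stringify`; the version strings and `renderPackageJson` are placeholders, not the repo's actual template.

```typescript
// Hyphenated names must be quoted as object-literal keys:
// { tailwind-merge: "..." } is a syntax error, { "tailwind-merge": "..." } is valid.
// Placeholder versions, not the repo's pins.
const dependencies: Record<string, string> = {
  "lucide-react": "^0.400.0",
  "tailwind-merge": "^2.0.0",
};

// JSON.stringify always quotes keys, so the emitted package.json is valid
// regardless of which characters appear in a package name.
function renderPackageJson(name: string, deps: Record<string, string>): string {
  return JSON.stringify({ name, private: true, dependencies: deps }, null, 2) + "\n";
}
```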
CI runs `npm ci`, which requires a committed lockfile, but .gitignore
excluded package-lock.json — a self-contradiction that broke the build
job at the install step. Commit the lockfile (un-ignore it) for
reproducible, integrity-checked installs. Pre-existing CI break,
unrelated to the multimodal-studio feature.

https://claude.ai/code/session_01HWwgQVAGWETuQ9t5jGR3j8
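Since `npm ci` fails without a committed lockfile, a small guard can flag a `.gitignore` that excludes it. A rough sketch; `ignoresLockfile` is hypothetical and only recognizes a few literal patterns (plus a later `!` negation), not full gitignore matching.

```typescript
// Naive check over the text of a root .gitignore. Later lines win, as in gitignore,
// but globs other than the listed literal forms are not interpreted.
function ignoresLockfile(gitignore: string): boolean {
  const patterns = new Set(["package-lock.json", "/package-lock.json", "**/package-lock.json"]);
  let ignored = false;
  for (const raw of gitignore.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    if (line.startsWith("!") && patterns.has(line.slice(1))) {
      ignored = false; // a later negation re-includes the file
    } else if (patterns.has(line)) {
      ignored = true;
    }
  }
  return ignored;
}
```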
The CI build job never actually ran the compile step before (npm ci failed
on the missing lockfile), so several workspaces had latent type errors.
With npm ci fixed, surface and resolve them so `npm run build:all` is green:

- pin TypeScript to 5.6.3 via root overrides: TS 5.7+ raises TS2589
  ("excessively deep") on the MCP SDK + zod registerTool generics
  (browser-mcp). 5.6.3 is the newest 5.x that compiles the SDK pattern.
- evaluator-mcp: add the missing tsconfig.json (only workspace without one;
  bare tsc printed help and exited 1).
- evaluator-mcp: use the zod-parsed input (typed) instead of the raw
  untyped `arguments` object in the evaluate_content / evaluate_hook
  handlers; assert parsed input against evaluateContent's param type.
- evaluator-mcp: fix syllable-counter operator precedence so the per-word
  `|| 1` fallback applies to the word, not the running sum.
- creator-mcp: assert zod-validated params against the (stricter) target
  function signatures at four registerTool call sites.

Behavior-preserving — type alignment + one missing config, no logic changes.
Verified locally: npm ci → build:all → lint all exit 0 across 7 workspaces.

https://claude.ai/code/session_01HWwgQVAGWETuQ9t5jGR3j8
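The syllable-counter precedence bug is easy to reproduce in isolation. A minimal sketch, assuming the counter sums per-word counts in a `reduce` with a `|| 1` fallback; the vowel-group heuristic and function names are illustrative, not evaluator-mcp's actual code.

```typescript
// Naive per-word count: number of vowel groups. A word with none (e.g. "nth") gives 0.
function syllablesInWord(word: string): number {
  return (word.toLowerCase().match(/[aeiouy]+/g) ?? []).length;
}

// Buggy: `+` binds tighter than `||`, so this evaluates (sum + n) || 1.
// The fallback fires only when the running total is 0, not per word.
function countSyllablesBuggy(words: string[]): number {
  return words.reduce((sum, w) => sum + syllablesInWord(w) || 1, 0);
}

// Fixed: parenthesize so every word counts as at least one syllable.
function countSyllablesFixed(words: string[]): number {
  return words.reduce((sum, w) => sum + (syllablesInWord(w) || 1), 0);
}
```

With `["nth", "nth"]` the buggy version yields 1 and the fixed version 2; the two diverge only when a vowel-less word appears after the running total is already non-zero (or repeats), which is why the bug could stay latent.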
Regenerate committed build output so it matches the type-fixed source
(creator/evaluator) and the pinned compiler. No source or behavior change.

https://claude.ai/code/session_01HWwgQVAGWETuQ9t5jGR3j8