feat(minimax tts): expose all voice/audio parameters on the tts subcommand #360

Open
hhdhh wants to merge 3 commits into HKUDS:main from hhdhh:feat/minimax-tts-extended

hhdhh commented Jun 17, 2026

feat(minimax tts): expose all voice/audio parameters on the tts subcommand

Summary

The MiniMax TTS backend (cli_anything/minimax/utils/minimax_backend.py)
already supported a full set of voice and audio parameters, but the CLI
exposed only --text, --model, --voice, and --output. Every other
field was hardcoded, so users had to fork the harness to change speech
speed, volume, pitch, sample rate, bitrate, audio format, or channel
layout.

This change promotes all seven hardcoded parameters to first-class CLI
options on the tts subcommand, with click-level range and choice
validation, and adds a regression test module.

What changed

cli_anything/minimax/utils/minimax_backend.py

  • tts_synthesize(...) gains 7 new parameters: speed, vol, pitch,
    sample_rate, bitrate, audio_format, channel.
  • The hardcoded voice_setting and audio_setting blocks now read from
    the new parameters. Defaults match the previous hardcoded values, so
    the change is fully backward compatible at the API level.
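
The parameter plumbing described above can be pictured roughly as follows. This is a minimal sketch, not the PR's actual code: `build_tts_payload` is a hypothetical helper, and the payload key names (`voice_id`, `format`, and so on) are assumed from the parameter names listed in this description. Only the default values are taken from the PR.

```python
from typing import Any, Dict


def build_tts_payload(
    text: str,
    model: str,
    voice: str,
    speed: float = 1.0,
    vol: float = 1.0,
    pitch: int = 0,
    sample_rate: int = 32000,
    bitrate: int = 128000,
    audio_format: str = "mp3",
    channel: int = 1,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble the request body from the arguments."""
    return {
        "model": model,
        "text": text,
        # Assumed key names; these values were previously hardcoded.
        "voice_setting": {
            "voice_id": voice,
            "speed": speed,
            "vol": vol,
            "pitch": pitch,
        },
        "audio_setting": {
            "sample_rate": sample_rate,
            "bitrate": bitrate,
            "format": audio_format,
            "channel": channel,
        },
    }


# Omitting the new arguments reproduces the former hardcoded settings.
default_payload = build_tts_payload("hi", "speech-2.8-hd", "English_Graceful_Lady")

# Passing them overrides only the fields that were named.
custom_payload = build_tts_payload(
    "hi", "speech-2.8-hd", "English_Graceful_Lady", speed=1.2, audio_format="flac"
)
```

Keeping the defaults in the function signature means a caller that never mentions the new arguments sends the same settings as before.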

cli_anything/minimax/minimax_cli.py

  • The tts Click command gains 7 new options:
    • --speed (FloatRange 0.5..2.0, default 1.0)
    • --vol (FloatRange 0.0..10.0, default 1.0)
    • --pitch (IntRange -12..12, default 0)
    • --sample-rate (Choice: 8000/16000/22050/24000/32000/44100, default 32000)
    • --bitrate (Choice: 32000/64000/128000/256000, default 128000)
    • --format (Choice: mp3/pcm/flac, default mp3)
    • --channel (Choice: 1/2, default 1)
  • The tts_synthesize(...) call is updated to forward the new options.
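
For readers unfamiliar with Click's range and choice types, the option list above could be declared along these lines. This is a sketch under assumptions, not the PR's code: the command body is a stand-in that only echoes its settings, the `audio_format` parameter name is chosen for illustration, and the choice values are declared as strings and converted with `int()`, which may differ from how the real command handles them.

```python
import click


@click.command()
@click.option("-t", "--text", required=True, help="Text to synthesize")
@click.option("--speed", type=click.FloatRange(0.5, 2.0), default=1.0, show_default=True)
@click.option("--vol", type=click.FloatRange(0.0, 10.0), default=1.0, show_default=True)
@click.option("--pitch", type=click.IntRange(-12, 12), default=0, show_default=True)
@click.option(
    "--sample-rate",
    type=click.Choice(["8000", "16000", "22050", "24000", "32000", "44100"]),
    default="32000",
    show_default=True,
)
@click.option(
    "--bitrate",
    type=click.Choice(["32000", "64000", "128000", "256000"]),
    default="128000",
    show_default=True,
)
@click.option(
    "--format",
    "audio_format",  # avoids shadowing the built-in format()
    type=click.Choice(["mp3", "pcm", "flac"]),
    default="mp3",
    show_default=True,
)
@click.option("--channel", type=click.Choice(["1", "2"]), default="1", show_default=True)
def tts(text, speed, vol, pitch, sample_rate, bitrate, audio_format, channel):
    """Illustrative stand-in for the real tts command; it only echoes settings."""
    # Choice values arrive as strings here, so convert them before use.
    click.echo(
        f"speed={speed} vol={vol} pitch={pitch} "
        f"sample_rate={int(sample_rate)} bitrate={int(bitrate)} "
        f"format={audio_format} channel={int(channel)}"
    )
```

Because the range and choice checks live in the option types, Click rejects an invalid value while parsing arguments, before the command body runs. `click.testing.CliRunner` can invoke the command in-process to confirm this.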

cli_anything/minimax/tests/test_tts_extended.py (new)

  • 4 mock-based tests:
    • test_tts_default_voice_audio_settings — guards backward-compatible defaults
    • test_tts_custom_voice_setting — speed / vol / pitch propagation
    • test_tts_custom_audio_setting — sample_rate / bitrate / format / channel
    • test_tts_voice_id_propagates — regression guard for voice + speed combo
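
A mock-based test of this kind might look like the sketch below. Everything in it is illustrative: `synthesize` is a hypothetical stand-in for `tts_synthesize` that takes its HTTP function as an argument, and the URL and payload key names are assumptions. The real tests in the PR may patch the HTTP layer differently.

```python
from unittest.mock import Mock


def synthesize(post, text, voice, speed=1.0, vol=1.0, pitch=0):
    """Hypothetical stand-in for tts_synthesize with an injectable HTTP call."""
    payload = {
        "text": text,
        "voice_setting": {"voice_id": voice, "speed": speed, "vol": vol, "pitch": pitch},
    }
    # Placeholder URL; the real endpoint is not shown in this PR description.
    return post("https://example.invalid/tts", json=payload)


def test_custom_voice_setting_propagates():
    post = Mock(return_value={"ok": True})
    synthesize(post, "hello", "English_radiant_girl", speed=1.2, vol=2.0, pitch=3)

    # The mock records the call, so the test can inspect the outgoing payload.
    sent = post.call_args.kwargs["json"]
    assert sent["voice_setting"] == {
        "voice_id": "English_radiant_girl",
        "speed": 1.2,
        "vol": 2.0,
        "pitch": 3,
    }


def test_defaults_are_unchanged():
    post = Mock(return_value={"ok": True})
    synthesize(post, "hello", "English_Graceful_Lady")

    sent = post.call_args.kwargs["json"]
    assert sent["voice_setting"]["speed"] == 1.0
    assert sent["voice_setting"]["vol"] == 1.0
    assert sent["voice_setting"]["pitch"] == 0
```

Asserting on the recorded payload rather than on a network response keeps such tests offline and fast.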

Verification

# Apply patches (from repo root)
patch -p0 < pr-minimax-tts/01-backend.patch
patch -p0 < pr-minimax-tts/02-cli.patch
cp pr-minimax-tts/03-tests-test_tts_extended.py \
   cli_anything/minimax/tests/test_tts_extended.py

# Run the new tests
cd minimax/agent-harness
PYTHONPATH=. python3 -m pytest cli_anything/minimax/tests/test_tts_extended.py -v
# 4 passed

# Inspect the new surface
PYTHONPATH=. python3 -m cli_anything.minimax.minimax_cli tts --help

Backward compatibility

  • The tts_synthesize signature gains optional parameters whose defaults
    are identical to the previous hardcoded values, so any existing caller of
    tts_synthesize(api_key, text, model, voice, output_path) keeps
    working unchanged.
  • CLI behavior is unchanged when none of the new options are passed.
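
One way to check a compatibility claim like this is to confirm that the original parameters keep their order and that every added parameter has a default. The sketch below uses a hypothetical `tts_synthesize_new` whose signature is assumed from this description. Its body is a placeholder, and the real function is not reproduced here.

```python
import inspect


def tts_synthesize_new(api_key, text, model, voice, output_path,
                       speed=1.0, vol=1.0, pitch=0, sample_rate=32000,
                       bitrate=128000, audio_format="mp3", channel=1):
    """Placeholder with the assumed new signature; returns its settings."""
    return {"speed": speed, "vol": vol, "pitch": pitch,
            "sample_rate": sample_rate, "bitrate": bitrate,
            "format": audio_format, "channel": channel}


ORIGINAL_PARAMS = ["api_key", "text", "model", "voice", "output_path"]

params = inspect.signature(tts_synthesize_new).parameters
leading = list(params)[:len(ORIGINAL_PARAMS)]
added = [name for name in params if name not in ORIGINAL_PARAMS]

# Every added parameter must be optional for old call sites to keep working.
missing_defaults = [
    name for name in added if params[name].default is inspect.Parameter.empty
]

# An old-style positional call still works and yields the former settings.
legacy_result = tts_synthesize_new(
    "key", "hi", "speech-2.8-hd", "English_Graceful_Lady", "out.mp3"
)
```

An empty `missing_defaults` list together with an unchanged leading parameter order is what lets the five-argument call keep working.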

Related

  • The skill surfaces this command as cli-anything-minimax tts, so
    skills/cli-anything-minimax/SKILL.md should mention the new flags
    in a follow-up doc pass.

yuh-yang and others added 2 commits June 14, 2026 17:27
* feat: CLI-Matrix with multi-approach stages, skill discovery, and matrix search

Introduce CLI-Matrix — curated multi-CLI workflow matrices that agents can
install in one command. The video-creation matrix bundles 11 CLIs across 8
production stages (AI video gen, capture, audio, voice/TTS, music, NLE
editing, captions, thumbnails).

Each stage now exposes a goal, alternative approaches (Python libs, cloud
APIs, native commands), and skill_search_hints that encourage agents to
dynamically discover relevant skills via `npx skills search` rather than
relying on hard-coded tool lists.

Key changes:
- matrix_registry.json: extended stage schema with goal, alternatives,
  skill_search_hints fields
- cli-hub matrix list/search/info/install commands
- matrix_skill.py: renders dynamic SKILL.md with stage tooling overview,
  install status, and aggregated discovery commands
- Fixed brittle parents[2] repo root detection with git-based lookup
- 85 tests passing (10 new)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* update cli-matrix

* feat(cli-matrix): eco-first capability-based matrix, v2 schema + S2-S5 SKILLs

- Add docs/cli-matrix/matrix_registry.schema.md describing v2 capability-based
  registry shape (capabilities[], providers with kind/requires/cost/quality/
  offline, recipes[], known_gaps[], decision rubric, suggest-to-user template).
- Rewrite cli-hub-matrix/video-creation/SKILL.md and matrix_registry.json (S1)
  around capabilities + providers + recipes instead of linear stages.
- Rename Vn -> Sn across cli-matrix-plan.md and test fixtures.
- Reorder scenarios by current completeness; rewrite S2 knowledge-research,
  S3 3d-cad, S4 game-development, S5 image-design in v2 capability form with
  full SKILL.md files.
- Add docs/cli-matrix/test-plans/video-creation.md with 13 long realistic
  end-to-end tasks as checkable todo lists, each exercising 5-9 capabilities.
- Move cli-matrix-plan.md and matrix_registry.schema.md under docs/cli-matrix/.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Document preview protocol and Audacity autosave

Add the preview bundle protocol plan, record video matrix review evidence, and make one-shot Audacity project mutations persist to disk with E2E coverage.

* Update CLI matrix skill registry and video workflow

* chore(git): always ignore docs/* — working documents stay local

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(cli-matrix): video-creation skill WIP — sound design, source triage, render doctor, NLE refs, video_doctor script

- SKILL.md: adds sound.design capability, bundled video_doctor.py provider,
  recipe updates, and links to five new reference modules
- new references: art-direction-review, nle-shotcut-kdenlive, render-doctor,
  sound-design, source-triage; captions and story-structure-audio updated
- scripts/video_doctor.py: bundled probe/diagnose helper

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(cli-hub): preflight detects packages by import name or PyPI dist name (P1-3)

_package_available() now tries find_spec as-is, dash->underscore
normalized, then importlib.metadata dist lookup (PEP 503), so registry
entries like edge-tts are detected when installed. All lookup failures
degrade to unavailable instead of crashing preflight. Adds 6 tests.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(cli-hub): distribute matrix skill content to installed skills, wheels, and Pages (P1-4)

- matrix install renders to ~/.cli-hub/matrix/<name>/SKILL.md and copies
  references/ and scripts/ beside it (pycache excluded, idempotent
  reinstall purges stale files); legacy flat <name>.SKILL.md still read
- content lookup chain: repo checkout -> bundled cli_hub/_matrix_data
  (vendored into sdist/wheel by setup.py build hooks + MANIFEST.in) ->
  published Pages URL -> stub
- new 'matrix install --skill-only' renders skill + assets without
  installing CLIs
- deploy-pages.yml: copy cli-hub-matrix/ into the site after the Jekyll
  build (served verbatim at /matrix/<name>/); triggers remain main-only
- 12 new tests in tests/test_matrix_skill_dist.py (142 total pass)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(cli-matrix): register S2-S5 matrices and resync video-creation registry (P1-1, P1-2)

- add knowledge-research (S2, 12 caps), 3d-cad (S3, 12), game-development
  (S4, 10), image-design (S5, 9) derived from their SKILL.md drafts; full
  v2 shape with capabilities, providers, recipes, known_gaps; clis lists
  cross-checked against registry.json (unresolvable tools represented as
  public-cli/native/python/api/agent-skill providers instead)
- video-creation: add sound.design capability (5 providers, wired into 5
  recipes), register scripts/video_doctor.py as bundled-script provider
  under quality.review, cite the 5 new reference modules in provider
  notes, refresh description
- python provider package strings use import names (cv2, edge_tts,
  ffmpeg, skimage, ...) so preflight detection is robust to dist-name
  variants like opencv-python-headless
- fix homepage URLs to docs/cli-matrix/cli-matrix-plan.md; bump
  meta.updated to 2026-06-11

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(cli-hub): ship the unified Gallery design as the production homepage

Replace docs/hub/index.html with the finalized "Gallery / R2 Flip" main page
(Steel Sky palette default, Newsreader serif hero title, liquid-glass flip
cards, JS masonry catalog, and the unified Matrices layer with bidirectional
stitching). Production-indexable robots meta retained.

Stop tracking docs/cli-matrix/* — the CLI-Matrix working docs stay local and
confidential; add an explicit /docs/cli-matrix/ ignore rule.

* feat: CLI-Matrix command family + Hub docs/demos pages and responsive nav

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…command

@github-actions github-actions Bot added the existing-cli-fix Fixes or improves an existing CLI harness label Jun 17, 2026
Companion to HKUDS#360. Surfaces the new --speed/--vol/--pitch/--sample-rate/
--bitrate/--format/--channel flags in both:
- minimax/agent-harness/cli_anything/minimax/README.md (TTS options table)
- skills/cli-anything-minimax/SKILL.md (TTS options block)

No code change; just user-facing doc sync.

hhdhh commented Jun 17, 2026

Pushed a follow-up commit (9f053c7) on the same branch that documents the new tts flags in minimax/agent-harness/cli_anything/minimax/README.md (TTS options table) and skills/cli-anything-minimax/SKILL.md (TTS options block). No code change, just doc sync. Tests still 4/4 in test_tts_extended.py.

@github-actions github-actions Bot added the cli-anything-skill Changes CLI-Anything plugin or skill files label Jun 17, 2026

hhdhh commented Jun 17, 2026

Demo: new tts options in action

Below is the help output and a sample invocation exercising every new flag
from this PR. All 4 new tests in test_tts_extended.py pass.

cli-anything-minimax tts --help

Usage: python -m cli_anything.minimax.minimax_cli tts [OPTIONS]

  Synthesize text to speech using MiniMax TTS.

Options:
  -t, --text TEXT                 Text to synthesize  [required]
  --model TEXT                    TTS model ID  [default: speech-2.8-hd]
  --voice TEXT                    Voice ID. Available: English_Graceful_Lady,
                                  English_Insightful_Speaker,
                                  English_radiant_girl,
                                  English_Persuasive_Man, English_Lucky_Robot,
                                  English_expressive_narrator  [default:
                                  English_Graceful_Lady]
  -o, --output PATH               Output audio file path  [default:
                                  output.mp3]
  --speed FLOAT RANGE             Speech speed (0.5-2.0).  [default: 1.0;
                                  0.5<=x<=2.0]
  --vol FLOAT RANGE               Volume (0.0-10.0).  [default: 1.0;
                                  0.0<=x<=10.0]
  --pitch INTEGER RANGE           Pitch shift in semitones (-12..12).
                                  [default: 0; -12<=x<=12]
  --sample-rate [8000|16000|22050|24000|32000|44100]
                                  Audio sample rate.  [default: 32000]
  --bitrate [32000|64000|128000|256000]
                                  Audio bitrate.  [default: 128000]
  --format [mp3|pcm|flac]         Output audio format.  [default: mp3]
  --channel [1|2]                 1=mono, 2=stereo.  [default: 1]
  --help                          Show this message and exit.

Invocation with all 7 new flags

$ python -m cli_anything.minimax.minimax_cli tts \
    --text "hello from cli-anything-minimax with extended tts params" \
    --voice English_radiant_girl \
    --speed 1.2 --vol 2.0 --pitch 3 \
    --sample-rate 44100 --bitrate 256000 \
    --format flac --channel 2 \
    --output /tmp/demo.flac

✓ Audio saved to /tmp/demo.flac (17 bytes)
output_file: /tmp/demo.flac
size_bytes: 17
model: speech-2.8-hd
voice: English_radiant_girl

Validation behavior (Click-level)

Out-of-range values such as --speed 5 and unknown choices such as --format ogg are rejected before any HTTP call:

$ ... tts --text x --speed 5
Usage: python -m cli_anything.minimax.minimax_cli tts [OPTIONS]
Try 'python -m cli_anything.minimax.minimax_cli tts --help' for help.
Error: Invalid value for '--speed': 5 is not in the range 0.5<=x<=2.0.
$ ... tts --text x --format ogg
Error: Invalid value for '--format': 'ogg' is not one of 'mp3', 'pcm', 'flac'.

Doc sync

The doc commit 9f053c7 mirrors these flags into:

  • minimax/agent-harness/cli_anything/minimax/README.md (new tts parameter table)
  • skills/cli-anything-minimax/SKILL.md (extended options block)

so the skill surface stays in lockstep with the CLI.
