feat(minimax tts): expose all voice/audio parameters on the tts subcommand by hhdhh · Pull Request #360 · HKUDS/CLI-Anything

hhdhh · 2026-06-17T16:00:44Z

feat(minimax tts): expose all voice/audio parameters on the tts subcommand

Summary

The MiniMax TTS backend (cli_anything/minimax/utils/minimax_backend.py)
already supported a full set of voice and audio parameters, but the CLI
exposed only --text, --model, --voice, and --output. Every other
field was hardcoded, so users had to fork the harness to change speech
speed, volume, pitch, sample rate, bitrate, audio format, or channel
layout.

This change promotes all seven hardcoded parameters to first-class CLI
options on the tts subcommand, with click-level range and choice
validation, and adds a regression test module.

What changed

`cli_anything/minimax/utils/minimax_backend.py`

tts_synthesize(...) gains 7 new parameters: speed, vol, pitch,
sample_rate, bitrate, audio_format, channel.
The hardcoded voice_setting and audio_setting blocks now read from
the new parameters. Defaults match the previous hardcoded values, so
the change is fully backward compatible at the API level.
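As a rough illustration of the backend change described above, the sketch below assembles the two settings blocks from defaulted parameters. The helper name `build_tts_payload` is hypothetical, and the key names inside `voice_setting` and `audio_setting` are assumptions; the PR text names only the two blocks and the seven parameters.

```python
from typing import Any, Dict


def build_tts_payload(
    text: str,
    model: str,
    voice: str,
    speed: float = 1.0,
    vol: float = 1.0,
    pitch: int = 0,
    sample_rate: int = 32000,
    bitrate: int = 128000,
    audio_format: str = "mp3",
    channel: int = 1,
) -> Dict[str, Any]:
    """Hypothetical helper: build the request body from the parameters."""
    return {
        "model": model,
        "text": text,
        # Key names inside the two blocks are assumed, not taken from the PR.
        "voice_setting": {
            "voice_id": voice,
            "speed": speed,
            "vol": vol,
            "pitch": pitch,
        },
        "audio_setting": {
            "sample_rate": sample_rate,
            "bitrate": bitrate,
            "format": audio_format,
            "channel": channel,
        },
    }


# A legacy-style call that passes none of the new parameters gets the defaults.
legacy = build_tts_payload("hello", "speech-2.8-hd", "English_Graceful_Lady")

# A new-style call overrides only what it needs.
custom = build_tts_payload(
    "hello", "speech-2.8-hd", "English_Graceful_Lady", speed=1.2, audio_format="flac"
)
```

Because every new parameter has a default, the legacy call and the new call share one code path and differ only in the values that end up in the two blocks.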

`cli_anything/minimax/minimax_cli.py`

The tts Click command gains 7 new options:
- --speed (FloatRange 0.5..2.0, default 1.0)
- --vol (FloatRange 0.0..10.0, default 1.0)
- --pitch (IntRange -12..12, default 0)
- --sample-rate (Choice: 8000/16000/22050/24000/32000/44100, default 32000)
- --bitrate (Choice: 32000/64000/128000/256000, default 128000)
- --format (Choice: mp3/pcm/flac, default mp3)
- --channel (Choice: 1/2, default 1)
The tts_synthesize(...) call is updated to forward the new options.
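The option list above could be declared with Click roughly as follows. This is a minimal sketch, not the PR's actual code: the command body only echoes the parsed values, the `audio_format` destination name is an assumption, and the choices are given as strings and converted to `int` by hand, which may differ from how the real command handles them.

```python
import click
from click.testing import CliRunner


@click.command()
@click.option("-t", "--text", required=True, help="Text to synthesize")
@click.option("--speed", type=click.FloatRange(0.5, 2.0), default=1.0, show_default=True)
@click.option("--vol", type=click.FloatRange(0.0, 10.0), default=1.0, show_default=True)
@click.option("--pitch", type=click.IntRange(-12, 12), default=0, show_default=True)
@click.option(
    "--sample-rate",
    type=click.Choice(["8000", "16000", "22050", "24000", "32000", "44100"]),
    default="32000",
    show_default=True,
)
@click.option(
    "--bitrate",
    type=click.Choice(["32000", "64000", "128000", "256000"]),
    default="128000",
    show_default=True,
)
@click.option(
    "--format",
    "audio_format",  # assumed destination name, avoids shadowing the builtin
    type=click.Choice(["mp3", "pcm", "flac"]),
    default="mp3",
    show_default=True,
)
@click.option("--channel", type=click.Choice(["1", "2"]), default="1", show_default=True)
def tts(text, speed, vol, pitch, sample_rate, bitrate, audio_format, channel):
    """Sketch of the tts command: echo what would be forwarded to the backend."""
    # Choice values arrive as strings here, so convert before forwarding.
    click.echo(
        f"speed={speed} vol={vol} pitch={pitch} "
        f"sample_rate={int(sample_rate)} bitrate={int(bitrate)} "
        f"format={audio_format} channel={int(channel)}"
    )


runner = CliRunner()
default_run = runner.invoke(tts, ["--text", "hi"])
custom_run = runner.invoke(
    tts, ["--text", "hi", "--speed", "1.2", "--format", "flac", "--channel", "2"]
)
```

Declaring the constraints on the options keeps validation out of the command body, so the function only sees values that already satisfy the ranges and choices.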

`cli_anything/minimax/tests/test_tts_extended.py` (new)

4 mock-based tests:
- test_tts_default_voice_audio_settings — guards backward-compatible defaults
- test_tts_custom_voice_setting — speed / vol / pitch propagation
- test_tts_custom_audio_setting — sample_rate / bitrate / format / channel
- test_tts_voice_id_propagates — regression guard for voice + speed combo
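For readers unfamiliar with the mock-based pattern these tests rely on, here is a minimal, self-contained sketch. The function `tts_synthesize_sketch` and its injected `post` callable are hypothetical stand-ins; the real tests presumably patch whatever HTTP call the backend makes, and the payload key names are assumed.

```python
from unittest.mock import Mock


def tts_synthesize_sketch(post, text, voice, speed=1.0, vol=1.0, pitch=0):
    """Hypothetical stand-in for the backend: sends one request via `post`."""
    payload = {
        "text": text,
        # Key names are assumed for illustration.
        "voice_setting": {"voice_id": voice, "speed": speed, "vol": vol, "pitch": pitch},
    }
    return post(payload)


def test_default_voice_setting():
    # Guards the backward-compatible defaults.
    post = Mock(return_value={"ok": True})
    tts_synthesize_sketch(post, "hi", "English_Graceful_Lady")
    payload = post.call_args.args[0]
    assert payload["voice_setting"]["speed"] == 1.0
    assert payload["voice_setting"]["vol"] == 1.0
    assert payload["voice_setting"]["pitch"] == 0


def test_custom_voice_setting_propagates():
    # Checks that overrides reach the outgoing request unchanged.
    post = Mock(return_value={"ok": True})
    tts_synthesize_sketch(post, "hi", "English_radiant_girl", speed=1.2, vol=2.0, pitch=3)
    post.assert_called_once()
    payload = post.call_args.args[0]
    assert payload["voice_setting"] == {
        "voice_id": "English_radiant_girl",
        "speed": 1.2,
        "vol": 2.0,
        "pitch": 3,
    }


test_default_voice_setting()
test_custom_voice_setting_propagates()
```

The mock records the outgoing payload, so the assertions check propagation without any network access or API key.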

Verification

# Apply patches (from repo root)
patch -p0 < pr-minimax-tts/01-backend.patch
patch -p0 < pr-minimax-tts/02-cli.patch
cp pr-minimax-tts/03-tests-test_tts_extended.py \
   cli_anything/minimax/tests/test_tts_extended.py

# Run the new tests
cd minimax/agent-harness
PYTHONPATH=. python3 -m pytest cli_anything/minimax/tests/test_tts_extended.py -v
# 4 passed

# Inspect the new surface
PYTHONPATH=. python3 -m cli_anything.minimax.minimax_cli tts --help

Backward compatibility

The API call signature gains optional parameters whose default values are
identical to the previous hardcoded values, so any existing caller of
tts_synthesize(api_key, text, model, voice, output_path) keeps
working unchanged.
CLI behavior is unchanged when none of the new options are passed.

* feat: CLI-Matrix with multi-approach stages, skill discovery, and matrix search

  Introduce CLI-Matrix: curated multi-CLI workflow matrices that agents can install in one command. The video-creation matrix bundles 11 CLIs across 8 production stages (AI video gen, capture, audio, voice/TTS, music, NLE editing, captions, thumbnails). Each stage now exposes a goal, alternative approaches (Python libs, cloud APIs, native commands), and skill_search_hints that encourage agents to dynamically discover relevant skills via `npx skills search` rather than relying on hard-coded tool lists.

  Key changes:
  - matrix_registry.json: extended stage schema with goal, alternatives, skill_search_hints fields
  - cli-hub matrix list/search/info/install commands
  - matrix_skill.py: renders dynamic SKILL.md with stage tooling overview, install status, and aggregated discovery commands
  - Fixed brittle parents[2] repo root detection with git-based lookup
  - 85 tests passing (10 new)

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* update cli-matrix

* feat(cli-matrix): eco-first capability-based matrix, v2 schema + S2-S5 SKILLs

  - Add docs/cli-matrix/matrix_registry.schema.md describing v2 capability-based registry shape (capabilities[], providers with kind/requires/cost/quality/offline, recipes[], known_gaps[], decision rubric, suggest-to-user template).
  - Rewrite cli-hub-matrix/video-creation/SKILL.md and matrix_registry.json (S1) around capabilities + providers + recipes instead of linear stages.
  - Rename Vn -> Sn across cli-matrix-plan.md and test fixtures.
  - Reorder scenarios by current completeness; rewrite S2 knowledge-research, S3 3d-cad, S4 game-development, S5 image-design in v2 capability form with full SKILL.md files.
  - Add docs/cli-matrix/test-plans/video-creation.md with 13 long realistic end-to-end tasks as checkable todo lists, each exercising 5-9 capabilities.
  - Move cli-matrix-plan.md and matrix_registry.schema.md under docs/cli-matrix/.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Document preview protocol and Audacity autosave

  Add the preview bundle protocol plan, record video matrix review evidence, and make one-shot Audacity project mutations persist to disk with E2E coverage.

* Update CLI matrix skill registry and video workflow

* chore(git): always ignore docs/*, working documents stay local

  Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(cli-matrix): video-creation skill WIP: sound design, source triage, render doctor, NLE refs, video_doctor script

  - SKILL.md: adds sound.design capability, bundled video_doctor.py provider, recipe updates, and links to five new reference modules
  - new references: art-direction-review, nle-shotcut-kdenlive, render-doctor, sound-design, source-triage; captions and story-structure-audio updated
  - scripts/video_doctor.py: bundled probe/diagnose helper

  Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(cli-hub): preflight detects packages by import name or PyPI dist name (P1-3)

  _package_available() now tries find_spec as-is, dash->underscore normalized, then importlib.metadata dist lookup (PEP 503), so registry entries like edge-tts are detected when installed. All lookup failures degrade to unavailable instead of crashing preflight. Adds 6 tests.

  Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(cli-hub): distribute matrix skill content to installed skills, wheels, and Pages (P1-4)

  - matrix install renders to ~/.cli-hub/matrix/<name>/SKILL.md and copies references/ and scripts/ beside it (pycache excluded, idempotent reinstall purges stale files); legacy flat <name>.SKILL.md still read
  - content lookup chain: repo checkout -> bundled cli_hub/_matrix_data (vendored into sdist/wheel by setup.py build hooks + MANIFEST.in) -> published Pages URL -> stub
  - new 'matrix install --skill-only' renders skill + assets without installing CLIs
  - deploy-pages.yml: copy cli-hub-matrix/ into the site after the Jekyll build (served verbatim at /matrix/<name>/); triggers remain main-only
  - 12 new tests in tests/test_matrix_skill_dist.py (142 total pass)

  Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(cli-matrix): register S2-S5 matrices and resync video-creation registry (P1-1, P1-2)

  - add knowledge-research (S2, 12 caps), 3d-cad (S3, 12), game-development (S4, 10), image-design (S5, 9) derived from their SKILL.md drafts; full v2 shape with capabilities, providers, recipes, known_gaps; clis lists cross-checked against registry.json (unresolvable tools represented as public-cli/native/python/api/agent-skill providers instead)
  - video-creation: add sound.design capability (5 providers, wired into 5 recipes), register scripts/video_doctor.py as bundled-script provider under quality.review, cite the 5 new reference modules in provider notes, refresh description
  - python provider package strings use import names (cv2, edge_tts, ffmpeg, skimage, ...) so preflight detection is robust to dist-name variants like opencv-python-headless
  - fix homepage URLs to docs/cli-matrix/cli-matrix-plan.md; bump meta.updated to 2026-06-11

  Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(cli-hub): ship the unified Gallery design as the production homepage

  Replace docs/hub/index.html with the finalized "Gallery / R2 Flip" main page (Steel Sky palette default, Newsreader serif hero title, liquid-glass flip cards, JS masonry catalog, and the unified Matrices layer with bidirectional stitching). Production-indexable robots meta retained. Stop tracking docs/cli-matrix/*: the CLI-Matrix working docs stay local and confidential; add an explicit /docs/cli-matrix/ ignore rule.

* feat: CLI-Matrix command family + Hub docs/demos pages and responsive nav

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

Related

The skill surfaces this command under `cli-anything-minimax tts`; `skills/cli-anything-minimax/SKILL.md` should mention the new flags in a follow-up doc pass.

chatgpt-codex-connector · 2026-06-17T16:00:49Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

Companion to HKUDS#360. Surfaces the new --speed/--vol/--pitch/--sample-rate/--bitrate/--format/--channel flags in both:
- minimax/agent-harness/cli_anything/minimax/README.md (TTS options table)
- skills/cli-anything-minimax/SKILL.md (TTS options block)

No code change; just user-facing doc sync.

hhdhh · 2026-06-17T16:20:25Z

Pushed a follow-up commit (9f053c7) on the same branch that documents the new tts flags in minimax/agent-harness/cli_anything/minimax/README.md (TTS options table) and skills/cli-anything-minimax/SKILL.md (TTS options block). No code change, just doc sync. Tests still 4/4 in test_tts_extended.py.

hhdhh · 2026-06-17T16:38:47Z

Demo: new `tts` options in action

Below is the help output and a sample invocation exercising every new flag
from this PR. All 4 new tests in test_tts_extended.py pass.

`cli-anything-minimax tts --help`

Usage: python -m cli_anything.minimax.minimax_cli tts [OPTIONS]

  Synthesize text to speech using MiniMax TTS.

Options:
  -t, --text TEXT                 Text to synthesize  [required]
  --model TEXT                    TTS model ID  [default: speech-2.8-hd]
  --voice TEXT                    Voice ID. Available: English_Graceful_Lady,
                                  English_Insightful_Speaker,
                                  English_radiant_girl,
                                  English_Persuasive_Man, English_Lucky_Robot,
                                  English_expressive_narrator  [default:
                                  English_Graceful_Lady]
  -o, --output PATH               Output audio file path  [default:
                                  output.mp3]
  --speed FLOAT RANGE             Speech speed (0.5-2.0).  [default: 1.0;
                                  0.5<=x<=2.0]
  --vol FLOAT RANGE               Volume (0.0-10.0).  [default: 1.0;
                                  0.0<=x<=10.0]
  --pitch INTEGER RANGE           Pitch shift in semitones (-12..12).
                                  [default: 0; -12<=x<=12]
  --sample-rate [8000|16000|22050|24000|32000|44100]
                                  Audio sample rate.  [default: 32000]
  --bitrate [32000|64000|128000|256000]
                                  Audio bitrate.  [default: 128000]
  --format [mp3|pcm|flac]         Output audio format.  [default: mp3]
  --channel [1|2]                 1=mono, 2=stereo.  [default: 1]
  --help                          Show this message and exit.

Invocation with all 7 new flags

$ python -m cli_anything.minimax.minimax_cli tts \
    --text "hello from cli-anything-minimax with extended tts params" \
    --voice English_radiant_girl \
    --speed 1.2 --vol 2.0 --pitch 3 \
    --sample-rate 44100 --bitrate 256000 \
    --format flac --channel 2 \
    --output /tmp/demo.flac

✓ Audio saved to /tmp/demo.flac (17 bytes)
output_file: /tmp/demo.flac
size_bytes: 17
model: speech-2.8-hd
voice: English_radiant_girl

Validation behavior (Click-level)

Out-of-range values (for example --speed 5 or --vol -1) and unsupported choices (for example --format ogg) are rejected by Click before any HTTP call:

$ ... tts --text x --speed 5
Usage: python -m cli_anything.minimax.minimax_cli tts [OPTIONS]
Try 'python -m cli_anything.minimax.minimax_cli tts --help' for help.
Error: Invalid value for '--speed': 5 is not in the range 0.5<=x<=2.0.

$ ... tts --text x --format ogg
Error: Invalid value for '--format': 'ogg' is not one of 'mp3', 'pcm', 'flac'.
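The rejection happens inside Click's parameter types, which can be exercised directly. The sketch below calls `convert` on a `click.FloatRange` by hand; the helper `check_speed` is hypothetical, and passing `None` for the parameter and context is a shortcut that works for illustration but is not how Click invokes the type during normal parsing.

```python
import click

# Same bounds as the --speed option described in this PR.
speed_type = click.FloatRange(0.5, 2.0)


def check_speed(raw):
    """Hypothetical helper: return the parsed float, or a rejection message."""
    try:
        return speed_type.convert(raw, None, None)
    except click.BadParameter as exc:
        return f"rejected: {exc.message}"


accepted = check_speed("1.2")
rejected = check_speed("5")
```

Since the range check raises during option parsing, the command function never runs for invalid input, which is why no request reaches the backend.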

Doc sync

The doc commit 9f053c7 mirrors these flags into:

minimax/agent-harness/cli_anything/minimax/README.md (new tts parameter table)
skills/cli-anything-minimax/SKILL.md (extended options block)

so the skill surface stays in lockstep with the CLI.

yuh-yang and others added 2 commits June 14, 2026 17:27

hhdhh requested review from omerarslan0, yuh-yang and zhangxilong-43 as code owners June 17, 2026 16:00

github-actions Bot added the existing-cli-fix label (Fixes or improves an existing CLI harness) Jun 17, 2026

github-actions Bot added the cli-anything-skill label (Changes CLI-Anything plugin or skill files) Jun 17, 2026

yuh-yang force-pushed the main branch from 58608d3 to dc73924 June 25, 2026 14:29

hhdhh wants to merge 3 commits into HKUDS:main from hhdhh:feat/minimax-tts-extended


2 participants
