
Conversation

@ChuxiJ (Contributor) commented Feb 11, 2026

Summary by CodeRabbit

  • New Features

    • Added latent shift and rescale controls for fine-grained DiT latent manipulation during generation.
    • Reorganized Advanced Settings into dedicated sections: DiT Diffusion, Generation Settings, LM, Audio Output & Post‑processing, and Automation/Batch.
  • Improvements

    • Training UI messages updated with emoji-prefixed statuses for clearer feedback.
    • Expanded localization keys across languages to expose new settings and training/LoRA UI text.
    • Audio playback now marks samples non-interactive while generation is in progress.

chuxij added 3 commits February 11, 2026 14:00

Commit 1:
- Enhanced the caching mechanism for VAE audio encoding, optimizing the reuse of encoded latents.
- Updated logging for better tracking of caching efficiency and latent reuse.
- Files changed:
  - acestep/handler.py: refined audio encoding logic to boost caching performance

Commit 2:
- Improved the caching mechanism for VAE audio encoding, increasing efficiency in the reuse of encoded latents.
- Enhanced logging to provide clearer insights into caching performance and latent reuse.
- Files changed:
  - acestep/handler.py: optimized audio encoding logic for better caching performance

Commit 3: …e generation UI
- Add latent_shift and latent_rescale parameters to event handlers and batch generation
- Reorganize optional parameters section with sub-headings (Music Properties, Generation Settings)
- Add advanced section labels (DiT Diffusion, LM Generation, Audio Output, Automation & Batch)
- Add MLX DiT i18n labels for Apple Silicon support
- Update i18n files (en, zh, ja, he) with new UI labels
- Move latent shift/rescale controls within generation interface layout
@ChuxiJ requested a review from Copilot February 11, 2026 14:26
coderabbitai bot commented Feb 11, 2026

Caution: Review failed. The pull request is closed.

📝 Walkthrough

Adds two latent post-processing parameters (latent_shift, latent_rescale) threaded from the Gradio UI through event handlers and batch flows into the generation pipeline, applied to DiT latents before VAE decode; also introduces UI reorganizations, i18n keys, and emoji-driven training UI text updates.
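
The transform described above can be sketched in a few lines. This is a minimal illustration only: `apply_latent_postprocess` is a hypothetical helper name, and NumPy stands in for the project's torch tensors (the arithmetic is identical element-wise).

```python
import numpy as np

def apply_latent_postprocess(pred_latents, latent_shift=0.0, latent_rescale=1.0):
    """Affine transform applied to DiT latents before VAE decode (sketch)."""
    if latent_shift == 0.0 and latent_rescale == 1.0:
        return pred_latents  # defaults are a no-op, matching the guarded branch
    return pred_latents * latent_rescale + latent_shift

latents = np.array([-1.5, 0.0, 2.0])
out = apply_latent_postprocess(latents, latent_shift=0.1, latent_rescale=0.5)
# each element x becomes x * 0.5 + 0.1, e.g. 2.0 -> 1.1
print(out)
```

A rescale below 1.0 compresses the latent range (the anti-clipping use case discussed in the review), while the shift re-centers it.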

Changes

Cohort / File(s) | Summary
Core Generation Logic
acestep/handler.py, acestep/inference.py
Introduce latent_shift and latent_rescale on GenerationParams and generate_music; apply transformation pred_latents = pred_latents * latent_rescale + latent_shift before VAE decode and propagate new params through generation calls.
UI Event Wiring
acestep/gradio_ui/events/__init__.py, acestep/gradio_ui/events/results_handlers.py
Thread latent_shift and latent_rescale through generation parameter state, capture/restore flows, batch queue/storage, and signatures for generation progress & batch-management handlers.
Generation UI Interface
acestep/gradio_ui/interfaces/generation.py
Reorganize Advanced Settings into modular sections; add UI controls and state wiring for latent_shift and latent_rescale with service pre-initialization defaults and include them in returned params.
Training UI Text Updates
acestep/gradio_ui/events/training_handlers.py
Replace many user-facing strings with emoji-prefixed variants (errors, info, success, warnings); no control-flow changes.
Localization Strings
acestep/gradio_ui/i18n/en.json, acestep/gradio_ui/i18n/he.json, acestep/gradio_ui/i18n/ja.json, acestep/gradio_ui/i18n/zh.json
Add keys for latent_shift, latent_rescale, new advanced/optional UI section labels, and expanded training/LoRA UI strings across translations.

Sequence Diagram(s)

sequenceDiagram
participant User
participant UI as Gradio UI
participant Events as Event Handlers
participant Gen as Generation Service
participant VAE

User->>UI: set latent_shift, latent_rescale + other params
UI->>Events: submit generation request (params)
Events->>Gen: call generate_music(params including latent_shift/rescale)
Gen->>Gen: generate DiT latents
Gen->>Gen: apply latent transform (latents * latent_rescale + latent_shift)
Gen->>VAE: decode transformed latents
VAE-->>Gen: decoded audio
Gen-->>Events: results + captured params
Events-->>UI: update results/history

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Suggested reviewers

  • ChuxiJ

Poem

🐰 A tiny tweak, a shift and scale I bring,
I hop through UI, events, and generation string.
Latents bend and stretch right before decode,
History remembers each parameter mode.
Cheers—new knobs to tune the sonic spring! 🥕✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
- Title check ⚠️ Warning: The title '2026 02 11 760a' appears to be a date-based identifier rather than a descriptive summary of the changes made in this pull request. Resolution: use a descriptive title that summarizes the main change, such as 'Add latent shift and rescale parameters for DiT generation' or 'Implement latent post-processing and training UI enhancements'.

✅ Passed checks (2 passed)
- Description check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
- Docstring coverage ✅ Passed: Docstring coverage is 90.48%, which meets the required threshold of 80.00%.



Copilot AI (Contributor) left a comment

Pull request overview

Adds latent post-processing controls (shift/rescale) to help reduce clipping before VAE decode, and reorganizes Gradio advanced settings while improving UI/training status text and i18n coverage.

Changes:

  • Introduce latent_shift / latent_rescale parameters end-to-end (params → handler → UI → batch restore).
  • Apply latent shift/rescale to pred_latents before VAE decode with optional debug logging.
  • Re-organize Gradio advanced settings sections and expand i18n strings; fix garbled status icons in training UI.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Summary per file:
- acestep/inference.py: Adds latent_shift/latent_rescale to GenerationParams and forwards them into DiT generation.
- acestep/handler.py: Extends the handler's generate_music API and applies latent post-processing prior to VAE decode.
- acestep/gradio_ui/interfaces/generation.py: Reorganizes advanced UI sections; adds latent shift/rescale controls and returns them in UI state.
- acestep/gradio_ui/i18n/en.json: Adds many new UI strings (including the training section and latent controls).
- acestep/gradio_ui/i18n/zh.json: Adds section headers, training strings, and latent control strings; fixes a JSON comma.
- acestep/gradio_ui/i18n/ja.json: Adds section headers and latent control strings; fixes a missing comma.
- acestep/gradio_ui/i18n/he.json: Adds section headers for the reorganized UI.
- acestep/gradio_ui/events/training_handlers.py: Replaces garbled "�" characters with the intended status icons in messages.
- acestep/gradio_ui/events/results_handlers.py: Threads latent shift/rescale through generation, param capture/restore, and batch background generation defaults.
- acestep/gradio_ui/events/__init__.py: Wires the new UI controls into generation wrapper inputs and param capture.


Comment on lines +3669 to +3672
```python
logger.debug(f"[generate_music] Latent BEFORE shift/rescale: min={pred_latents.min():.4f}, max={pred_latents.max():.4f}, mean={pred_latents.mean():.4f}, std={pred_latents.std():.4f}")
pred_latents = pred_latents * latent_rescale + latent_shift
if self.debug_stats:
    logger.debug(f"[generate_music] Latent AFTER shift/rescale: min={pred_latents.min():.4f}, max={pred_latents.max():.4f}, mean={pred_latents.mean():.4f}, std={pred_latents.std():.4f}")
```
Copilot AI commented Feb 11, 2026

Formatting PyTorch scalar tensors with :.4f can raise a TypeError on PyTorch versions where Tensor does not support that format spec; converting the reduction results to Python numbers (e.g., with .item()) before formatting is safer and unambiguous.

Suggested change:

```diff
-logger.debug(f"[generate_music] Latent BEFORE shift/rescale: min={pred_latents.min():.4f}, max={pred_latents.max():.4f}, mean={pred_latents.mean():.4f}, std={pred_latents.std():.4f}")
-pred_latents = pred_latents * latent_rescale + latent_shift
-if self.debug_stats:
-    logger.debug(f"[generate_music] Latent AFTER shift/rescale: min={pred_latents.min():.4f}, max={pred_latents.max():.4f}, mean={pred_latents.mean():.4f}, std={pred_latents.std():.4f}")
+logger.debug(
+    f"[generate_music] Latent BEFORE shift/rescale: "
+    f"min={pred_latents.min().item():.4f}, "
+    f"max={pred_latents.max().item():.4f}, "
+    f"mean={pred_latents.mean().item():.4f}, "
+    f"std={pred_latents.std().item():.4f}"
+)
+pred_latents = pred_latents * latent_rescale + latent_shift
+if self.debug_stats:
+    logger.debug(
+        f"[generate_music] Latent AFTER shift/rescale: "
+        f"min={pred_latents.min().item():.4f}, "
+        f"max={pred_latents.max().item():.4f}, "
+        f"mean={pred_latents.mean().item():.4f}, "
+        f"std={pred_latents.std().item():.4f}"
+    )
```

Copilot uses AI. Check for mistakes.
Comment on lines +3669 to +3672 (same snippet as above)
Copilot AI commented Feb 11, 2026

Even under debug_stats, calling min()/max()/mean()/std() separately triggers multiple full-tensor reductions (and on CUDA, likely multiple synchronizations). Compute these stats once per log line (e.g., via aminmax + mean + std) and reuse the results.
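
This suggestion can be sketched as follows. NumPy stands in for torch here (on actual tensors one could use `torch.aminmax` to obtain min and max from a single reduction), and `latent_stats` is an illustrative helper name, not part of the codebase.

```python
import numpy as np

def latent_stats(latents):
    """Compute min/max/mean/std once, returning Python floats for formatting."""
    lo, hi = float(latents.min()), float(latents.max())
    return lo, hi, float(latents.mean()), float(latents.std())

latents = np.array([0.0, 1.0, 2.0, 3.0])
lo, hi, mean, std = latent_stats(latents)
# One reduction per statistic per log line, reused across every f-string field
print(f"min={lo:.4f}, max={hi:.4f}, mean={mean:.4f}, std={std:.4f}")
```

Converting to Python floats up front also sidesteps the tensor-formatting concern from the previous comment and, on CUDA, limits device synchronization to the reductions themselves.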

coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
acestep/gradio_ui/i18n/zh.json (1)

165-166: ⚠️ Potential issue | 🟡 Minor

Japanese text in Chinese translation file.

Lines 165-166 contain Japanese text (CoT メタデータ, LMを使用してCoTメタデータを生成...) instead of Chinese translations. This appears to be a copy-paste error from the Japanese translation file.

🌐 Suggested Chinese translation

```diff
-    "cot_metas_label": "CoT メタデータ",
-    "cot_metas_info": "LMを使用してCoTメタデータを生成(チェックを外すとLM CoT生成をスキップ)",
+    "cot_metas_label": "CoT 元数据",
+    "cot_metas_info": "使用LM生成CoT元数据(取消勾选以跳过LM CoT生成)",
```
🤖 Fix all issues with AI agents
In `@acestep/handler.py`:
- Around line 3665-3673: After applying user-controlled latent post-processing
in generate_music (the pred_latents = pred_latents * latent_rescale +
latent_shift block), validate latent_shift and latent_rescale for finiteness and
reasonable magnitude, and immediately re-check pred_latents for non-finite
values; if any NaN/Inf are found, log an error via logger (include debug_stats
details if enabled) and either clamp/replace those values (e.g., with zeros or
prior safe statistics) or raise/return an explicit error to prevent passing
corrupted latents to the VAE decode—update the latent post-processing section
around pred_latents and the debug logging to perform these checks and handle
failures safely.
🧹 Nitpick comments (3)
acestep/gradio_ui/events/training_handlers.py (3)

327-327: Consider using tuple unpacking instead of concatenation.

The tuple concatenation with + works but can be simplified using iterable unpacking for better readability.

♻️ Suggested refactor using tuple unpacking

```diff
 if not dataset_path or not dataset_path.strip():
     updates = (gr.update(), gr.update(), gr.update(), gr.update(), gr.update())
-    return ("❌ Please enter a dataset path", [], _safe_slider(0, value=0, visible=False), builder_state) + empty_preview + updates
+    return ("❌ Please enter a dataset path", [], _safe_slider(0, value=0, visible=False), builder_state, *empty_preview, *updates)
 if not os.path.exists(dataset_path):
     updates = (gr.update(), gr.update(), gr.update(), gr.update(), gr.update())
-    return (f"❌ Dataset not found: {dataset_path}", [], _safe_slider(0, value=0, visible=False), builder_state) + empty_preview + updates
+    return (f"❌ Dataset not found: {dataset_path}", [], _safe_slider(0, value=0, visible=False), builder_state, *empty_preview, *updates)
```

Also applies to: 334-334


768-768: NaN check idiom is correct but could be more explicit.

The loss == loss check leverages the IEEE 754 property that NaN ≠ NaN, which is clever but may confuse readers. Consider using math.isnan() for clarity.

♻️ Suggested refactor for explicit NaN check

Add the import at the top of the file:

```python
import math
```

Then update the check:

```diff
-            if step > 0 and loss is not None and loss == loss:  # Check for NaN
+            if step > 0 and loss is not None and not math.isnan(loss):
```
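
The equivalence behind this nitpick is easy to verify in plain Python (nothing here is project-specific):

```python
import math

nan_loss = float("nan")
real_loss = 0.37

# IEEE 754: NaN never compares equal to itself, so `loss == loss` is False
# exactly when loss is NaN. That property is what the original check relied on.
assert nan_loss != nan_loss
assert real_loss == real_loss

# math.isnan() expresses the same test explicitly and reads unambiguously.
assert math.isnan(nan_loss)
assert not math.isnan(real_loss)
print("loss == loss is equivalent to not math.isnan(loss) for float losses")
```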

800-800: Use explicit conversion flag for f-string.

Using {e!s} is more idiomatic than {str(e)} in f-strings.

♻️ Suggested refactor

```diff
-        yield f"❌ Error: {str(e)}", str(e), _training_loss_figure({}, [], []), training_state
+        yield f"❌ Error: {e!s}", f"{e!s}", _training_loss_figure({}, [], []), training_state
-        return f"❌ Export failed: {str(e)}"
+        return f"❌ Export failed: {e!s}"
```

Also applies to: 862-862

Comment on lines +3665 to +3673

```python
# Apply latent shift and rescale before VAE decode (for anti-clipping control)
if latent_shift != 0.0 or latent_rescale != 1.0:
    logger.info(f"[generate_music] Applying latent post-processing: shift={latent_shift}, rescale={latent_rescale}")
    if self.debug_stats:
        logger.debug(f"[generate_music] Latent BEFORE shift/rescale: min={pred_latents.min():.4f}, max={pred_latents.max():.4f}, mean={pred_latents.mean():.4f}, std={pred_latents.std():.4f}")
    pred_latents = pred_latents * latent_rescale + latent_shift
    if self.debug_stats:
        logger.debug(f"[generate_music] Latent AFTER shift/rescale: min={pred_latents.min():.4f}, max={pred_latents.max():.4f}, mean={pred_latents.mean():.4f}, std={pred_latents.std():.4f}")
```


⚠️ Potential issue | 🟠 Major

Re-check for non-finite latents after shift/rescale.
The NaN/Inf guard runs before applying user-controlled post-processing. A non-finite or extreme latent_shift/latent_rescale can introduce NaNs/Inf and crash or corrupt VAE decode. Add validation and a post-transform finite check.

🔧 Suggested fix

```diff
 # Apply latent shift and rescale before VAE decode (for anti-clipping control)
-if latent_shift != 0.0 or latent_rescale != 1.0:
+if latent_shift is None:
+    latent_shift = 0.0
+if latent_rescale is None:
+    latent_rescale = 1.0
+if latent_shift != 0.0 or latent_rescale != 1.0:
+    if not math.isfinite(latent_shift) or not math.isfinite(latent_rescale):
+        raise ValueError("latent_shift/latent_rescale must be finite numbers")
     logger.info(f"[generate_music] Applying latent post-processing: shift={latent_shift}, rescale={latent_rescale}")
     if self.debug_stats:
         logger.debug(f"[generate_music] Latent BEFORE shift/rescale: min={pred_latents.min():.4f}, max={pred_latents.max():.4f}, mean={pred_latents.mean():.4f}, std={pred_latents.std():.4f}")
     pred_latents = pred_latents * latent_rescale + latent_shift
     if self.debug_stats:
         logger.debug(f"[generate_music] Latent AFTER shift/rescale: min={pred_latents.min():.4f}, max={pred_latents.max():.4f}, mean={pred_latents.mean():.4f}, std={pred_latents.std():.4f}")
+    if torch.isnan(pred_latents).any() or torch.isinf(pred_latents).any():
+        raise RuntimeError("Latent post-processing produced NaN/Inf values")
```
📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
# Apply latent shift and rescale before VAE decode (for anti-clipping control)
if latent_shift is None:
    latent_shift = 0.0
if latent_rescale is None:
    latent_rescale = 1.0
if latent_shift != 0.0 or latent_rescale != 1.0:
    if not math.isfinite(latent_shift) or not math.isfinite(latent_rescale):
        raise ValueError("latent_shift/latent_rescale must be finite numbers")
    logger.info(f"[generate_music] Applying latent post-processing: shift={latent_shift}, rescale={latent_rescale}")
    if self.debug_stats:
        logger.debug(f"[generate_music] Latent BEFORE shift/rescale: min={pred_latents.min():.4f}, max={pred_latents.max():.4f}, mean={pred_latents.mean():.4f}, std={pred_latents.std():.4f}")
    pred_latents = pred_latents * latent_rescale + latent_shift
    if self.debug_stats:
        logger.debug(f"[generate_music] Latent AFTER shift/rescale: min={pred_latents.min():.4f}, max={pred_latents.max():.4f}, mean={pred_latents.mean():.4f}, std={pred_latents.std():.4f}")
    if torch.isnan(pred_latents).any() or torch.isinf(pred_latents).any():
        raise RuntimeError("Latent post-processing produced NaN/Inf values")
```

@ChuxiJ changed the title from '2026 02 11 760a' to 'refact: Reorganized Advanced Settings UI & Added latent shift and rescale' Feb 11, 2026
@ChuxiJ merged commit 1e25597 into main Feb 11, 2026
1 check was pending
