support cover (exp) #479
Conversation
📝 Walkthrough
This PR introduces cover noise strength parameter support throughout the generation pipeline, implements automatic model code synchronization between source and checkpoint directories, adds pure base model detection logic to distinguish model variants, preserves generation mode across initialization, and contributes comprehensive ACEStep v15 model implementations with guidance mechanisms.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant UI as UI Handler
    participant Config as Config Path Handler
    participant ModelInit as Init Service
    participant GenHandlers as Generation Handlers
    UI->>Config: config_path changed (with generation_mode)
    Config->>GenHandlers: update_model_type_settings(config_path, current_mode)
    GenHandlers->>GenHandlers: _is_pure_base_model(config_path)
    GenHandlers->>GenHandlers: get_generation_mode_choices(is_pure_base)
    GenHandlers-->>Config: mode choices updated (preserving current_mode if valid)
    UI->>UI: user clicks init button
    UI->>ModelInit: init_service_wrapper(..., current_mode)
    ModelInit->>GenHandlers: get_model_type_ui_settings(..., current_mode, is_pure_base)
    GenHandlers-->>ModelInit: UI settings with preserved mode
    ModelInit-->>UI: generation_mode (preserved across init)
```
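The handler and function names above come from the diagram; the sketch below is only an illustration of how such a pure-base check and mode preservation could fit together, not the PR's actual implementation.

```python
from typing import List, Optional, Tuple


def _is_pure_base_model(config_path: str) -> bool:
    """Illustrative check: treat a config as pure base when its path names the base
    variant and carries no turbo/sft marker (assumed naming convention)."""
    name = config_path.lower()
    return "base" in name and not any(tag in name for tag in ("turbo", "sft"))


def get_generation_mode_choices(is_pure_base: bool) -> List[str]:
    """Mode choices offered in the UI for the detected variant (example values)."""
    return ["text2music"] if is_pure_base else ["text2music", "cover", "repaint"]


def update_model_type_settings(config_path: str, current_mode: Optional[str]) -> Tuple[List[str], str]:
    """Recompute mode choices while preserving the user's current mode when still valid."""
    choices = get_generation_mode_choices(_is_pure_base_model(config_path))
    selected = current_mode if current_mode in choices else choices[0]
    return choices, selected
```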
```mermaid
sequenceDiagram
    participant Download as Model Downloader
    participant Source as Models Source Dir
    participant Checkpoint as Checkpoint Dir
    participant Handler as Service Handler
    Download->>Download: download_main_model()
    Download->>Download: _sync_model_code_files(variant_name, checkpoint_dir)
    Download->>Checkpoint: copy Python files from source variant
    Checkpoint-->>Download: files synced & logged
    Handler->>Handler: init_service_wrapper()
    Handler->>Download: _check_code_mismatch(config_path, checkpoint)
    alt Mismatch Detected
        Handler->>Download: _sync_model_code_files()
        Download->>Source: locate authoritative source
        Download->>Checkpoint: sync/overwrite Python files
    end
    Checkpoint-->>Handler: code synchronized
```
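A minimal sketch of the mismatch check and sync step shown above, using content hashing as the review notes describe; the function names match the diagram, but the bodies and the bundled-source directory layout are assumptions.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def _hash_file(path: Path) -> str:
    """Return a SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def _check_code_mismatch(model_name: str, checkpoints_dir: Path) -> List[str]:
    """List Python files whose checkpoint copy differs from the bundled source variant."""
    source_dir = Path(__file__).parent / "models" / model_name  # assumed layout
    mismatched = []
    for src in source_dir.glob("*.py"):
        dst = checkpoints_dir / src.name
        if not dst.exists() or _hash_file(src) != _hash_file(dst):
            mismatched.append(src.name)
    return mismatched


def _sync_model_code_files(model_name: str, checkpoints_dir: Path) -> List[str]:
    """Copy mismatched Python files from the authoritative source into the checkpoint dir."""
    source_dir = Path(__file__).parent / "models" / model_name  # assumed layout
    synced = _check_code_mismatch(model_name, checkpoints_dir)
    for name in synced:
        shutil.copy2(source_dir / name, checkpoints_dir / name)
    return synced
```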
```mermaid
sequenceDiagram
    participant UI as UI/Handler
    participant ResultsHandler as Results Handler
    participant GenerationParams as Generation Params
    participant Inference as Inference Engine
    UI->>ResultsHandler: generate_with_progress(..., cover_noise_strength)
    ResultsHandler->>ResultsHandler: capture_current_params(cover_noise_strength)
    ResultsHandler->>GenerationParams: create GenerationParams(cover_noise_strength=value)
    GenerationParams-->>ResultsHandler: params ready
    ResultsHandler->>Inference: call with params.cover_noise_strength
    Inference->>Inference: DiT forward(..., cover_noise_strength)
    Inference-->>ResultsHandler: generated audio with noise characteristics applied
    ResultsHandler->>UI: batch/state updated with cover_noise_strength preserved
```
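To make the threading concrete, here is a rough sketch of how the value can travel from the params object into noise initialization; the dataclass field mirrors the parameter added by this PR, while the helper and its blending formula are illustrative assumptions only.

```python
from dataclasses import dataclass
from typing import Optional

import torch


@dataclass
class GenerationParams:
    # Only the field relevant to this PR is shown.
    cover_noise_strength: float = 0.0  # 0 = pure noise (no cover), 1 = closest to source


def init_latents_for_cover(
    src_latents: torch.Tensor,
    cover_noise_strength: float,
    generator: Optional[torch.Generator] = None,
) -> torch.Tensor:
    """Blend source latents with Gaussian noise (assumed formula, for illustration).

    0.0 ignores the source entirely; 1.0 starts from the source latents.
    """
    noise = torch.randn(
        src_latents.shape,
        generator=generator,
        dtype=src_latents.dtype,
        device=src_latents.device,
    )
    return cover_noise_strength * src_latents + (1.0 - cover_noise_strength) * noise
```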
Estimated Code Review Effort
🎯 4 (Complex) | ⏱️ ~60 minutes
Rationale: PR introduces substantial new model code (three 2000+ line modeling files with complex transformer architectures), new model code synchronization subsystem with file hashing and mismatch detection, consistent but wide parameter threading across handlers/UI/inference, new helper functions for base model detection, multiple configuration classes, and guidance mechanism utilities. While many edits follow repetitive patterns (same parameter added across functions), the modeling modules contain dense logic with multiple interdependent components requiring careful review.
🚥 Pre-merge checks: ✅ 2 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Actionable comments posted: 2
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (8)
acestep/constants.py (1)
1-4: 🛠️ Refactor suggestion | 🟠 Major
Module exceeds 200 LOC; split or add a follow‑up plan.
This file is over the hard cap; please split by responsibility or document a concrete follow‑up split plan in PR notes.
As per coding guidelines, "Target module size: optimal <= 150 LOC, hard cap <= 200 LOC; if exceeded, split by responsibility before merging or justify in PR notes with a concrete follow-up split plan".
acestep/inference.py (1)
1-12: 🛠️ Refactor suggestion | 🟠 Major
File exceeds 200 LOC hard cap — split or document a concrete follow‑up plan.
This module is well over the hard cap; please split by responsibility or add PR notes with a concrete split plan. As per coding guidelines, "Target module size: optimal <= 150 LOC, hard cap <= 200 LOC; if exceeded, split by responsibility before merging or justify in PR notes with a concrete follow-up split plan".
acestep/gradio_ui/events/__init__.py (2)
1-7: 🛠️ Refactor suggestion | 🟠 Major
File exceeds 200 LOC hard cap — split or document a concrete follow‑up plan.
This event wiring module is far over the hard cap; please split by responsibility or add a concrete split plan in PR notes. As per coding guidelines, "Target module size: optimal <= 150 LOC, hard cap <= 200 LOC; if exceeded, split by responsibility before merging or justify in PR notes with a concrete follow-up split plan".
972-1003: ⚠️ Potential issue | 🟠 Major
`restore_params_btn` doesn't restore `cover_noise_strength`. The slider is saved in batch parameters but won't be restored, breaking the "reuse historical settings" feature. Add it to both the handler extraction and the outputs list.
🧩 Suggested fix
In `results_handlers.py` around line 2328, add extraction:
```diff
 latent_shift = params.get("latent_shift", 0.0)
 latent_rescale = params.get("latent_rescale", 1.0)
+cover_noise_strength = params.get("cover_noise_strength", 0.0)
```
Update return statement (lines 2345-2353):
```diff
 return (
     codes_main, captions, lyrics, bpm, key_scale, time_signature,
     vocal_language, audio_duration, batch_size_input, inference_steps,
     lm_temperature, lm_cfg_scale, lm_top_k, lm_top_p, think_checkbox,
     use_cot_caption, use_cot_language, allow_lm_batch,
     track_name, complete_track_classes,
-    enable_normalization, normalization_db,
-    latent_shift, latent_rescale
+    enable_normalization, normalization_db,
+    latent_shift, latent_rescale, cover_noise_strength
 )
```
In `__init__.py` add to outputs list:
```diff
     generation_section["latent_shift"],
     generation_section["latent_rescale"],
+    generation_section["cover_noise_strength"],
 ]
```
acestep/model_downloader.py (1)
1-20: 🛠️ Refactor suggestion | 🟠 Major
File exceeds 200 LOC hard cap — split or document a concrete follow‑up plan.
This module is well past the hard cap; please split by responsibility or add a concrete split plan in PR notes. As per coding guidelines, "Target module size: optimal <= 150 LOC, hard cap <= 200 LOC; if exceeded, split by responsibility before merging or justify in PR notes with a concrete follow-up split plan".
acestep/handler.py (3)
1966-1981: ⚠️ Potential issue | 🟡 Minor
`cover_noise_strength` is unused in `_prepare_batch`. It's accepted but never used or recorded, and the docstring doesn't mention it. Either remove it (and update callers) or include it in the batch + docstring so the data flow is explicit. As per coding guidelines, "Docstrings are mandatory for all new or modified Python modules, classes, and functions; must be concise and include purpose plus key inputs/outputs and raised exceptions when relevant."
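If the second option is chosen, the change could be as small as the sketch below; only the parameter name comes from the diff, so the surrounding arguments and docstring wording are placeholders.

```python
from typing import Any, Dict


def _prepare_batch(prompt: str, cover_noise_strength: float = 0.0) -> Dict[str, Any]:
    """Assemble generation inputs into a batch dict (illustrative stand-in).

    Args:
        prompt: Placeholder for the real generation arguments.
        cover_noise_strength: Source-vs-noise blend factor (0.0 = pure noise,
            1.0 = closest to source); recorded so downstream stages and saved
            batch params can see it.
    """
    return {
        "prompt": prompt,
        "cover_noise_strength": cover_noise_strength,
    }
```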
3372-3417: ⚠️ Potential issue | 🟡 Minor
Document `cover_noise_strength` in the `generate_music` docstring. This is a modified public function and the new parameter isn't described. Please add it as a key input. As per coding guidelines, "Docstrings are mandatory for all new or modified Python modules, classes, and functions; must be concise and include purpose plus key inputs/outputs and raised exceptions when relevant."
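One possible wording for the missing entry, phrased after the GenerationParams description later in this review; treat it as a sketch rather than canonical text.

```python
"""
Args:
    cover_noise_strength: Initialize diffusion noise from the source audio for cover
        tasks (0.0 = pure noise / no cover influence, 1.0 = closest to the source).
        Defaults to 0.0.
"""
```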
2728-2835: ⚠️ Potential issue | 🟠 Major
Add `cover_noise_strength` parameter to MLX diffusion path for API compatibility.
The PyTorch path receives `cover_noise_strength` via `generate_kwargs`, but the MLX fast-path ignores it entirely. To maintain consistent API signatures between backends, add the parameter to `_mlx_run_diffusion()` and thread it through to the call site.
🔧 Suggested update
```diff
 def _mlx_run_diffusion(
     self,
     encoder_hidden_states,
     encoder_attention_mask,
     context_latents,
     src_latents,
     seed,
     infer_method: str = "ode",
     shift: float = 3.0,
     timesteps=None,
     audio_cover_strength: float = 1.0,
+    cover_noise_strength: float = 0.0,
     encoder_hidden_states_non_cover=None,
     encoder_attention_mask_non_cover=None,
     context_latents_non_cover=None,
```
And at the call site in `handler.py`:
```diff
 outputs = self._mlx_run_diffusion(
     encoder_hidden_states=encoder_hidden_states,
     encoder_attention_mask=encoder_attention_mask,
     context_latents=context_latents,
     src_latents=src_latents,
     seed=seed_param,
     infer_method=infer_method,
     shift=shift,
     timesteps=ts_arg,
     audio_cover_strength=audio_cover_strength,
+    cover_noise_strength=cover_noise_strength,
     encoder_hidden_states_non_cover=enc_hs_nc,
     encoder_attention_mask_non_cover=enc_am_nc,
     context_latents_non_cover=ctx_nc,
 )
```
🤖 Fix all issues with AI agents
In `@acestep/models/sft/modeling_acestep_v15_base.py`:
- Around line 1642-1645: The code mistakenly treats self.tokenize as an object
with a quantizer attribute, causing AttributeError; change references so the
quantizer is accessed from self.tokenizer (e.g., replace
self.tokenize.quantizer.get_output_from_indices(...) with
self.tokenizer.quantizer.get_output_from_indices(...)) and ensure the tokenize
method call remains as self.tokenize(hidden_states, silence_latent,
attention_mask) to obtain lm_hints_5Hz, indices, llm_mask; verify symbols
touched: self.tokenize, self.tokenizer, get_output_from_indices, lm_hints_5Hz,
indices, llm_mask.
In `@acestep/models/turbo/modeling_acestep_v15_turbo.py`:
- Around line 1639-1641: The bug is that code treats self.tokenize as an object
with a .quantizer attribute but self.tokenize is a method; change the call to
obtain the tokenizer object before accessing .quantizer. Replace the problematic
access in modeling_acestep_v15_turbo (the block that checks audio_codes) to call
the tokenizer (e.g., use self.tokenize() or the correct tokenizer property) and
then call .quantizer.get_output_from_indices(audio_codes) — for example use
self.tokenize().quantizer.get_output_from_indices(audio_codes) or, if a
quantizer lives on the instance, use
self.quantizer.get_output_from_indices(audio_codes) so the AttributeError is
avoided.
🟠 Major comments (26)
acestep/models/__init__.py-1-3 (1)
1-3: 🛠️ Refactor suggestion | 🟠 Major
Add a module docstring (required by guidelines).
These header comments should be a proper module docstring describing purpose plus key behaviors (authoritative source and auto-sync).
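A possible shape for that docstring, assuming the package's role as the authoritative source of model code that gets auto-synced into checkpoint directories (wording is illustrative):

```python
"""Authoritative source for bundled ACE-Step model code.

Python files in this package are copied into model checkpoint directories by the
downloader's sync step, so edits here are the source of truth when a mismatch is found.
"""
```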
As per coding guidelines: "Docstrings are mandatory for all new or modified Python modules, classes, and functions; must be concise and include purpose plus key inputs/outputs and raised exceptions when relevant."
acestep/models/sft/apg_guidance.py-1-220 (1)
1-220: ⚠️ Potential issue | 🟠 Major
Code duplication: this file is identical to `acestep/models/base/apg_guidance.py`. Both files are 220 LOC and contain identical implementations. This creates maintenance burden—any bug fix or enhancement must be applied in two places.
Consider one of these approaches:
- Import and re-export: If SFT needs the same API, simply re-export from base:
```python
from acestep.models.base.apg_guidance import (
    MomentumBuffer,
    project,
    apg_forward,
    cfg_forward,
    call_cos_tensor,
    compute_perpendicular_component,
    adg_forward,
    adg_w_norm_forward,
    adg_wo_clip_forward,
)
```
- Inheritance/composition: If SFT variants will diverge, extend the base rather than copy.
Additionally, at 220 LOC this file exceeds the 200 LOC hard cap. Per coding guidelines: "Target module size: optimal <= 150 LOC, hard cap <= 200 LOC; if exceeded, split by responsibility before merging or justify in PR notes with a concrete follow-up split plan."
acestep/models/base/apg_guidance.py-1-220 (1)
1-220: ⚠️ Potential issue | 🟠 Major
Module exceeds the 200 LOC hard cap — split by responsibility or add a follow‑up plan.
This file is ~220 LOC; please split (e.g., projection/momentum helpers vs ADG wrappers) or add a concrete follow-up split plan in PR notes.
As per coding guidelines, "Target module size: optimal <= 150 LOC, hard cap <= 200 LOC; if exceeded, split by responsibility before merging or justify in PR notes with a concrete follow-up split plan".
acestep/models/base/apg_guidance.py-1-3 (1)
1-3: ⚠️ Potential issue | 🟠 Major
Add a module-level docstring.
All new or modified Python modules must include a concise docstring with purpose, key inputs/outputs, and relevant exceptions per the coding guidelines.
📝 Proposed module docstring
+"""APG/ADG guidance utilities for ACE-Step diffusion models. + +Inputs: conditional/unconditional prediction tensors and guidance parameters. +Outputs: guided prediction tensors. +Raises: ValueError/TypeError for invalid shapes or sigma types. +""" import torch import torch.nn.functional as Facestep/models/base/apg_guidance.py-16-30 (1)
16-30: ⚠️ Potential issue | 🟠 Major
Fix device preservation and mutable default in
project; also update duplicate in sft module.Using
v0.device.typeand returning via.to(device_type)loses the device index (e.g.,cuda:1becomescuda), causing tensors to move to wrong GPUs. Also,dims=[-1]is a mutable default argument, which is a Python anti-pattern. The function also lacks a docstring and return type hint as required by coding guidelines.The same bug exists identically in
acestep/models/sft/apg_guidance.py(lines 16-30).🐛 Proposed fix
def project( v0: torch.Tensor, # [B, C, T] v1: torch.Tensor, # [B, C, T] - dims=[-1], -): - dtype = v0.dtype - device_type = v0.device.type - if device_type == "mps": + dims=None, +) -> tuple[torch.Tensor, torch.Tensor]: + """Project v0 into components parallel and orthogonal to v1.""" + if dims is None: + dims = (-1,) + dtype = v0.dtype + device = v0.device + if device.type == "mps": v0, v1 = v0.cpu(), v1.cpu() v0, v1 = v0.double(), v1.double() v1 = torch.nn.functional.normalize(v1, dim=dims) v0_parallel = (v0 * v1).sum(dim=dims, keepdim=True) * v1 v0_orthogonal = v0 - v0_parallel - return v0_parallel.to(dtype).to(device_type), v0_orthogonal.to(dtype).to(device_type) + return ( + v0_parallel.to(device=device, dtype=dtype), + v0_orthogonal.to(device=device, dtype=dtype), + )acestep/models/base/apg_guidance.py-5-13 (1)
5-13:⚠️ Potential issue | 🟠 MajorAdd class and method docstrings and return type hints for
MomentumBuffer.Docstrings and return type hints are required for new/modified Python code per the coding guidelines.
📝 Proposed update
class MomentumBuffer: + """Momentum accumulator for APG updates. + + Args: + momentum: Exponential decay factor for the running average. + """ - def __init__(self, momentum: float = -0.75): + def __init__(self, momentum: float = -0.75) -> None: self.momentum = momentum self.running_average = 0 - def update(self, update_value: torch.Tensor): + def update(self, update_value: torch.Tensor) -> None: + """Update the running average in-place. + + Args: + update_value: Latest update tensor. + """ new_average = self.momentum * self.running_average self.running_average = update_value + new_averageacestep/models/base/apg_guidance.py-33-56 (1)
33-56:⚠️ Potential issue | 🟠 MajorAdd docstring, return type hint, and fix mutable default argument in
apg_forward.This function lacks mandatory docstring and return type annotations required by coding guidelines. The
dims=[-1]mutable default argument should beNonewith runtime initialization.📝 Proposed update
def apg_forward( pred_cond: torch.Tensor, # [B, C, T] pred_uncond: torch.Tensor, # [B, C, T] guidance_scale: float, momentum_buffer: MomentumBuffer = None, eta: float = 0.0, norm_threshold: float = 2.5, - dims=[-1], -): + dims=None, +) -> torch.Tensor: + """APG guidance forward pass. + + Args: + pred_cond: Conditional prediction tensor. + pred_uncond: Unconditional prediction tensor. + guidance_scale: Guidance strength. + momentum_buffer: Optional momentum accumulator. + eta: Parallel component mix factor. + norm_threshold: Norm clipping threshold. + dims: Dimensions to reduce over for norms/projections. + Returns: + Guided prediction tensor. + """ + if dims is None: + dims = (-1,)acestep/models/base/apg_guidance.py-107-176 (1)
107-176:⚠️ Potential issue | 🟠 MajorFix numerical stability issue when
sin(theta)is small and mark unused variable.The division by
sin(theta)occurs before the mask is applied (line 171-172), which can produce NaN/inf values that propagate through; use a safe denominator withtorch.whereto avoid this. Also,projis assigned but never used (line 168)—rename to_proj.Proposed fix
- proj, perp = compute_perpendicular_component(latent_diff, latent_hat_uncond) + _proj, perp = compute_perpendicular_component(latent_diff, latent_hat_uncond) latent_v_new = torch.cos(latent_theta_new) * latent_hat_text - latent_p_new = perp * torch.sin(latent_theta_new) / torch.sin(latent_theta) * ( - torch.sin(latent_theta) > 1e-3) + perp * weight * (torch.sin(latent_theta) <= 1e-3) + sin_theta = torch.sin(latent_theta) + sin_theta_new = torch.sin(latent_theta_new) + safe_denom = torch.where(sin_theta.abs() > 1e-3, sin_theta, torch.ones_like(sin_theta)) + latent_p_new = perp * sin_theta_new / safe_denom + latent_p_new = torch.where(sin_theta.abs() > 1e-3, latent_p_new, perp * weight)acestep/models/base/modeling_acestep_v15_base.py-1083-1091 (1)
1083-1091: ⚠️ Potential issue | 🟠 Major
CLS/special token is never prepended, yet its slot is used for pooling.
The special-token concat is commented out, but you later return
hidden_states[:, 0, :], which now corresponds to the first input frame. Either prepend the CLS token (and pad the attention mask) or switch to a different pooling strategy.🛠️ Proposed fix (prepend CLS and pad mask)
@@ - # Prepend special token for timbre aggregation (CLS-like token) - # inputs_embeds = torch.cat([self.special_token.expand(inputs_embeds.shape[0], 1, -1), inputs_embeds], dim=1) + # Prepend special token for timbre aggregation (CLS-like token) + inputs_embeds = torch.cat( + [self.special_token.expand(inputs_embeds.shape[0], 1, -1), inputs_embeds], dim=1 + ) + if attention_mask is not None: + attention_mask = F.pad(attention_mask, (1, 0), value=1)Also applies to: 1173-1177
acestep/models/base/modeling_acestep_v15_base.py-1-54 (1)
1-54: 🛠️ Refactor suggestion | 🟠 Major
Split this module; it exceeds the 200‑LOC hard cap.
This file is ~2k LOC and bundles utilities, encoders, diffusion, generation, and test harness logic. Please split by responsibility (or document a concrete split plan in the PR notes) before merge.
As per coding guidelines, "Target module size: optimal <= 150 LOC, hard cap <= 200 LOC; if exceeded, split by responsibility before merging or justify in PR notes with a concrete follow-up split plan".
acestep/models/base/modeling_acestep_v15_base.py-289-372 (1)
289-372: ⚠️ Potential issue | 🟠 Major
Add missing docstrings (and type hints where practical) for public callables.
The module lacks a top-level docstring, and several new/modified functions (e.g.,
AceStepAttention.forward,AceStepEncoderLayer.forward,AceStepDiTLayer.forward,AceStepDiTModel.forward,generate_audio,test_forward) have no docstrings.sample_t_ralso lacks type hints. Please add concise docstrings covering purpose, key inputs/outputs, and exceptions, plus type hints where practical.As per coding guidelines, "Docstrings are mandatory for all new or modified Python modules, classes, and functions; must be concise and include purpose plus key inputs/outputs and raised exceptions when relevant" and "Add type hints for new/modified functions when practical in Python".
acestep/models/base/modeling_acestep_v15_base.py-1636-1646 (1)
1636-1646:⚠️ Potential issue | 🟠 Major
self.tokenize.quantizerwill crash; replace withself.tokenizer.quantizerand drop
self.tokenizeis a bound method, so the attribute access will raise whenaudio_codesis provided. Also avoidAs per coding guidelines, "Keep logging actionable in Python; avoid noisy logs and `print` debugging in committed code".🛠️ Proposed fix
@@ - if precomputed_lm_hints_25Hz is not None: - print("Using precomputed LM hints") + if precomputed_lm_hints_25Hz is not None: + logger.debug("Using precomputed LM hints") lm_hints_25Hz = precomputed_lm_hints_25Hz[:, :src_latents.shape[1], :] else: if audio_codes is not None: - lm_hints_5Hz = self.tokenize.quantizer.get_output_from_indices(audio_codes) + lm_hints_5Hz = self.tokenizer.quantizer.get_output_from_indices(audio_codes) else: lm_hints_5Hz, indices, llm_mask = self.tokenize(hidden_states, silence_latent, attention_mask)acestep/models/base/modeling_acestep_v15_base.py-1381-1428 (1)
1381-1428: ⚠️ Potential issue | 🟠 Major
Provided padding masks are discarded; encoder mask uses the wrong source.
attention_maskandencoder_attention_maskare overwritten withNone, so padding is never applied. In the non‑Flash path the encoder mask is also built fromattention_maskinstead ofencoder_attention_mask, letting decoder tokens attend to padded encoder positions.🛠️ Proposed fix (preserve caller masks)
@@ - full_attn_mask = None - sliding_attn_mask = None - encoder_attention_mask = None - attention_mask = None + full_attn_mask = None + sliding_attn_mask = None + encoder_attn_mask = None @@ - full_attn_mask = attention_mask + full_attn_mask = attention_mask @@ - sliding_attn_mask = attention_mask if self.config.use_sliding_window else None + sliding_attn_mask = attention_mask if self.config.use_sliding_window else None + encoder_attn_mask = encoder_attention_mask @@ - full_attn_mask = create_4d_mask( + full_attn_mask = create_4d_mask( seq_len=seq_len, dtype=dtype, device=device, attention_mask=attention_mask, # [B, L] @@ - encoder_attention_mask = create_4d_mask( - seq_len=max_len, - dtype=dtype, - device=device, - attention_mask=attention_mask, # [B, L] - sliding_window=None, - is_sliding_window=False, - is_causal=False # <--- 关键:双向注意力 - ) - encoder_attention_mask = encoder_attention_mask[:, :, :seq_len, :encoder_seq_len] + if encoder_attention_mask is not None: + max_len = max(seq_len, encoder_seq_len) + enc_mask = encoder_attention_mask + if enc_mask.shape[1] != max_len: + enc_mask = F.pad(enc_mask, (0, max_len - enc_mask.shape[1]), value=0) + encoder_attn_mask = create_4d_mask( + seq_len=max_len, + dtype=dtype, + device=device, + attention_mask=enc_mask, + sliding_window=None, + is_sliding_window=False, + is_causal=False + ) + encoder_attn_mask = encoder_attn_mask[:, :, :seq_len, :encoder_seq_len] @@ self_attn_mask_mapping = { "full_attention": full_attn_mask, "sliding_attention": sliding_attn_mask, - "encoder_attention_mask": encoder_attention_mask, + "encoder_attention_mask": encoder_attn_mask, }acestep/models/base/configuration_acestep_v15.py-1-15 (1)
1-15: 🛠️ Refactor suggestion | 🟠 Major
Module exceeds 200 LOC; split or add a follow‑up plan.
This configuration module is well beyond the hard cap; please split by responsibility or document a concrete follow‑up split plan in PR notes.
As per coding guidelines, "Target module size: optimal <= 150 LOC, hard cap <= 200 LOC; if exceeded, split by responsibility before merging or justify in PR notes with a concrete follow-up split plan".
acestep/models/turbo/modeling_acestep_v15_turbo.py-1-20 (1)
1-20: 🛠️ Refactor suggestion | 🟠 Major
Module exceeds 200 LOC; split by responsibility or add a follow‑up plan.
This file is far beyond the hard cap; please split into focused modules or document a concrete follow‑up split plan in PR notes.
As per coding guidelines, "Target module size: optimal <= 150 LOC, hard cap <= 200 LOC; if exceeded, split by responsibility before merging or justify in PR notes with a concrete follow-up split plan".
acestep/models/sft/configuration_acestep_v15.py-1-15 (1)
1-15: 🛠️ Refactor suggestion | 🟠 Major
Module exceeds 200 LOC; split or add a follow‑up plan.
This configuration module is beyond the hard cap; please split by responsibility or document a concrete follow‑up split plan in PR notes.
As per coding guidelines, "Target module size: optimal <= 150 LOC, hard cap <= 200 LOC; if exceeded, split by responsibility before merging or justify in PR notes with a concrete follow-up split plan".
acestep/models/turbo/configuration_acestep_v15.py-1-15 (1)
1-15: 🛠️ Refactor suggestion | 🟠 Major
Module exceeds 200 LOC; split or add a follow‑up plan.
This configuration module is beyond the hard cap; please split by responsibility or document a concrete follow‑up split plan in PR notes.
As per coding guidelines, "Target module size: optimal <= 150 LOC, hard cap <= 200 LOC; if exceeded, split by responsibility before merging or justify in PR notes with a concrete follow-up split plan".
acestep/models/turbo/modeling_acestep_v15_turbo.py-333-346 (1)
333-346: ⚠️ Potential issue | 🟠 Major
Guard cache updates when
position_embeddingsisNone.If
position_embeddingsis omitted and cache is enabled,sin/cosare undefined and will raise at runtime.🐛 Suggested guard
- if past_key_value is not None: - # Sin and cos are specific to RoPE models; cache_position needed for the static cache - cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position} - key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs) + if past_key_value is not None: + if position_embeddings is None: + raise ValueError("position_embeddings must be provided when using cache with RoPE.") + # Sin and cos are specific to RoPE models; cache_position needed for the static cache + cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position} + key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)acestep/models/turbo/modeling_acestep_v15_turbo.py-1705-1716 (1)
1705-1716: ⚠️ Potential issue | 🟠 Major
Use the sampled
rfortimestep_r.
sample_t_rreturns bothtandr, buttis passed twice, which defeats the flow‑matching setup.🐛 Suggested fix
- decoder_outputs = self.decoder( + decoder_outputs = self.decoder( hidden_states=xt, timestep=t, - timestep_r=t, + timestep_r=r, attention_mask=attention_mask, encoder_hidden_states=encoder_hidden_states, encoder_attention_mask=encoder_attention_mask, context_latents=context_latents, )acestep/models/turbo/modeling_acestep_v15_turbo.py-510-535 (1)
510-535: ⚠️ Potential issue | 🟠 Major
Avoid
UnboundLocalErrorwhen cross‑attention is disabled.
cross_attn_weightsis only defined whenuse_cross_attention=True, but you still append it whenoutput_attentions=True.🐛 Suggested fix
- if self.use_cross_attention: + cross_attn_weights = None + if self.use_cross_attention: norm_hidden_states = self.cross_attn_norm(hidden_states).type_as(hidden_states) attn_output, cross_attn_weights = self.cross_attn( @@ - if output_attentions: - outputs += (self_attn_weights, cross_attn_weights) + if output_attentions: + outputs += (self_attn_weights, cross_attn_weights)acestep/models/turbo/modeling_acestep_v15_turbo.py-1378-1425 (1)
1378-1425: ⚠️ Potential issue | 🟠 Major
Don’t drop provided attention masks.
attention_mask/encoder_attention_maskare overwritten toNone, which discards padding info and can leak attention into padded tokens.🐛 Suggested fix
- encoder_attention_mask = None - attention_mask = None + # keep provided masksacestep/models/sft/modeling_acestep_v15_base.py-1-20 (1)
1-20: 🛠️ Refactor suggestion | 🟠 Major
File exceeds 200 LOC hard cap — split by responsibility or document a concrete follow‑up plan.
This new module is ~2k+ LOC and bundles attention utilities, encoders, tokenization, diffusion, generation, and test harness. Please split into focused modules (or add a PR note with a concrete split plan) before merge.As per coding guidelines, "Target module size: optimal <= 150 LOC, hard cap <= 200 LOC; if exceeded, split by responsibility before merging or justify in PR notes with a concrete follow-up split plan".
acestep/models/sft/modeling_acestep_v15_base.py-1837-1857 (1)
1837-1857: ⚠️ Potential issue | 🟠 Major
Non‑cover branch can crash when
non_cover_text_*isn’t provided.
Ifaudio_cover_strength < 1.0andnon_cover_text_hidden_statesisNone(default),text_projectorwill error.🐛 Suggested fix
if audio_cover_strength < 1.0: non_is_covers = torch.zeros_like(is_covers, device=is_covers.device, dtype=is_covers.dtype) + if non_cover_text_hidden_states is None: + non_cover_text_hidden_states = text_hidden_states + non_cover_text_attention_mask = text_attention_mask # Use silence_latent for non-cover condition to simulate text2music mode (no reference audio) silence_latent_expanded = silence_latent[:, :src_latents.shape[1], :].expand(src_latents.shape[0], -1, -1)acestep/models/sft/modeling_acestep_v15_base.py-1708-1718 (1)
1708-1718: ⚠️ Potential issue | 🟠 Major
Flow‑matching samples
rbut the decoder always receivest.
ris computed yet ignored; this likely alters the intended objective.🐛 Suggested fix
- t, r = sample_t_r(bsz, device, dtype, self.config.data_proportion, self.config.timestep_mu, self.config.timestep_sigma, use_meanflow=False) + t, r = sample_t_r(bsz, device, dtype, self.config.data_proportion, self.config.timestep_mu, self.config.timestep_sigma, use_meanflow=False) @@ - timestep_r=t, + timestep_r=r,acestep/models/sft/modeling_acestep_v15_base.py-513-538 (1)
513-538:⚠️ Potential issue | 🟠 Major
cross_attn_weightscan be undefined whenuse_cross_attention=False.
Whenoutput_attentions=Trueand cross-attention is disabled, this raisesUnboundLocalError.🐛 Suggested fix
- # Step 2: Cross-attention (if enabled) for conditioning on encoder outputs - if self.use_cross_attention: + cross_attn_weights = None + # Step 2: Cross-attention (if enabled) for conditioning on encoder outputs + if self.use_cross_attention: norm_hidden_states = self.cross_attn_norm(hidden_states).type_as(hidden_states) attn_output, cross_attn_weights = self.cross_attn( hidden_states=norm_hidden_states, encoder_hidden_states=encoder_hidden_states, attention_mask=encoder_attention_mask, past_key_value=past_key_value, output_attentions=output_attentions, use_cache=use_cache, **kwargs, ) # Standard residual connection for cross-attention hidden_states = hidden_states + attn_output @@ - if output_attentions: - outputs += (self_attn_weights, cross_attn_weights) + if output_attentions: + outputs += (self_attn_weights,) if cross_attn_weights is None else (self_attn_weights, cross_attn_weights)acestep/models/sft/modeling_acestep_v15_base.py-1381-1448 (1)
1381-1448:⚠️ Potential issue | 🟠 Major
attention_mask/encoder_attention_maskare overwritten, dropping padding.
The incoming masks are set toNone, so padding is ignored and cross‑attention masks are built fromNone.🐛 Suggested fix
- encoder_attention_mask = None - attention_mask = None + # Preserve incoming attention_mask/encoder_attention_mask for padding-aware masks + encoder_attention_mask_4d = encoder_attention_mask if is_flash_attn: @@ - full_attn_mask = attention_mask + full_attn_mask = attention_mask @@ - sliding_attn_mask = attention_mask if self.config.use_sliding_window else None + sliding_attn_mask = attention_mask if self.config.use_sliding_window else None + encoder_attention_mask_4d = encoder_attention_mask @@ - encoder_attention_mask = create_4d_mask( + encoder_attention_mask_4d = None + if encoder_attention_mask is not None: + encoder_attention_mask_4d = create_4d_mask( seq_len=max_len, dtype=dtype, device=device, - attention_mask=attention_mask, # [B, L] + attention_mask=encoder_attention_mask, # [B, L] sliding_window=None, is_sliding_window=False, is_causal=False # <--- 关键:双向注意力 - ) - encoder_attention_mask = encoder_attention_mask[:, :, :seq_len, :encoder_seq_len] + ) + encoder_attention_mask_4d = encoder_attention_mask_4d[:, :, :seq_len, :encoder_seq_len] @@ - self_attn_mask_mapping = { + self_attn_mask_mapping = { "full_attention": full_attn_mask, "sliding_attention": sliding_attn_mask, - "encoder_attention_mask": encoder_attention_mask, + "encoder_attention_mask": encoder_attention_mask_4d, }
🟡 Minor comments (26)
acestep/models/sft/apg_guidance.py-168-168 (1)
168-168: ⚠️ Potential issue | 🟡 Minor
Unused variable
proj.The
projvariable fromcompute_perpendicular_componentis unpacked but never used. Use underscore prefix to indicate intentional discard.Proposed fix
- proj, perp = compute_perpendicular_component(latent_diff, latent_hat_uncond) + _proj, perp = compute_perpendicular_component(latent_diff, latent_hat_uncond)acestep/models/sft/apg_guidance.py-74-76 (1)
74-76: ⚠️ Potential issue | 🟡 Minor
Potential division by zero when normalizing.
If either tensor has zero norm, division by zero will produce
nan/inf. Consider adding epsilon for numerical stability.Proposed fix
- tensor1 = tensor1 / torch.linalg.norm(tensor1, dim=1, keepdim=True) - tensor2 = tensor2 / torch.linalg.norm(tensor2, dim=1, keepdim=True) + eps = 1e-8 + tensor1 = tensor1 / (torch.linalg.norm(tensor1, dim=1, keepdim=True) + eps) + tensor2 = tensor2 / (torch.linalg.norm(tensor2, dim=1, keepdim=True) + eps)acestep/models/base/apg_guidance.py-183-204 (1)
183-204: ⚠️ Potential issue | 🟡 Minor
Add return type hint and correct sigma type annotation for
adg_w_norm_forward.The wrapped function
adg_forwardaccepts bothfloatandtorch.Tensorfor sigma (lines 142–153), yet the wrapper's type hint incorrectly declaressigma: float. Add the return type hint-> torch.Tensorand broaden the sigma type totorch.Tensor | floatto match the actual behavior.Suggested update
def adg_w_norm_forward( latents: torch.Tensor, noise_pred_cond: torch.Tensor, noise_pred_uncond: torch.Tensor, - sigma: float, + sigma: torch.Tensor | float, guidance_scale: float, angle_clip: float = 3.14 / 3, -): +) -> torch.Tensor:acestep/models/base/apg_guidance.py-207-220 (1)
207-220: ⚠️ Potential issue | 🟡 Minor
Add return type hint and broaden
sigmatype foradg_wo_clip_forwardto matchadg_forward's actual accepted types.The
adg_forwardfunction accepts bothtorch.Tensorand numeric types forsigma(lines 142–153), while this wrapper only declaressigma: float. Add the return type hint and broaden the parameter type for consistency.🧩 Proposed update
def adg_wo_clip_forward( latents: torch.Tensor, noise_pred_cond: torch.Tensor, noise_pred_uncond: torch.Tensor, - sigma: float, + sigma: torch.Tensor | float, guidance_scale: float, -): +) -> torch.Tensor:acestep/models/base/apg_guidance.py-80-104 (1)
80-104: ⚠️ Potential issue | 🟡 Minor
Add type hints, document the
ValueErrorin docstring, and usereshapeinstead ofviewfor safer tensor operations.The function is missing type annotations on parameters and return type, and the docstring doesn't document the raised
ValueError. Additionally,.view()can fail on non-contiguous tensors; use.reshape()consistently for all tensor reshaping operations.🧩 Proposed update
-def compute_perpendicular_component(latent_diff, latent_hat_uncond): +def compute_perpendicular_component( + latent_diff: torch.Tensor, + latent_hat_uncond: torch.Tensor, +) -> tuple[torch.Tensor, torch.Tensor]: """ Decompose latent_diff into parallel and perpendicular components relative to latent_hat_uncond. @@ Returns: projection: Parallel component perpendicular_component: Perpendicular component + Raises: + ValueError: If input tensors have mismatched shapes. """ n, t, c = latent_diff.shape - latent_diff = latent_diff.view(n * t, c).float() - latent_hat_uncond = latent_hat_uncond.view(n * t, c).float() + latent_diff = latent_diff.reshape(n * t, c).float() + latent_hat_uncond = latent_hat_uncond.reshape(n * t, c).float()acestep/models/base/apg_guidance.py-63-77 (1)
63-77: ⚠️ Potential issue | 🟡 Minor
Add type hints and use
F.cosine_similarityfor numerical stability incall_cos_tensor.The function lacks type hints, which the coding guidelines require for new/modified functions. Additionally, manually dividing by norm creates a divide-by-zero vulnerability that produces NaNs for zero-norm inputs. Using
F.cosine_similarity(..., eps=...)is clearer, handles this edge case safely, and maintains the same output shape and values.🧩 Proposed update
-def call_cos_tensor(tensor1, tensor2): +def call_cos_tensor( + tensor1: torch.Tensor, + tensor2: torch.Tensor, + eps: float = 1e-8, +) -> torch.Tensor: """ Calculate cosine similarity between two normalized tensors. @@ - tensor1 = tensor1 / torch.linalg.norm(tensor1, dim=1, keepdim=True) - tensor2 = tensor2 / torch.linalg.norm(tensor2, dim=1, keepdim=True) - cosvalue = torch.sum(tensor1 * tensor2, dim=1, keepdim=True) - return cosvalue + return F.cosine_similarity(tensor1, tensor2, dim=1, eps=eps).unsqueeze(1)acestep/models/base/modeling_acestep_v15_base.py-2066-2069 (1)
2066-2069: ⚠️ Potential issue | 🟡 Minor
Ruff F841/F811 in test harness (unused vars, duplicate import).
cur_hidden_states/cur_attention_maskare assigned but never used, andosis re-imported in__main__. These will fail Ruff if enforced; remove them or prefix with_, and drop the duplicate import.🛠️ Proposed fix
@@ - cur_hidden_states = torch.randn(2, seq_len, 64, dtype=model_dtype, device=device) - cur_attention_mask = torch.ones(2, seq_len, dtype=model_dtype, device=device) + # cur_hidden_states / cur_attention_mask removed (unused) @@ - import osAlso applies to: 2126-2134
acestep/models/base/modeling_acestep_v15_base.py-1931-1941 (1)
1931-1941:⚠️ Potential issue | 🟡 Minor
use_cacheparameter is ignored in the decoder call.You always pass
use_cache=True, so callers cannot disable caching. Honor the argument or drop it from the API.🛠️ Proposed fix
@@ - use_cache=True, + use_cache=use_cache,acestep/models/base/modeling_acestep_v15_base.py-513-538 (1)
513-538: ⚠️ Potential issue | 🟡 Minor
Guard
cross_attn_weightswhen cross‑attention is disabled.If
use_cross_attention=Falseandoutput_attentions=True,cross_attn_weightsis undefined and will raise. Initialize it toNoneand/or only append when cross‑attn is enabled.🛠️ Proposed fix
@@ - # Step 2: Cross-attention (if enabled) for conditioning on encoder outputs - if self.use_cross_attention: + cross_attn_weights = None + # Step 2: Cross-attention (if enabled) for conditioning on encoder outputs + if self.use_cross_attention: norm_hidden_states = self.cross_attn_norm(hidden_states).type_as(hidden_states) attn_output, cross_attn_weights = self.cross_attn( hidden_states=norm_hidden_states, @@ outputs = (hidden_states,) if output_attentions: outputs += (self_attn_weights, cross_attn_weights)acestep/models/base/modeling_acestep_v15_base.py-1454-1463 (1)
1454-1463:⚠️ Potential issue | 🟡 Minor
enable_early_exitcurrently has no effect.
max_needed_layeris computed but never used to truncate the loop, so early exit is silently ignored. Implement the break or remove the flag/config.🛠️ Proposed fix
@@ - max_needed_layer = float('inf') - if custom_layers_config is not None and enable_early_exit: - max_needed_layer = max(custom_layers_config.keys()) + max_needed_layer = None + if custom_layers_config is not None and enable_early_exit: + max_needed_layer = max(custom_layers_config.keys()) @@ - for index_block, layer_module in enumerate(self.layers): + for index_block, layer_module in enumerate(self.layers): + if max_needed_layer is not None and index_block > max_needed_layer: + breakacestep/models/turbo/configuration_acestep_v15.py-148-189 (1)
148-189:⚠️ Potential issue | 🟡 MinorAdd a function docstring and type hints for
__init__.
AceStepConfig.__init__is new but lacks a concise function docstring and typing for its parameters/return.As per coding guidelines, "Docstrings are mandatory for all new or modified Python modules, classes, and functions; must be concise and include purpose plus key inputs/outputs and raised exceptions when relevant" and "Add type hints for new/modified functions when practical in Python".
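For reference, the requested style could look like the fragment below; the parameter names are placeholders, not the real AceStepConfig signature.

```python
class AceStepConfig:  # illustrative stub only
    def __init__(self, hidden_size: int = 1024, num_hidden_layers: int = 24, **kwargs) -> None:
        """Build the model configuration.

        Args:
            hidden_size: Width of the transformer hidden states (placeholder name).
            num_hidden_layers: Number of decoder layers (placeholder name).
            **kwargs: Extra options; the real class forwards these to its parent config.
        """
        self.hidden_size = hidden_size
        self.num_hidden_layers = num_hidden_layers
```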
acestep/models/base/configuration_acestep_v15.py-148-189 (1)
148-189:⚠️ Potential issue | 🟡 MinorAdd a function docstring and type hints for
__init__.
AceStepConfig.__init__is new but lacks a concise function docstring and typing for its parameters/return.As per coding guidelines, "Docstrings are mandatory for all new or modified Python modules, classes, and functions; must be concise and include purpose plus key inputs/outputs and raised exceptions when relevant" and "Add type hints for new/modified functions when practical in Python".
acestep/models/sft/configuration_acestep_v15.py-148-189 (1)
148-189:⚠️ Potential issue | 🟡 MinorAdd a function docstring and type hints for
__init__.
AceStepConfig.__init__is new but lacks a concise function docstring and typing for its parameters/return.As per coding guidelines, "Docstrings are mandatory for all new or modified Python modules, classes, and functions; must be concise and include purpose plus key inputs/outputs and raised exceptions when relevant" and "Add type hints for new/modified functions when practical in Python".
acestep/models/turbo/modeling_acestep_v15_turbo.py-615-617 (1)
615-617: ⚠️ Potential issue | 🟡 Minor
Fix the lyric‑encoder assertion message.
The assertion checks
input_ids is Nonebut the message says onlyinput_idsis supported.✏️ Suggested fix
- assert input_ids is None, "Only `input_ids` is supported for the lyric encoder." + assert input_ids is None, "Only `inputs_embeds` is supported for the lyric encoder."acestep/models/turbo/modeling_acestep_v15_turbo.py-1635-1637 (1)
1635-1637: ⚠️ Potential issue | 🟡 Minor
Replace
Avoid print debugging in committed code; use the module logger instead.
🔧 Suggested fix
- print("Using precomputed LM hints") + logger.debug("Using precomputed LM hints")As per coding guidelines, "Keep logging actionable in Python; avoid noisy logs and
acestep/models/base/configuration_acestep_v15.py-150-216 (1)
150-216: ⚠️ Potential issue | 🟡 Minor
Avoid a mutable default for
fsq_input_levels.Using a list as a default argument can create shared state across instances.
🛠️ Suggested fix
- fsq_input_levels=[8, 8, 8, 5, 5, 5], + fsq_input_levels=None, @@ - self.fsq_input_levels = fsq_input_levels + if fsq_input_levels is None: + fsq_input_levels = [8, 8, 8, 5, 5, 5] + self.fsq_input_levels = list(fsq_input_levels)As per coding guidelines, "Avoid hidden state and unintended side effects in Python code".
acestep/models/sft/configuration_acestep_v15.py-150-216 (1)
150-216:⚠️ Potential issue | 🟡 MinorAvoid a mutable default for
fsq_input_levels.Using a list as a default argument can create shared state across instances.
🛠️ Suggested fix
- fsq_input_levels=[8, 8, 8, 5, 5, 5], + fsq_input_levels=None, @@ - self.fsq_input_levels = fsq_input_levels + if fsq_input_levels is None: + fsq_input_levels = [8, 8, 8, 5, 5, 5] + self.fsq_input_levels = list(fsq_input_levels)As per coding guidelines, "Avoid hidden state and unintended side effects in Python code".
acestep/models/turbo/configuration_acestep_v15.py-150-216 (1)
150-216:⚠️ Potential issue | 🟡 MinorAvoid a mutable default for
fsq_input_levels.Using a list as a default argument can create shared state across instances.
🛠️ Suggested fix
- fsq_input_levels=[8, 8, 8, 5, 5, 5], + fsq_input_levels=None, @@ - self.fsq_input_levels = fsq_input_levels + if fsq_input_levels is None: + fsq_input_levels = [8, 8, 8, 5, 5, 5] + self.fsq_input_levels = list(fsq_input_levels)As per coding guidelines, "Avoid hidden state and unintended side effects in Python code".
acestep/models/turbo/modeling_acestep_v15_turbo.py-1577-1806 (1)
1577-1806:⚠️ Potential issue | 🟡 MinorAdd docstrings and type hints to public methods.
tokenize,prepare_condition,get_x0_from_noise,renoise, andgenerate_audiolack docstrings. Additionally,tokenize,get_x0_from_noise, andrenoiseomit parameter type hints and return types;generate_audioandprepare_conditionlack return type annotations.Per coding guidelines, docstrings are mandatory for all new or modified functions (including purpose, key inputs/outputs, and exceptions), and type hints should be added for new/modified functions when practical.
acestep/models/sft/modeling_acestep_v15_base.py-1527-1554 (1)
1527-1554: ⚠️ Potential issue | 🟡 Minor
Add a docstring for
AceStepConditionEncoder.forward(and other new public methods).
Several new public methods lack docstrings; please add concise purpose/inputs/outputs/exceptions.As per coding guidelines, "Docstrings are mandatory for all new or modified Python modules, classes, and functions; must be concise and include purpose plus key inputs/outputs and raised exceptions when relevant".
acestep/models/sft/modeling_acestep_v15_base.py-1-13 (1)
1-13: ⚠️ Potential issue | 🟡 Minor
Add a module docstring for this new file.
A concise module docstring is required for new modules.As per coding guidelines, "Docstrings are mandatory for all new or modified Python modules, classes, and functions; must be concise and include purpose plus key inputs/outputs and raised exceptions when relevant".✍️ Suggested module docstring
+# NOTE: keep the license header above. +"""AceStep v1.5 base modeling components (attention, encoders, diffusion, tokenization)."""acestep/models/sft/modeling_acestep_v15_base.py-172-197 (1)
172-197: ⚠️ Potential issue | 🟡 Minor
Add type hints + return type for
sample_t_r.
This function is new/modified and lacks practical type hints.As per coding guidelines, "Add type hints for new/modified functions when practical in Python".✍️ Suggested fix
-def sample_t_r(batch_size, device, dtype, data_proportion=0.0, timestep_mu=-0.4, timestep_sigma=1.0, use_meanflow=True): +def sample_t_r( + batch_size: int, + device: torch.device, + dtype: torch.dtype, + data_proportion: float = 0.0, + timestep_mu: float = -0.4, + timestep_sigma: float = 1.0, + use_meanflow: bool = True, +) -> tuple[torch.Tensor, torch.Tensor]:acestep/models/sft/modeling_acestep_v15_base.py-1454-1460 (1)
1454-1460:⚠️ Potential issue | 🟡 Minor
max_needed_layeris unused; early‑exit never happens (ruff F841).
Either remove it or use it to stop the layer loop.🧹 Suggested fix
- for index_block, layer_module in enumerate(self.layers): + for index_block, layer_module in enumerate(self.layers): + if index_block > max_needed_layer: + breakacestep/model_downloader.py-56-115 (1)
56-115: ⚠️ Potential issue | 🟡 Minor
Add type hints for
checkpoints_dirin the new sync helpers.
These functions are new and can be annotated cleanly.As per coding guidelines, "Add type hints for new/modified functions when practical in Python".✍️ Suggested fix
-from typing import Optional, List, Dict, Tuple +from typing import Optional, List, Dict, Tuple, Union @@ -def _check_code_mismatch(model_name: str, checkpoints_dir) -> List[str]: +def _check_code_mismatch(model_name: str, checkpoints_dir: Union[Path, str]) -> List[str]: @@ -def _sync_model_code_files(model_name: str, checkpoints_dir) -> List[str]: +def _sync_model_code_files(model_name: str, checkpoints_dir: Union[Path, str]) -> List[str]:acestep/models/sft/modeling_acestep_v15_base.py-1638-1640 (1)
1638-1640: ⚠️ Potential issue | 🟡 Minor
Replace
print()in committed code is noisy and hard to control in production.As per coding guidelines, "Keep logging actionable in Python; avoid noisy logs and `print` debugging in committed code".🧼 Suggested fix
- print("Using precomputed LM hints") + logger.info("Using precomputed LM hints")acestep/handler.py-616-625 (1)
616-625: ⚠️ Potential issue | 🟡 Minor
Remove unnecessary
fprefix in log string.The f-string has no placeholders, so Ruff is right to warn.
🔧 Suggested fix
- logger.info(f"[initialize_service] Model code files synced successfully.") + logger.info("[initialize_service] Model code files synced successfully.")
🧹 Nitpick comments (11)
acestep/models/sft/apg_guidance.py (6)
1-14: Unused import and missing class docstring.
- Line 2:
torch.nn.functional as Fis imported but never used.MomentumBufferclass is missing a docstring describing its purpose and usage.Proposed fix
import torch -import torch.nn.functional as F class MomentumBuffer: + """ + Accumulates momentum-weighted running average of update values for APG guidance. + + Args: + momentum: Momentum coefficient (default: -0.75, negative for dampening). + """ def __init__(self, momentum: float = -0.75):
16-30: Mutable default argument and missing docstring.
dims=[-1]is a mutable default argument. While lists are mutable, this specific usage is read-only and won't cause bugs, but it's a Python anti-pattern that static analyzers flag (B006). Replace withNoneand default inside the function for consistency.The function also lacks a docstring explaining its purpose (projection decomposition).
Proposed fix
def project( v0: torch.Tensor, # [B, C, T] v1: torch.Tensor, # [B, C, T] - dims=[-1], -): + dims: list[int] | None = None, +) -> tuple[torch.Tensor, torch.Tensor]: + """ + Project v0 onto v1, returning parallel and orthogonal components. + + Args: + v0: Source tensor [B, C, T] + v1: Direction tensor [B, C, T] + dims: Dimensions for normalization (default: [-1]) + + Returns: + Tuple of (parallel component, orthogonal component) + """ + if dims is None: + dims = [-1] dtype = v0.dtype
33-56: Missing docstring and mutable default.Same mutable default issue on
dims=[-1]. More importantly, this function implements the APG (Adaptive Projected Gradient) guidance algorithm but lacks documentation explaining the algorithm, parameters, and expected behavior.Proposed fix
def apg_forward( pred_cond: torch.Tensor, # [B, C, T] pred_uncond: torch.Tensor, # [B, C, T] guidance_scale: float, momentum_buffer: MomentumBuffer = None, eta: float = 0.0, norm_threshold: float = 2.5, - dims=[-1], -): + dims: list[int] | None = None, +) -> torch.Tensor: + """ + APG (Adaptive Projected Gradient) guidance forward pass. + + Args: + pred_cond: Conditional prediction [B, C, T] + pred_uncond: Unconditional prediction [B, C, T] + guidance_scale: Guidance strength multiplier + momentum_buffer: Optional momentum buffer for smoothing + eta: Parallel component weight (default: 0.0) + norm_threshold: Maximum norm for diff scaling (default: 2.5) + dims: Dimensions for projection (default: [-1]) + + Returns: + Guided prediction tensor [B, C, T] + """ + if dims is None: + dims = [-1] diff = pred_cond - pred_uncond
59-60: Missing type hints and docstring.This function lacks type annotations and a docstring. While simple, it should document what CFG (Classifier-Free Guidance) does.
Proposed fix
-def cfg_forward(cond_output, uncond_output, cfg_strength): +def cfg_forward( + cond_output: torch.Tensor, + uncond_output: torch.Tensor, + cfg_strength: float, +) -> torch.Tensor: + """Apply Classifier-Free Guidance: lerp from uncond toward cond.""" return uncond_output + cfg_strength * (cond_output - uncond_output)
111-111: Type hint doesn't reflect actual accepted types.The signature declares
sigma: torch.Tensorbut the function body handlesint,float, andtorch.Tensor(lines 142-153). Update the type hint to reflect this.Proposed fix
- sigma: torch.Tensor, + sigma: torch.Tensor | float | int,
183-220: Type hints forsigmainconsistent withadg_forward.Both wrapper functions declare
sigma: floatbut delegate toadg_forwardwhich acceptstorch.Tensor | float | int. For API consistency, consider matching the broader type.acestep/models/base/apg_guidance.py (1)
59-60: Consider adding docstring and type hints tocfg_forwardif modifying this function.Based on coding guidelines, docstrings and type hints are mandatory for new or modified functions. The current code lacks both. If you're touching this function, update it as follows:
Suggested update
-def cfg_forward(cond_output, uncond_output, cfg_strength): +def cfg_forward( + cond_output: torch.Tensor, + uncond_output: torch.Tensor, + cfg_strength: float, +) -> torch.Tensor: + """Classic CFG blend of conditional and unconditional outputs.""" return uncond_output + cfg_strength * (cond_output - uncond_output)acestep/models/turbo/modeling_acestep_v15_turbo.py (2)
169-194: Add type hints forsample_t_r.This helper is new; typing improves readability and tooling support.
🔧 Example typing
-def sample_t_r(batch_size, device, dtype, data_proportion=0.0, timestep_mu=-0.4, timestep_sigma=1.0, use_meanflow=True): +def sample_t_r( + batch_size: int, + device: torch.device, + dtype: torch.dtype, + data_proportion: float = 0.0, + timestep_mu: float = -0.4, + timestep_sigma: float = 1.0, + use_meanflow: bool = True, +) -> tuple[torch.Tensor, torch.Tensor]:As per coding guidelines, "Add type hints for new/modified functions when practical in Python".
2004-2156: Consider moving the test harness to tests/examples.Keeping heavyweight demo code out of the model module improves maintainability and keeps runtime entrypoints clean.
acestep/inference.py (1)
72-81: Document `cover_noise_strength` in `GenerationParams` docstring.
This new parameter is user‑facing; please add it to the attribute list.
✍️ Suggested docstring update
```diff
 # Task-Specific Parameters
     task_type: Type of generation task. One of: "text2music", "cover", "repaint", "lego", "extract", "complete".
     reference_audio: Path to a reference audio file for style transfer or cover tasks.
     src_audio: Path to a source audio file for audio-to-audio tasks.
     audio_codes: Audio semantic codes as a string (advanced use, for code-control generation).
     repainting_start: For repaint/lego tasks: start time in seconds for region to repaint.
     repainting_end: For repaint/lego tasks: end time in seconds for region to repaint (-1 for until end).
     audio_cover_strength: Strength of reference audio/codes influence (range 0.0–1.0). set smaller (0.2) for style transfer tasks.
+    cover_noise_strength: Initialize noise from source audio (0 = pure noise/no cover, 1 = closest to source).
     instruction: Optional task instruction prompt. If empty, auto-generated by system.
```
723-723: Use explicit `Optional` type for nullable parameter.
The `current_mode` parameter can be `None` but the type hint only specifies `str`. Per PEP 484, this should use an explicit `Optional` or union type to accurately describe the expected types.
🔧 Proposed fix
```diff
-def get_model_type_ui_settings(is_turbo: bool, current_mode: str = None, is_pure_base: bool = False):
+def get_model_type_ui_settings(is_turbo: bool, current_mode: Optional[str] = None, is_pure_base: bool = False):
```
Note: `Optional` is already imported from `typing` at the top of the file.
```python
if audio_codes is not None:
    lm_hints_5Hz = self.tokenize.quantizer.get_output_from_indices(audio_codes)
else:
    lm_hints_5Hz, indices, llm_mask = self.tokenize(hidden_states, silence_latent, attention_mask)
```
self.tokenize is a method — this will raise AttributeError.
The quantizer lives on self.tokenizer, not on the tokenize method.
🐛 Suggested fix
```diff
- lm_hints_5Hz = self.tokenize.quantizer.get_output_from_indices(audio_codes)
+ lm_hints_5Hz = self.tokenizer.quantizer.get_output_from_indices(audio_codes)
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
 if audio_codes is not None:
-    lm_hints_5Hz = self.tokenize.quantizer.get_output_from_indices(audio_codes)
+    lm_hints_5Hz = self.tokenizer.quantizer.get_output_from_indices(audio_codes)
 else:
     lm_hints_5Hz, indices, llm_mask = self.tokenize(hidden_states, silence_latent, attention_mask)
```
🧰 Tools
🪛 Ruff (0.15.0)
[warning] 1645-1645: Unpacked variable indices is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
[warning] 1645-1645: Unpacked variable llm_mask is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
🤖 Prompt for AI Agents
In `@acestep/models/sft/modeling_acestep_v15_base.py` around lines 1642 - 1645,
The code mistakenly treats self.tokenize as an object with a quantizer
attribute, causing AttributeError; change references so the quantizer is
accessed from self.tokenizer (e.g., replace
self.tokenize.quantizer.get_output_from_indices(...) with
self.tokenizer.quantizer.get_output_from_indices(...)) and ensure the tokenize
method call remains as self.tokenize(hidden_states, silence_latent,
attention_mask) to obtain lm_hints_5Hz, indices, llm_mask; verify symbols
touched: self.tokenize, self.tokenizer, get_output_from_indices, lm_hints_5Hz,
indices, llm_mask.
```python
if audio_codes is not None:
    lm_hints_5Hz = self.tokenize.quantizer.get_output_from_indices(audio_codes)
else:
```
audio_codes path calls self.tokenize.quantizer (AttributeError).
self.tokenize is a method, so .quantizer doesn’t exist. This will crash when audio_codes is provided.
🐛 Suggested fix
```diff
- lm_hints_5Hz = self.tokenize.quantizer.get_output_from_indices(audio_codes)
+ lm_hints_5Hz = self.tokenizer.quantizer.get_output_from_indices(audio_codes)
```
🤖 Prompt for AI Agents
In `@acestep/models/turbo/modeling_acestep_v15_turbo.py` around lines 1639 - 1641,
The bug is that code treats self.tokenize as an object with a .quantizer
attribute but self.tokenize is a method; change the call to obtain the tokenizer
object before accessing .quantizer. Replace the problematic access in
modeling_acestep_v15_turbo (the block that checks audio_codes) to call the
tokenizer (e.g., use self.tokenize() or the correct tokenizer property) and then
call .quantizer.get_output_from_indices(audio_codes) — for example use
self.tokenize().quantizer.get_output_from_indices(audio_codes) or, if a
quantizer lives on the instance, use
self.quantizer.get_output_from_indices(audio_codes) so the AttributeError is
avoided.
Summary by CodeRabbit
New Features
Improvements
Localization