
Conversation

@lutzkirschner64-dot (Contributor) commented Feb 10, 2026

This update addresses critical audio quality issues. Previously, generated songs often suffered from severe clipping and lacked headroom (output peaked at 0.0 dBFS true peak). These changes apply standard professional audio practice to give musicians and engineers a clean starting point.

Key Enhancements:

  • Peak Normalization: Prevents clipping by scaling audio to a safe target level (default -1.0 dB); a sketch of the idea follows below.
  • 32-bit Float Support: Preserves full dynamic range and prevents signal degradation.
  • UI Localization: Label entries for English, Chinese, and Japanese (the Chinese and Japanese strings are currently placeholders).

Attached: CHANGES_SUMMARY.md
Screenshots: 2026-02-10 010545, 2026-02-10 010705
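
To make the first point concrete, here is a minimal standalone sketch of peak normalization to a target level in dBFS, assuming the normalize_audio(audio_data, target_db=-1.0) signature mentioned in the review walkthrough; it is only an illustration, not the PR's actual implementation in acestep/audio_utils.py.

```python
import numpy as np

def normalize_audio(audio_data: np.ndarray, target_db: float = -1.0) -> np.ndarray:
    """Hypothetical peak normalization: scale so the absolute peak sits at target_db dBFS."""
    peak = float(np.max(np.abs(audio_data)))
    if peak == 0.0:
        return audio_data  # silence: nothing to scale
    target_peak = 10.0 ** (target_db / 20.0)  # -1.0 dB -> ~0.891 linear
    return audio_data * (target_peak / peak)

# A full-scale sine wave gains 1 dB of headroom instead of touching 0.0 dBFS.
t = np.linspace(0.0, 1.0, 48000, dtype=np.float32)
loud = np.sin(2 * np.pi * 440.0 * t)
safe = normalize_audio(loud, target_db=-1.0)
print(float(np.max(np.abs(safe))))  # ~0.891
```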

Summary by CodeRabbit

  • New Features

    • Added audio normalization with configurable target dB and UI controls.
    • Added WAV32 export option for higher-fidelity 32-bit WAV saves.
    • Expanded generation UI with normalization and LoRA controls.
  • Bug Fixes

    • More robust audio saving with fallbacks and improved extension/format handling.
    • Audio hashing/ID generation now uses SHA-256 (hash values will change).
  • Localization & UI Updates

    • Added translation entries for normalization controls and refined batch-results wording.

acestep/audio_utils.py (Includes the new `normalize_audio` function and the 32-bit memory fix)
acestep/inference.py (Calls normalization)
acestep/gradio_ui/interfaces/generation.py (Checkbox for normalization & 32-bit selection)
acestep/gradio_ui/events/results_handlers.py (Most important fix for the "FileNotFoundError")
acestep/gradio_ui/i18n/en.json (English text for the new features)
acestep/gradio_ui/i18n/zh.json (Chinese placeholders)
acestep/gradio_ui/i18n/ja.json (Japanese placeholders)
coderabbitai bot commented Feb 10, 2026

Caution

Review failed

The pull request is closed.

📝 Walkthrough

Walkthrough

Adds peak audio normalization and WAV32 save pathways to audio utilities, migrates audio hashing to SHA‑256, exposes normalization controls in the generation UI, and adds corresponding English/Japanese/Chinese translation keys and minor UI string updates.

Changes

Cohort / File(s) / Summary

Audio Utilities (acestep/audio_utils.py)
Added normalize_audio(audio_data, target_db=-1.0) for peak normalization. Extended AudioSaver/save_audio to support "wav32" with a dedicated save path and soundfile fallbacks, improved extension/format handling, and switched hashing functions (get_audio_file_hash, generate_uuid_from_audio_data) from MD5 to SHA-256.

Generation Interface (acestep/gradio_ui/interfaces/generation.py)
Replaced the placeholder with a full create_generation_section(...) implementation that builds the Gradio generation UI, integrates normalization controls (enable_normalization, normalization_db), adjusts device/backend defaults and runtime UI logic, and returns expanded component/state handles. No public signature change. (A sketch of the normalization controls follows this table.)

Internationalization (acestep/gradio_ui/i18n/en.json, acestep/gradio_ui/i18n/ja.json, acestep/gradio_ui/i18n/zh.json)
Added top-level gen keys: enable_normalization, enable_normalization_info, normalization_db, normalization_db_info. Also updated some UI strings (e.g., batch results wording) across locales.
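
The generation-interface entry above mentions the new enable_normalization and normalization_db controls. The following hedged Gradio sketch shows what such controls could look like, also reflecting the later review suggestion to seed them from pre-initialized settings; init_params, the t(...) translation lookup, and build_normalization_controls are assumptions for illustration, not the repository's actual code.

```python
import gradio as gr

def build_normalization_controls(init_params: dict, t=lambda key: key):
    # Hypothetical helper: seed the controls from pre-initialized settings
    # instead of hardcoding defaults, as the review below suggests.
    enable_normalization = gr.Checkbox(
        label=t("gen.enable_normalization"),
        value=bool(init_params.get("enable_normalization", True)),
        info=t("gen.enable_normalization_info"),
    )
    normalization_db = gr.Slider(
        label=t("gen.normalization_db"),
        minimum=-10.0,
        maximum=0.0,
        step=0.1,
        value=float(init_params.get("normalization_db", -1.0)),
        info=t("gen.normalization_db_info"),
    )
    return enable_normalization, normalization_db

with gr.Blocks() as demo:
    build_normalization_controls({"enable_normalization": True, "normalization_db": -1.0})
```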

Sequence Diagram(s)

sequenceDiagram
    participant UI as "User / Generation UI"
    participant Gen as "Generation Section\n(create_generation_section)"
    participant Audio as "acestep.audio_utils\n(normalize_audio, AudioSaver)"
    participant FS as "Filesystem / Soundfile\n(encoding / save backend)"

    UI->>Gen: toggle normalization, submit generation
    Gen->>Audio: send audio buffers + normalization params
    Audio->>Audio: normalize_audio(audio, target_db)
    Audio->>Audio: choose save format/path (wav32 or other)
    Audio->>FS: attempt primary save (wav32 path / ffmpeg)
    alt primary save fails
        Audio->>FS: fallback save using soundfile (32-bit WAV or other)
    end
    Audio->>Audio: compute SHA-256 hash / uuid from audio data
    Audio->>Gen: return saved file path + hash/uuid
    Gen->>UI: display results (file link, hash, generation details)
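
The final hashing step in the diagram corresponds to the MD5-to-SHA-256 migration noted in the walkthrough. Below is a hypothetical sketch of deriving a stable ID from raw audio bytes with hashlib.sha256; the repository's generate_uuid_from_audio_data may differ, and the point is mainly to show why previously generated hash-based IDs change.

```python
import hashlib
import uuid

import numpy as np

def generate_uuid_from_audio_data(audio: np.ndarray) -> str:
    """Hypothetical: fold the first 128 bits of a SHA-256 digest into a UUID string."""
    digest = hashlib.sha256(audio.tobytes()).hexdigest()
    return str(uuid.UUID(digest[:32]))

samples = np.zeros(48000, dtype=np.float32)
print(generate_uuid_from_audio_data(samples))  # deterministic ID; differs from the old MD5-based one
```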

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hopped through samples, soft and neat,
I nudged the peaks until they breathed, not bleat,
WAV32 tunnels, fallbacks paved the way,
SHA‑256 seeds mark each small audio bay,
A tiny rabbit cheers the new UI seat!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the two main changes: audio normalization to prevent clipping and support for 32-bit WAV format, both of which are prominent throughout the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.




coderabbitai bot left a comment

Actionable comments posted: 8

🤖 Fix all issues with AI agents
In `@acestep/audio_utils.py`:
- Around line 112-128: The numpy-to-torch conversion in audio_utils.py inverts
the channels_first logic: when channels_first is True the code should treat
audio_data as already [channels, samples] and not transpose; when False it
should convert [samples, channels] to [channels, samples] only if needed. Update
the numpy branch handling of audio_data and audio_tensor so it matches the torch
branch: for numpy, if channels_first is True convert directly with
torch.from_numpy(audio_data).float() (no .T), and if channels_first is False
convert with torch.from_numpy(audio_data).float() and transpose to [channels,
samples] only when audio_tensor.dim() == 2 and audio_tensor.shape[0] <
audio_tensor.shape[1]; keep the torch branch logic (audio_data.cpu().float() and
conditional transpose) unchanged.
- Around line 186-195: The fallback except block captures an unused exception
variable and incorrectly passes "wav32" as a soundfile format; fix by changing
the outer except to use a throwaway name (e.g., except Exception:) or reference
the variable in the log, and update the soundfile fallback to map format "wav32"
to format="WAV" with subtype="PCM_32" (for other formats set format_upper =
format.upper() and subtype=None), then call sf.write(str(output_path), audio_np,
sample_rate, format=format_upper, subtype=subtype) and log the error
appropriately using logger.error(e) when the inner fallback fails; refer to the
symbols audio_tensor, output_path, sample_rate, format, logger and the fallback
block in audio_utils.py.

In `@acestep/gradio_ui/i18n/ja.json`:
- Around line 247-252: The new localization keys under "gen"
(enable_normalization, enable_normalization_info, normalization_db,
normalization_db_info) are still in English; translate those four strings into
Japanese and replace the English values in the ja.json file so the UI shows
Japanese labels and help text for audio normalization; keep the JSON keys
unchanged and only update their string values with accurate Japanese
translations that match the tone and length of existing entries.

In `@acestep/gradio_ui/i18n/zh.json`:
- Around line 247-252: The JSON keys under "gen" ("enable_normalization",
"enable_normalization_info", "normalization_db", "normalization_db_info") are
still in English; replace their English values with Chinese translations so the
zh locale shows localized labels and help text (translate the label "Enable
Normalization", the info text about normalizing audio volume, "Target Peak (dB)"
and the info about dB recommendations). Keep the keys intact and only update the
string values for those four keys.

In `@acestep/gradio_ui/interfaces/generation.py`:
- Line 307: The UI references a missing i18n key generation.advanced_dit_params
inside the gr.Accordion call (look for the
gr.Accordion(t("generation.advanced_dit_params"), open=False) usage) causing the
raw key to display; add the generation.advanced_dit_params entry to all locale
files (en/zh/ja) with appropriate translations or replace the reference with an
existing i18n key so the accordion label shows a proper localized string.
- Around line 433-446: Comments near the gr.Dropdown creation for vocal_language
use fullwidth punctuation characters (e.g., ':', ',') which triggers Ruff
RUF003; update the inline comments around the gr.Dropdown block (the lines
describing "中间:语言选择 (Dropdown)" and "右侧:格式化按钮 (Button)" and any other nearby
comments) to use ASCII punctuation (':' and ',') or rewrite the comments to
avoid fullwidth characters, keeping references to the gr.Dropdown instantiation
(vocal_language) and its parameters (choices=VALID_LANGUAGES, value="unknown",
label=t("generation.vocal_language_label"), allow_custom_value=True, scale=3)
intact so the code logic is unchanged.
- Around line 302-305: The checkbox and slider are using hardcoded defaults
instead of honoring pre-initialized settings; update the gr.Checkbox
initialization for enable_normalization and the gr.Slider for normalization_db
to pull values from init_params (e.g., use
init_params.get("enable_normalization", True) for enable_normalization and
init_params.get("normalization_db", -1.0) for normalization_db) so
pre-initialized runs show the correct UI state; ensure you coerce types as
needed (bool for enable_normalization, float for normalization_db) and leave the
info/label parameters unchanged.
- Around line 63-65: The visibility logic uses service_pre_initialized but
should hide the accordion when the app is running in service mode; change
accordion_visible to depend on the service-mode flag (e.g., set
accordion_visible = not service_mode or not is_service_mode) instead of
service_pre_initialized, ensuring the boolean used (service_mode /
is_service_mode) exists in scope and preserves accordion_open = not
service_pre_initialized for the open state.
🧹 Nitpick comments (4)
acestep/audio_utils.py (4)

69-79: Update docstring to include wav32 format.

The docstring mentions only 'flac', 'wav', 'mp3' but the code also accepts 'wav32'.

📝 Proposed docstring fix
         """
         Initialize audio saver
         
         Args:
-            default_format: Default save format ('flac', 'wav', 'mp3')
+            default_format: Default save format ('flac', 'wav', 'mp3', 'wav32')
         """

107-110: Simplify wav32 extension handling.

The current logic sets .wav32 suffix at line 110, then changes it back to .wav at line 152. Consider handling this mapping earlier for clarity.

♻️ Proposed simplification
         # Ensure output path has correct extension
         output_path = Path(output_path)
-        if output_path.suffix.lower() not in ['.flac', '.wav', '.mp3']:
-            output_path = output_path.with_suffix(f'.{format}')
+        ext = ".wav" if format == "wav32" else f".{format}"
+        if output_path.suffix.lower() not in ['.flac', '.wav', '.mp3']:
+            output_path = output_path.with_suffix(ext)

174-181: Unreachable code block.

The else branch is unreachable because format is validated to be one of ["flac", "wav", "mp3", "wav32"] at line 103, and all these formats are handled in the preceding if/elif branches.

🧹 Remove dead code
                     backend='soundfile',
                 )
-            else:
-                # Other formats use default backend
-                torchaudio.save(
-                    str(output_path),
-                    audio_tensor,
-                    sample_rate,
-                    channels_first=True,
-                )

291-314: Consider using SHA-256 instead of MD5.

While MD5 is sufficient for caching/deduplication purposes (not cryptographic), using SHA-256 would be more consistent with generate_uuid_from_params (which already uses SHA-256) and would silence the static analysis warnings (S324).

♻️ Proposed change
 def get_audio_file_hash(audio_file) -> str:
     ...
     try:
         if isinstance(audio_file, str):
             if os.path.exists(audio_file):
                 with open(audio_file, 'rb') as f:
-                    return hashlib.md5(f.read()).hexdigest()
-            return hashlib.md5(audio_file.encode('utf-8')).hexdigest()
+                    return hashlib.sha256(f.read()).hexdigest()
+            return hashlib.sha256(audio_file.encode('utf-8')).hexdigest()
         elif hasattr(audio_file, 'name'):
-            return hashlib.md5(str(audio_file.name).encode('utf-8')).hexdigest()
-        return hashlib.md5(str(audio_file).encode('utf-8')).hexdigest()
+            return hashlib.sha256(str(audio_file.name).encode('utf-8')).hexdigest()
+        return hashlib.sha256(str(audio_file).encode('utf-8')).hexdigest()
     except Exception:
-        return hashlib.md5(str(audio_file).encode('utf-8')).hexdigest()
+        return hashlib.sha256(str(audio_file).encode('utf-8')).hexdigest()

Comment on lines 112 to 128

        # Convert to torch tensor
        if isinstance(audio_data, np.ndarray):
            if channels_first:
                # numpy [samples, channels] -> tensor [channels, samples]
                audio_tensor = torch.from_numpy(audio_data.T).float()
            else:
                # numpy [samples, channels] -> tensor [samples, channels] -> [channels, samples]
                audio_tensor = torch.from_numpy(audio_data).float()
                if audio_tensor.dim() == 2 and audio_tensor.shape[0] < audio_tensor.shape[1]:
                    audio_tensor = audio_tensor.T
        else:
            # torch tensor
            audio_tensor = audio_data.cpu().float()
            if not channels_first and audio_tensor.dim() == 2:
                # [samples, channels] -> [channels, samples]
                if audio_tensor.shape[0] > audio_tensor.shape[1]:
                    audio_tensor = audio_tensor.T
⚠️ Potential issue | 🟠 Major

The channels_first logic for numpy arrays appears inverted.

When channels_first=True, the input is expected to already be in [channels, samples] format. However, the code transposes it (line 116), which would incorrectly convert it to [samples, channels]. The torch tensor branch (lines 125-128) handles this correctly by only transposing when channels_first=False.

🐛 Proposed fix
         # Convert to torch tensor
         if isinstance(audio_data, np.ndarray):
-            if channels_first:
-                # numpy [samples, channels] -> tensor [channels, samples]
-                audio_tensor = torch.from_numpy(audio_data.T).float()
-            else:
-                # numpy [samples, channels] -> tensor [samples, channels] -> [channels, samples]
+            if not channels_first:
+                # numpy [samples, channels] -> tensor [channels, samples]
                 audio_tensor = torch.from_numpy(audio_data).float()
                 if audio_tensor.dim() == 2 and audio_tensor.shape[0] < audio_tensor.shape[1]:
                     audio_tensor = audio_tensor.T
+            else:
+                # numpy already [channels, samples]
+                audio_tensor = torch.from_numpy(audio_data).float()
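
To make the expected shape contract concrete, here is a small self-contained check (an illustration, not the repository's code) of the convention this fix restores: channels_first=True means the numpy input is already [channels, samples] and must pass through untransposed, while channels_first=False input gets transposed into that layout.

```python
import numpy as np
import torch

def to_channels_first(audio_data: np.ndarray, channels_first: bool = True) -> torch.Tensor:
    # Hypothetical helper mirroring the intended behavior: only transpose
    # when the caller says the array is [samples, channels].
    tensor = torch.from_numpy(audio_data).float()
    if not channels_first and tensor.dim() == 2 and tensor.shape[0] > tensor.shape[1]:
        tensor = tensor.T  # [samples, channels] -> [channels, samples]
    return tensor

stereo_cf = np.zeros((2, 48000), dtype=np.float32)   # already [channels, samples]
stereo_sf = np.zeros((48000, 2), dtype=np.float32)   # [samples, channels]

assert to_channels_first(stereo_cf, channels_first=True).shape == (2, 48000)
assert to_channels_first(stereo_sf, channels_first=False).shape == (2, 48000)
```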

Comment on lines 186 to 195

        except Exception as e:
            try:
                import soundfile as sf
                audio_np = audio_tensor.transpose(0, 1).numpy()  # -> [samples, channels]
                sf.write(str(output_path), audio_np, sample_rate, format=format.upper())
                logger.debug(f"[AudioSaver] Fallback soundfile Saved audio to {output_path} ({format}, {sample_rate}Hz)")
                return str(output_path)
            except Exception as e:
                logger.error(f"[AudioSaver] Failed to save audio: {e}")
                raise
⚠️ Potential issue | 🟡 Minor

Fix unused variable and broken wav32 fallback.

Two issues:

  1. Line 186: Variable e is captured but never used (per static analysis).
  2. Line 190: If format is "wav32", format.upper() becomes "WAV32", which is not a valid soundfile format. The fallback will fail for wav32.
🐛 Proposed fix
-        except Exception as e:
+        except Exception:
             try:
                 import soundfile as sf
                 audio_np = audio_tensor.transpose(0, 1).numpy()  # -> [samples, channels]
-                sf.write(str(output_path), audio_np, sample_rate, format=format.upper())
+                sf_format = "WAV" if format == "wav32" else format.upper()
+                sf.write(str(output_path), audio_np, sample_rate, format=sf_format)
                 logger.debug(f"[AudioSaver] Fallback soundfile Saved audio to {output_path} ({format}, {sample_rate}Hz)")
                 return str(output_path)
-            except Exception as e:
+            except Exception as inner_e:
-                logger.error(f"[AudioSaver] Failed to save audio: {e}")
+                logger.error(f"[AudioSaver] Failed to save audio: {inner_e}")
                 raise
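
As a small standalone check of the soundfile behavior this fallback relies on (file names and sample rate here are made up): the container is selected with format="WAV" and the bit depth with subtype, so "WAV32" is rejected while "PCM_32" (32-bit integer) and "FLOAT" (32-bit float) are valid subtypes.

```python
import numpy as np
import soundfile as sf

audio = np.zeros((48000, 2), dtype=np.float32)  # [samples, channels]

# Valid: the container is "WAV"; the bit depth is chosen via subtype.
sf.write("out_pcm32.wav", audio, 44100, format="WAV", subtype="PCM_32")  # 32-bit integer PCM
sf.write("out_float.wav", audio, 44100, format="WAV", subtype="FLOAT")   # 32-bit float

# "WAV32" is not a container name soundfile recognizes.
print("WAV32" in sf.available_formats())  # False
print(sf.available_subtypes("WAV"))       # includes 'PCM_32' and 'FLOAT'
```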
🧰 Tools
🪛 Ruff (0.14.14)

[warning] 186-186: Do not catch blind exception: Exception

(BLE001)


[error] 186-186: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)


normalization_db = gr.Slider(label=t("gen.normalization_db"), minimum=-10.0, maximum=0.0, step=0.1, value=-1.0, info=t("gen.normalization_db_info"))

# Advanced DiT Parameters
with gr.Accordion(t("generation.advanced_dit_params"), open=False):
⚠️ Potential issue | 🟡 Minor

Add the missing i18n key for generation.advanced_dit_params.
Line 307 references a key that doesn’t exist in en/zh/ja, so the UI will show the raw key. Add the key to all locales or reuse an existing label.


Comment on lines +433 to +446

                        # 中间：语言选择 (Dropdown)
                        # 移除 gr.HTML hack，直接使用 label 参数，Gradio 会自动处理对齐
                        vocal_language = gr.Dropdown(
                            choices=VALID_LANGUAGES,
                            value="unknown",
                            label=t("generation.vocal_language_label"),
                            show_label=False,
                            container=True,
                            allow_custom_value=True,
                            scale=3,
                        )

                        # 右侧：格式化按钮 (Button)
                        # 放在同一行最右侧，操作更顺手
⚠️ Potential issue | 🟡 Minor

Resolve Ruff RUF003 warnings in comments (fullwidth punctuation).
Lines 433-446 use fullwidth punctuation (： and ，), which Ruff flags.

🔧 Suggested cleanup
-                        # 中间：语言选择 (Dropdown)
-                        # 移除 gr.HTML hack，直接使用 label 参数，Gradio 会自动处理对齐
+                        # 中间: 语言选择 (Dropdown)
+                        # 移除 gr.HTML hack, 直接使用 label 参数, Gradio 会自动处理对齐
@@
-                        # 右侧：格式化按钮 (Button)
-                        # 放在同一行最右侧，操作更顺手
+                        # 右侧: 格式化按钮 (Button)
+                        # 放在同一行最右侧, 操作更顺手
🧰 Tools
🪛 Ruff (0.14.14)

[warning] 433-433: Comment contains ambiguous ： (FULLWIDTH COLON). Did you mean : (COLON)?

(RUF003)


[warning] 434-434: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


[warning] 434-434: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


[warning] 445-445: Comment contains ambiguous ： (FULLWIDTH COLON). Did you mean : (COLON)?

(RUF003)


[warning] 446-446: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


ChuxiJ (Contributor) commented Feb 10, 2026

You’ve modified more than 5,000 lines of code for this small change, which makes testing and review very difficult.
We’re preparing for a more stable release soon, so I’ll close your PR for now.
Please see if you can work on a simplified implementation instead.

Includes the fix for the NumPy problem and improved memory logic.
Includes the restored variable text2music_audio_codes_group and the new checkbox for normalization.
coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@acestep/audio_utils.py`:
- Around line 210-252: The convert_audio docstring is missing the newly
supported 'wav32' output format; update the docstring for convert_audio to list
'wav32' among supported formats (e.g., "flac, wav, wav32, mp3") and ensure any
mention of output_format in the function signature/comments matches this
addition; check save_audio usage and related docs/comments to keep them
consistent with the new 'wav32' option.
- Around line 107-118: When format == "wav32" we must force the output_path
extension to ".wav" regardless of any existing suffix; modify the current
conditional so the first branch checks if format == "wav32" and calls
output_path = output_path.with_suffix(".wav"), then otherwise fall through to
the existing validation (e.g., if output_path.suffix.lower() not in ['.flac',
'.wav', '.mp3'] then output_path = output_path.with_suffix(ext)). Update the
logic surrounding output_path, ext and the special-case handling (remove the
separate ".wav32" elif) so wav32 always becomes .wav while other formats still
respect or correct unknown extensions.

Comment on lines +107 to +118

        # Ensure output path has correct extension
        output_path = Path(output_path)

        # Determine extension based on format
        ext = ".wav" if format == "wav32" else f".{format}"

        if output_path.suffix.lower() not in ['.flac', '.wav', '.mp3']:
            output_path = output_path.with_suffix(ext)
        elif format == "wav32" and output_path.suffix.lower() == ".wav32":
             # Explicitly fix .wav32 extension if present
             output_path = output_path.with_suffix(".wav")
⚠️ Potential issue | 🟡 Minor

Force .wav extension for wav32 even when a different suffix is supplied.
Right now, format="wav32" with an existing .flac/.mp3 suffix won’t be corrected, leading to a mismatched extension.

🐛 Proposed fix
-        if output_path.suffix.lower() not in ['.flac', '.wav', '.mp3']:
-            output_path = output_path.with_suffix(ext)
-        elif format == "wav32" and output_path.suffix.lower() == ".wav32":
-             # Explicitly fix .wav32 extension if present
-             output_path = output_path.with_suffix(".wav")
+        if format == "wav32":
+            if output_path.suffix.lower() != ".wav":
+                # Explicitly use .wav extension for wav32
+                output_path = output_path.with_suffix(".wav")
+        elif output_path.suffix.lower() not in ['.flac', '.wav', '.mp3']:
+            output_path = output_path.with_suffix(ext)
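
For reference, a tiny pathlib demonstration (with made-up file names) of the behavior the fix wants: with_suffix() replaces whatever extension was supplied, so format="wav32" can always end up as a .wav path.

```python
from pathlib import Path

for supplied in ["song.flac", "song.mp3", "song.wav32", "song"]:
    # Every variant collapses to "song.wav" once the suffix is forced.
    print(Path(supplied).with_suffix(".wav"))
```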

Comment on lines +210 to +252

    def convert_audio(
        self,
        input_path: Union[str, Path],
        output_path: Union[str, Path],
        output_format: str,
        remove_input: bool = False,
    ) -> str:
        """
        Convert audio format

        Args:
            input_path: Input audio file path
            output_path: Output audio file path
            output_format: Target format ('flac', 'wav', 'mp3')
            remove_input: Whether to delete input file

        Returns:
            Output file path
        """
        input_path = Path(input_path)
        output_path = Path(output_path)

        if not input_path.exists():
            raise FileNotFoundError(f"Input file not found: {input_path}")

        # Load audio
        audio_tensor, sample_rate = torchaudio.load(str(input_path))

        # Save as new format
        output_path = self.save_audio(
            audio_tensor,
            output_path,
            sample_rate=sample_rate,
            format=output_format,
            channels_first=True
        )

        # Delete input file if needed
        if remove_input:
            input_path.unlink()
            logger.debug(f"[AudioSaver] Removed input file: {input_path}")

        return output_path
⚠️ Potential issue | 🟡 Minor

Docstring should include wav32 as a supported output format.
Minor doc mismatch with the newly supported option.

📝 Proposed fix
-            output_format: Target format ('flac', 'wav', 'mp3')
+            output_format: Target format ('flac', 'wav', 'mp3', 'wav32')
🧰 Tools
🪛 Ruff (0.14.14)

[warning] 233-233: Avoid specifying long messages outside the exception class

(TRY003)


(These three contain the missing translation texts for the new features)
ChuxiJ (Contributor) commented Feb 11, 2026

Please resolve the merge conflicts.
