
Conversation

@lutzkirschner64-dot (Contributor) commented Feb 10, 2026

This update addresses critical audio quality issues. Previously, generated songs often suffered from severe clipping and lacked headroom (output peaked at 0.0 dBFS true peak). These changes apply standard professional audio practice to give musicians and engineers a clean starting point.

Key Enhancements:

  • Peak Normalization: Prevents clipping by scaling audio to a safe target level (default -1.0 dB); a sketch of the idea follows below.
  • 32-bit Float Support: Preserves full dynamic range and prevents signal degradation.
  • UI Localization: Label entries for English, Chinese, and Japanese (the Chinese and Japanese strings are currently placeholders).

Attached: CHANGES_SUMMARY.md
Screenshots: 2026-02-10 010545, 2026-02-10 010705
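
To make the first point concrete, here is a minimal standalone sketch of peak normalization to a target level in dBFS, assuming the normalize_audio(audio_data, target_db=-1.0) signature mentioned in the review walkthrough; it is only an illustration, not the PR's actual implementation in acestep/audio_utils.py.

```python
import numpy as np

def normalize_audio(audio_data: np.ndarray, target_db: float = -1.0) -> np.ndarray:
    """Hypothetical peak normalization: scale so the absolute peak sits at target_db dBFS."""
    peak = float(np.max(np.abs(audio_data)))
    if peak == 0.0:
        return audio_data  # silence: nothing to scale
    target_peak = 10.0 ** (target_db / 20.0)  # -1.0 dB -> ~0.891 linear
    return audio_data * (target_peak / peak)

# A full-scale sine wave gains 1 dB of headroom instead of touching 0.0 dBFS.
t = np.linspace(0.0, 1.0, 48000, dtype=np.float32)
loud = np.sin(2 * np.pi * 440.0 * t)
safe = normalize_audio(loud, target_db=-1.0)
print(float(np.max(np.abs(safe))))  # ~0.891
```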

Summary by CodeRabbit

  • New Features

    • Added audio normalization with configurable target dB and UI controls.
    • Added WAV32 export option for higher-fidelity 32-bit WAV saves.
    • Expanded generation UI with normalization and LoRA controls.
  • Bug Fixes

    • More robust audio saving with fallbacks and improved extension/format handling.
    • Audio hashing/ID generation now uses SHA-256 (hash values will change).
  • Localization & UI Updates

    • Added translation entries for normalization controls and refined batch-results wording.

acestep/audio_utils.py (Includes the new `normalize_audio` function and the 32-bit memory fix)
acestep/inference.py (Calls normalization)
acestep/gradio_ui/interfaces/generation.py (Checkbox for normalization & 32-bit selection)
acestep/gradio_ui/events/results_handlers.py (Most important fix for the "FileNotFoundError")
acestep/gradio_ui/i18n/en.json (English text for the new features)
acestep/gradio_ui/i18n/zh.json (Chinese placeholders)
acestep/gradio_ui/i18n/ja.json (Japanese placeholders)
coderabbitai bot commented Feb 10, 2026

Caution

Review failed

The pull request is closed.

📝 Walkthrough

Walkthrough

Adds peak audio normalization and WAV32 save pathways to audio utilities, migrates audio hashing to SHA‑256, exposes normalization controls in the generation UI, and adds corresponding English/Japanese/Chinese translation keys and minor UI string updates.

Changes

Cohort / File(s) / Summary

Audio Utilities (acestep/audio_utils.py)
Added normalize_audio(audio_data, target_db=-1.0) for peak normalization. Extended AudioSaver/save_audio to support "wav32" with a dedicated save path and soundfile fallbacks, improved extension/format handling, and switched hashing functions (get_audio_file_hash, generate_uuid_from_audio_data) from MD5 to SHA-256.

Generation Interface (acestep/gradio_ui/interfaces/generation.py)
Replaced the placeholder with a full create_generation_section(...) implementation that builds the Gradio generation UI, integrates normalization controls (enable_normalization, normalization_db), adjusts device/backend defaults and runtime UI logic, and returns expanded component/state handles. No public signature change. (A sketch of the normalization controls follows this table.)

Internationalization (acestep/gradio_ui/i18n/en.json, acestep/gradio_ui/i18n/ja.json, acestep/gradio_ui/i18n/zh.json)
Added top-level gen keys: enable_normalization, enable_normalization_info, normalization_db, normalization_db_info. Also updated some UI strings (e.g., batch results wording) across locales.
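
The generation-interface entry above mentions the new enable_normalization and normalization_db controls. The following hedged Gradio sketch shows what such controls could look like, also reflecting the later review suggestion to seed them from pre-initialized settings; init_params, the t(...) translation lookup, and build_normalization_controls are assumptions for illustration, not the repository's actual code.

```python
import gradio as gr

def build_normalization_controls(init_params: dict, t=lambda key: key):
    # Hypothetical helper: seed the controls from pre-initialized settings
    # instead of hardcoding defaults, as the review below suggests.
    enable_normalization = gr.Checkbox(
        label=t("gen.enable_normalization"),
        value=bool(init_params.get("enable_normalization", True)),
        info=t("gen.enable_normalization_info"),
    )
    normalization_db = gr.Slider(
        label=t("gen.normalization_db"),
        minimum=-10.0,
        maximum=0.0,
        step=0.1,
        value=float(init_params.get("normalization_db", -1.0)),
        info=t("gen.normalization_db_info"),
    )
    return enable_normalization, normalization_db

with gr.Blocks() as demo:
    build_normalization_controls({"enable_normalization": True, "normalization_db": -1.0})
```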

Sequence Diagram(s)

sequenceDiagram
    participant UI as "User / Generation UI"
    participant Gen as "Generation Section\n(create_generation_section)"
    participant Audio as "acestep.audio_utils\n(normalize_audio, AudioSaver)"
    participant FS as "Filesystem / Soundfile\n(encoding / save backend)"

    UI->>Gen: toggle normalization, submit generation
    Gen->>Audio: send audio buffers + normalization params
    Audio->>Audio: normalize_audio(audio, target_db)
    Audio->>Audio: choose save format/path (wav32 or other)
    Audio->>FS: attempt primary save (wav32 path / ffmpeg)
    alt primary save fails
        Audio->>FS: fallback save using soundfile (32-bit WAV or other)
    end
    Audio->>Audio: compute SHA-256 hash / uuid from audio data
    Audio->>Gen: return saved file path + hash/uuid
    Gen->>UI: display results (file link, hash, generation details)
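
The final hashing step in the diagram corresponds to the MD5-to-SHA-256 migration noted in the walkthrough. Below is a hypothetical sketch of deriving a stable ID from raw audio bytes with hashlib.sha256; the repository's generate_uuid_from_audio_data may differ, and the point is mainly to show why previously generated hash-based IDs change.

```python
import hashlib
import uuid

import numpy as np

def generate_uuid_from_audio_data(audio: np.ndarray) -> str:
    """Hypothetical: fold the first 128 bits of a SHA-256 digest into a UUID string."""
    digest = hashlib.sha256(audio.tobytes()).hexdigest()
    return str(uuid.UUID(digest[:32]))

samples = np.zeros(48000, dtype=np.float32)
print(generate_uuid_from_audio_data(samples))  # deterministic ID; differs from the old MD5-based one
```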

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hopped through samples, soft and neat,
I nudged the peaks until they breathed, not bleat,
WAV32 tunnels, fallbacks paved the way,
SHA‑256 seeds mark each small audio bay,
A tiny rabbit cheers the new UI seat!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the two main changes: audio normalization to prevent clipping and support for 32-bit WAV format, both of which are prominent throughout the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.




coderabbitai bot left a comment

Actionable comments posted: 8

🤖 Fix all issues with AI agents
In `@acestep/audio_utils.py`:
- Around line 112-128: The numpy-to-torch conversion in audio_utils.py inverts
the channels_first logic: when channels_first is True the code should treat
audio_data as already [channels, samples] and not transpose; when False it
should convert [samples, channels] to [channels, samples] only if needed. Update
the numpy branch handling of audio_data and audio_tensor so it matches the torch
branch: for numpy, if channels_first is True convert directly with
torch.from_numpy(audio_data).float() (no .T), and if channels_first is False
convert with torch.from_numpy(audio_data).float() and transpose to [channels,
samples] only when audio_tensor.dim() == 2 and audio_tensor.shape[0] <
audio_tensor.shape[1]; keep the torch branch logic (audio_data.cpu().float() and
conditional transpose) unchanged.
- Around line 186-195: The fallback except block captures an unused exception
variable and incorrectly passes "wav32" as a soundfile format; fix by changing
the outer except to use a throwaway name (e.g., except Exception:) or reference
the variable in the log, and update the soundfile fallback to map format "wav32"
to format="WAV" with subtype="PCM_32" (for other formats set format_upper =
format.upper() and subtype=None), then call sf.write(str(output_path), audio_np,
sample_rate, format=format_upper, subtype=subtype) and log the error
appropriately using logger.error(e) when the inner fallback fails; refer to the
symbols audio_tensor, output_path, sample_rate, format, logger and the fallback
block in audio_utils.py.

In `@acestep/gradio_ui/i18n/ja.json`:
- Around line 247-252: The new localization keys under "gen"
(enable_normalization, enable_normalization_info, normalization_db,
normalization_db_info) are still in English; translate those four strings into
Japanese and replace the English values in the ja.json file so the UI shows
Japanese labels and help text for audio normalization; keep the JSON keys
unchanged and only update their string values with accurate Japanese
translations that match the tone and length of existing entries.

In `@acestep/gradio_ui/i18n/zh.json`:
- Around line 247-252: The JSON keys under "gen" ("enable_normalization",
"enable_normalization_info", "normalization_db", "normalization_db_info") are
still in English; replace their English values with Chinese translations so the
zh locale shows localized labels and help text (translate the label "Enable
Normalization", the info text about normalizing audio volume, "Target Peak (dB)"
and the info about dB recommendations). Keep the keys intact and only update the
string values for those four keys.

In `@acestep/gradio_ui/interfaces/generation.py`:
- Line 307: The UI references a missing i18n key generation.advanced_dit_params
inside the gr.Accordion call (look for the
gr.Accordion(t("generation.advanced_dit_params"), open=False) usage) causing the
raw key to display; add the generation.advanced_dit_params entry to all locale
files (en/zh/ja) with appropriate translations or replace the reference with an
existing i18n key so the accordion label shows a proper localized string.
- Around line 433-446: Comments near the gr.Dropdown creation for vocal_language
use fullwidth punctuation characters (e.g., ':', ',') which triggers Ruff
RUF003; update the inline comments around the gr.Dropdown block (the lines
describing "中间:语言选择 (Dropdown)" and "右侧:格式化按钮 (Button)" and any other nearby
comments) to use ASCII punctuation (':' and ',') or rewrite the comments to
avoid fullwidth characters, keeping references to the gr.Dropdown instantiation
(vocal_language) and its parameters (choices=VALID_LANGUAGES, value="unknown",
label=t("generation.vocal_language_label"), allow_custom_value=True, scale=3)
intact so the code logic is unchanged.
- Around line 302-305: The checkbox and slider are using hardcoded defaults
instead of honoring pre-initialized settings; update the gr.Checkbox
initialization for enable_normalization and the gr.Slider for normalization_db
to pull values from init_params (e.g., use
init_params.get("enable_normalization", True) for enable_normalization and
init_params.get("normalization_db", -1.0) for normalization_db) so
pre-initialized runs show the correct UI state; ensure you coerce types as
needed (bool for enable_normalization, float for normalization_db) and leave the
info/label parameters unchanged.
- Around line 63-65: The visibility logic uses service_pre_initialized but
should hide the accordion when the app is running in service mode; change
accordion_visible to depend on the service-mode flag (e.g., set
accordion_visible = not service_mode or not is_service_mode) instead of
service_pre_initialized, ensuring the boolean used (service_mode /
is_service_mode) exists in scope and preserves accordion_open = not
service_pre_initialized for the open state.
🧹 Nitpick comments (4)
acestep/audio_utils.py (4)

69-79: Update docstring to include wav32 format.

The docstring mentions only 'flac', 'wav', 'mp3' but the code also accepts 'wav32'.

📝 Proposed docstring fix
         """
         Initialize audio saver
         
         Args:
-            default_format: Default save format ('flac', 'wav', 'mp3')
+            default_format: Default save format ('flac', 'wav', 'mp3', 'wav32')
         """

107-110: Simplify wav32 extension handling.

The current logic sets .wav32 suffix at line 110, then changes it back to .wav at line 152. Consider handling this mapping earlier for clarity.

♻️ Proposed simplification
         # Ensure output path has correct extension
         output_path = Path(output_path)
-        if output_path.suffix.lower() not in ['.flac', '.wav', '.mp3']:
-            output_path = output_path.with_suffix(f'.{format}')
+        ext = ".wav" if format == "wav32" else f".{format}"
+        if output_path.suffix.lower() not in ['.flac', '.wav', '.mp3']:
+            output_path = output_path.with_suffix(ext)

174-181: Unreachable code block.

The else branch is unreachable because format is validated to be one of ["flac", "wav", "mp3", "wav32"] at line 103, and all these formats are handled in the preceding if/elif branches.

🧹 Remove dead code
                     backend='soundfile',
                 )
-            else:
-                # Other formats use default backend
-                torchaudio.save(
-                    str(output_path),
-                    audio_tensor,
-                    sample_rate,
-                    channels_first=True,
-                )

291-314: Consider using SHA-256 instead of MD5.

While MD5 is sufficient for caching/deduplication purposes (not cryptographic), using SHA-256 would be more consistent with generate_uuid_from_params (which already uses SHA-256) and would silence the static analysis warnings (S324).

♻️ Proposed change
 def get_audio_file_hash(audio_file) -> str:
     ...
     try:
         if isinstance(audio_file, str):
             if os.path.exists(audio_file):
                 with open(audio_file, 'rb') as f:
-                    return hashlib.md5(f.read()).hexdigest()
-            return hashlib.md5(audio_file.encode('utf-8')).hexdigest()
+                    return hashlib.sha256(f.read()).hexdigest()
+            return hashlib.sha256(audio_file.encode('utf-8')).hexdigest()
         elif hasattr(audio_file, 'name'):
-            return hashlib.md5(str(audio_file.name).encode('utf-8')).hexdigest()
-        return hashlib.md5(str(audio_file).encode('utf-8')).hexdigest()
+            return hashlib.sha256(str(audio_file.name).encode('utf-8')).hexdigest()
+        return hashlib.sha256(str(audio_file).encode('utf-8')).hexdigest()
     except Exception:
-        return hashlib.md5(str(audio_file).encode('utf-8')).hexdigest()
+        return hashlib.sha256(str(audio_file).encode('utf-8')).hexdigest()

Comment on lines 112 to 128

        # Convert to torch tensor
        if isinstance(audio_data, np.ndarray):
            if channels_first:
                # numpy [samples, channels] -> tensor [channels, samples]
                audio_tensor = torch.from_numpy(audio_data.T).float()
            else:
                # numpy [samples, channels] -> tensor [samples, channels] -> [channels, samples]
                audio_tensor = torch.from_numpy(audio_data).float()
                if audio_tensor.dim() == 2 and audio_tensor.shape[0] < audio_tensor.shape[1]:
                    audio_tensor = audio_tensor.T
        else:
            # torch tensor
            audio_tensor = audio_data.cpu().float()
            if not channels_first and audio_tensor.dim() == 2:
                # [samples, channels] -> [channels, samples]
                if audio_tensor.shape[0] > audio_tensor.shape[1]:
                    audio_tensor = audio_tensor.T
⚠️ Potential issue | 🟠 Major

The channels_first logic for numpy arrays appears inverted.

When channels_first=True, the input is expected to already be in [channels, samples] format. However, the code transposes it (line 116), which would incorrectly convert it to [samples, channels]. The torch tensor branch (lines 125-128) handles this correctly by only transposing when channels_first=False.

🐛 Proposed fix
         # Convert to torch tensor
         if isinstance(audio_data, np.ndarray):
-            if channels_first:
-                # numpy [samples, channels] -> tensor [channels, samples]
-                audio_tensor = torch.from_numpy(audio_data.T).float()
-            else:
-                # numpy [samples, channels] -> tensor [samples, channels] -> [channels, samples]
+            if not channels_first:
+                # numpy [samples, channels] -> tensor [channels, samples]
                 audio_tensor = torch.from_numpy(audio_data).float()
                 if audio_tensor.dim() == 2 and audio_tensor.shape[0] < audio_tensor.shape[1]:
                     audio_tensor = audio_tensor.T
+            else:
+                # numpy already [channels, samples]
+                audio_tensor = torch.from_numpy(audio_data).float()
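
To make the expected shape contract concrete, here is a small self-contained check (an illustration, not the repository's code) of the convention this fix restores: channels_first=True means the numpy input is already [channels, samples] and must pass through untransposed, while channels_first=False input gets transposed into that layout.

```python
import numpy as np
import torch

def to_channels_first(audio_data: np.ndarray, channels_first: bool = True) -> torch.Tensor:
    # Hypothetical helper mirroring the intended behavior: only transpose
    # when the caller says the array is [samples, channels].
    tensor = torch.from_numpy(audio_data).float()
    if not channels_first and tensor.dim() == 2 and tensor.shape[0] > tensor.shape[1]:
        tensor = tensor.T  # [samples, channels] -> [channels, samples]
    return tensor

stereo_cf = np.zeros((2, 48000), dtype=np.float32)   # already [channels, samples]
stereo_sf = np.zeros((48000, 2), dtype=np.float32)   # [samples, channels]

assert to_channels_first(stereo_cf, channels_first=True).shape == (2, 48000)
assert to_channels_first(stereo_sf, channels_first=False).shape == (2, 48000)
```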

Comment on lines 186 to 195

        except Exception as e:
            try:
                import soundfile as sf
                audio_np = audio_tensor.transpose(0, 1).numpy()  # -> [samples, channels]
                sf.write(str(output_path), audio_np, sample_rate, format=format.upper())
                logger.debug(f"[AudioSaver] Fallback soundfile Saved audio to {output_path} ({format}, {sample_rate}Hz)")
                return str(output_path)
            except Exception as e:
                logger.error(f"[AudioSaver] Failed to save audio: {e}")
                raise
⚠️ Potential issue | 🟡 Minor

Fix unused variable and broken wav32 fallback.

Two issues:

  1. Line 186: Variable e is captured but never used (per static analysis).
  2. Line 190: If format is "wav32", format.upper() becomes "WAV32", which is not a valid soundfile format. The fallback will fail for wav32.
🐛 Proposed fix
-        except Exception as e:
+        except Exception:
             try:
                 import soundfile as sf
                 audio_np = audio_tensor.transpose(0, 1).numpy()  # -> [samples, channels]
-                sf.write(str(output_path), audio_np, sample_rate, format=format.upper())
+                sf_format = "WAV" if format == "wav32" else format.upper()
+                sf.write(str(output_path), audio_np, sample_rate, format=sf_format)
                 logger.debug(f"[AudioSaver] Fallback soundfile Saved audio to {output_path} ({format}, {sample_rate}Hz)")
                 return str(output_path)
-            except Exception as e:
+            except Exception as inner_e:
-                logger.error(f"[AudioSaver] Failed to save audio: {e}")
+                logger.error(f"[AudioSaver] Failed to save audio: {inner_e}")
                 raise
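
As a small standalone check of the soundfile behavior this fallback relies on (file names and sample rate here are made up): the container is selected with format="WAV" and the bit depth with subtype, so "WAV32" is rejected while "PCM_32" (32-bit integer) and "FLOAT" (32-bit float) are valid subtypes.

```python
import numpy as np
import soundfile as sf

audio = np.zeros((48000, 2), dtype=np.float32)  # [samples, channels]

# Valid: the container is "WAV"; the bit depth is chosen via subtype.
sf.write("out_pcm32.wav", audio, 44100, format="WAV", subtype="PCM_32")  # 32-bit integer PCM
sf.write("out_float.wav", audio, 44100, format="WAV", subtype="FLOAT")   # 32-bit float

# "WAV32" is not a container name soundfile recognizes.
print("WAV32" in sf.available_formats())  # False
print(sf.available_subtypes("WAV"))       # includes 'PCM_32' and 'FLOAT'
```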
🧰 Tools
🪛 Ruff (0.14.14)

[warning] 186-186: Do not catch blind exception: Exception

(BLE001)


[error] 186-186: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)


normalization_db = gr.Slider(label=t("gen.normalization_db"), minimum=-10.0, maximum=0.0, step=0.1, value=-1.0, info=t("gen.normalization_db_info"))

# Advanced DiT Parameters
with gr.Accordion(t("generation.advanced_dit_params"), open=False):
⚠️ Potential issue | 🟡 Minor

Add the missing i18n key for generation.advanced_dit_params.
Line 307 references a key that doesn’t exist in en/zh/ja, so the UI will show the raw key. Add the key to all locales or reuse an existing label.


Comment on lines +433 to +446

                        # 中间：语言选择 (Dropdown)
                        # 移除 gr.HTML hack，直接使用 label 参数，Gradio 会自动处理对齐
                        vocal_language = gr.Dropdown(
                            choices=VALID_LANGUAGES,
                            value="unknown",
                            label=t("generation.vocal_language_label"),
                            show_label=False,
                            container=True,
                            allow_custom_value=True,
                            scale=3,
                        )

                        # 右侧：格式化按钮 (Button)
                        # 放在同一行最右侧，操作更顺手
⚠️ Potential issue | 🟡 Minor

Resolve Ruff RUF003 warnings in comments (fullwidth punctuation).
Lines 433-446 use fullwidth punctuation (： and ，), which Ruff flags.

🔧 Suggested cleanup
-                        # 中间：语言选择 (Dropdown)
-                        # 移除 gr.HTML hack，直接使用 label 参数，Gradio 会自动处理对齐
+                        # 中间: 语言选择 (Dropdown)
+                        # 移除 gr.HTML hack, 直接使用 label 参数, Gradio 会自动处理对齐
@@
-                        # 右侧：格式化按钮 (Button)
-                        # 放在同一行最右侧，操作更顺手
+                        # 右侧: 格式化按钮 (Button)
+                        # 放在同一行最右侧, 操作更顺手
🧰 Tools
🪛 Ruff (0.14.14)

[warning] 433-433: Comment contains ambiguous ： (FULLWIDTH COLON). Did you mean : (COLON)?

(RUF003)


[warning] 434-434: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


[warning] 434-434: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


[warning] 445-445: Comment contains ambiguous ： (FULLWIDTH COLON). Did you mean : (COLON)?

(RUF003)


[warning] 446-446: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?

(RUF003)


ChuxiJ (Contributor) commented Feb 10, 2026

You’ve modified more than 5,000 lines of code for this small change, which makes testing and review very difficult.
We’re preparing for a more stable release soon, so I’ll close your PR for now.
Please see if you can work on a simplified implementation instead.

Includes the fix for the NumPy problem and improved memory logic.
Includes the restored variable text2music_audio_codes_group and the new checkbox for normalization.
coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@acestep/audio_utils.py`:
- Around line 210-252: The convert_audio docstring is missing the newly
supported 'wav32' output format; update the docstring for convert_audio to list
'wav32' among supported formats (e.g., "flac, wav, wav32, mp3") and ensure any
mention of output_format in the function signature/comments matches this
addition; check save_audio usage and related docs/comments to keep them
consistent with the new 'wav32' option.
- Around line 107-118: When format == "wav32" we must force the output_path
extension to ".wav" regardless of any existing suffix; modify the current
conditional so the first branch checks if format == "wav32" and calls
output_path = output_path.with_suffix(".wav"), then otherwise fall through to
the existing validation (e.g., if output_path.suffix.lower() not in ['.flac',
'.wav', '.mp3'] then output_path = output_path.with_suffix(ext)). Update the
logic surrounding output_path, ext and the special-case handling (remove the
separate ".wav32" elif) so wav32 always becomes .wav while other formats still
respect or correct unknown extensions.

Comment on lines +107 to +118

        # Ensure output path has correct extension
        output_path = Path(output_path)

        # Determine extension based on format
        ext = ".wav" if format == "wav32" else f".{format}"

        if output_path.suffix.lower() not in ['.flac', '.wav', '.mp3']:
            output_path = output_path.with_suffix(ext)
        elif format == "wav32" and output_path.suffix.lower() == ".wav32":
             # Explicitly fix .wav32 extension if present
             output_path = output_path.with_suffix(".wav")
⚠️ Potential issue | 🟡 Minor

Force .wav extension for wav32 even when a different suffix is supplied.
Right now, format="wav32" with an existing .flac/.mp3 suffix won’t be corrected, leading to a mismatched extension.

🐛 Proposed fix
-        if output_path.suffix.lower() not in ['.flac', '.wav', '.mp3']:
-            output_path = output_path.with_suffix(ext)
-        elif format == "wav32" and output_path.suffix.lower() == ".wav32":
-             # Explicitly fix .wav32 extension if present
-             output_path = output_path.with_suffix(".wav")
+        if format == "wav32":
+            if output_path.suffix.lower() != ".wav":
+                # Explicitly use .wav extension for wav32
+                output_path = output_path.with_suffix(".wav")
+        elif output_path.suffix.lower() not in ['.flac', '.wav', '.mp3']:
+            output_path = output_path.with_suffix(ext)
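
For reference, a tiny pathlib demonstration (with made-up file names) of the behavior the fix wants: with_suffix() replaces whatever extension was supplied, so format="wav32" can always end up as a .wav path.

```python
from pathlib import Path

for supplied in ["song.flac", "song.mp3", "song.wav32", "song"]:
    # Every variant collapses to "song.wav" once the suffix is forced.
    print(Path(supplied).with_suffix(".wav"))
```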

Comment on lines +210 to +252

    def convert_audio(
        self,
        input_path: Union[str, Path],
        output_path: Union[str, Path],
        output_format: str,
        remove_input: bool = False,
    ) -> str:
        """
        Convert audio format

        Args:
            input_path: Input audio file path
            output_path: Output audio file path
            output_format: Target format ('flac', 'wav', 'mp3')
            remove_input: Whether to delete input file

        Returns:
            Output file path
        """
        input_path = Path(input_path)
        output_path = Path(output_path)

        if not input_path.exists():
            raise FileNotFoundError(f"Input file not found: {input_path}")

        # Load audio
        audio_tensor, sample_rate = torchaudio.load(str(input_path))

        # Save as new format
        output_path = self.save_audio(
            audio_tensor,
            output_path,
            sample_rate=sample_rate,
            format=output_format,
            channels_first=True
        )

        # Delete input file if needed
        if remove_input:
            input_path.unlink()
            logger.debug(f"[AudioSaver] Removed input file: {input_path}")

        return output_path
⚠️ Potential issue | 🟡 Minor

Docstring should include wav32 as a supported output format.
Minor doc mismatch with the newly supported option.

📝 Proposed fix
-            output_format: Target format ('flac', 'wav', 'mp3')
+            output_format: Target format ('flac', 'wav', 'mp3', 'wav32')
🧰 Tools
🪛 Ruff (0.14.14)

[warning] 233-233: Avoid specifying long messages outside the exception class

(TRY003)


(These three contain the missing translation texts for the new features)
ChuxiJ (Contributor) commented Feb 11, 2026

Please resolve the merge conflicts.
