Normalize audio against clipping and 32bit wav support #406
acestep/audio_utils.py (includes the new `normalize_audio` function and the 32-bit memory fix)
acestep/inference.py (calls the normalization)
acestep/gradio_ui/interfaces/generation.py (checkbox for normalization and 32-bit selection)
acestep/gradio_ui/events/results_handlers.py (the most important fix, for the "FileNotFoundError")
acestep/gradio_ui/i18n/en.json (English text for the new features)
acestep/gradio_ui/i18n/zh.json (Chinese placeholders)
acestep/gradio_ui/i18n/ja.json (Japanese placeholders)
Caution: Review failed. The pull request is closed.
📝 Walkthrough
Adds peak audio normalization and WAV32 save pathways to audio utilities, migrates audio hashing to SHA-256, exposes normalization controls in the generation UI, and adds corresponding English/Japanese/Chinese translation keys and minor UI string updates.
Sequence Diagram(s)
sequenceDiagram
participant UI as "User / Generation UI"
participant Gen as "Generation Section\n(create_generation_section)"
participant Audio as "acestep.audio_utils\n(normalize_audio, AudioSaver)"
participant FS as "Filesystem / Soundfile\n(encoding / save backend)"
UI->>Gen: toggle normalization, submit generation
Gen->>Audio: send audio buffers + normalization params
Audio->>Audio: normalize_audio(audio, target_db)
Audio->>Audio: choose save format/path (wav32 or other)
Audio->>FS: attempt primary save (wav32 path / ffmpeg)
alt primary save fails
Audio->>FS: fallback save using soundfile (32-bit WAV or other)
end
Audio->>Audio: compute SHA-256 hash / uuid from audio data
Audio->>Gen: return saved file path + hash/uuid
Gen->>UI: display results (file link, hash, generation details)
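The diagram calls `normalize_audio(audio, target_db)` before the save step, but the PR's implementation is not reproduced on this page. The following is only a minimal sketch of peak normalization to a target dBFS level, assuming float NumPy audio in the range [-1, 1]. It measures the sample peak; the "True Peak" mentioned in the PR description would additionally require oversampling, which this sketch does not do.

```python
import numpy as np


def normalize_audio(audio: np.ndarray, target_db: float = -1.0) -> np.ndarray:
    """Peak-normalize float audio so max |sample| equals target_db dBFS.

    Hypothetical sketch; the PR's own normalize_audio may differ.
    Uses the sample peak, not an oversampled true peak.
    """
    if audio.size == 0:
        return audio
    peak = float(np.max(np.abs(audio)))
    if peak == 0.0:
        # Digital silence: no gain can change it, and dividing by 0 would fail.
        return audio
    target_linear = 10.0 ** (target_db / 20.0)  # dBFS -> linear, e.g. -1 dB -> ~0.891
    gain = target_linear / peak
    return (audio * gain).astype(audio.dtype, copy=False)
```

With the UI default of -1.0 dB, a clip whose sample peak is 2.0 is scaled by 10^(-1/20) / 2.0 ≈ 0.446, which brings its peak to about 0.891 and leaves 1 dB of headroom. Note that this sketch also boosts quiet audio up to the target; whether the PR's version only attenuates is not stated here.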
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 8
🤖 Fix all issues with AI agents
In `@acestep/audio_utils.py`:
- Around line 112-128: The numpy-to-torch conversion in audio_utils.py inverts
the channels_first logic: when channels_first is True the code should treat
audio_data as already [channels, samples] and not transpose; when False it
should convert [samples, channels] to [channels, samples] only if needed. Update
the numpy branch handling of audio_data and audio_tensor so it matches the torch
branch: for numpy, if channels_first is True convert directly with
torch.from_numpy(audio_data).float() (no .T), and if channels_first is False
convert with torch.from_numpy(audio_data).float() and transpose to [channels,
samples] only when audio_tensor.dim() == 2 and audio_tensor.shape[0] <
audio_tensor.shape[1]; keep the torch branch logic (audio_data.cpu().float() and
conditional transpose) unchanged.
- Around line 186-195: The fallback except block captures an unused exception
variable and incorrectly passes "wav32" as a soundfile format; fix by changing
the outer except to use a throwaway name (e.g., except Exception:) or reference
the variable in the log, and update the soundfile fallback to map format "wav32"
to format="WAV" with subtype="PCM_32" (for other formats set format_upper =
format.upper() and subtype=None), then call sf.write(str(output_path), audio_np,
sample_rate, format=format_upper, subtype=subtype) and log the error
appropriately using logger.error(e) when the inner fallback fails; refer to the
symbols audio_tensor, output_path, sample_rate, format, logger and the fallback
block in audio_utils.py.
In `@acestep/gradio_ui/i18n/ja.json`:
- Around line 247-252: The new localization keys under "gen"
(enable_normalization, enable_normalization_info, normalization_db,
normalization_db_info) are still in English; translate those four strings into
Japanese and replace the English values in the ja.json file so the UI shows
Japanese labels and help text for audio normalization; keep the JSON keys
unchanged and only update their string values with accurate Japanese
translations that match the tone and length of existing entries.
In `@acestep/gradio_ui/i18n/zh.json`:
- Around line 247-252: The JSON keys under "gen" ("enable_normalization",
"enable_normalization_info", "normalization_db", "normalization_db_info") are
still in English; replace their English values with Chinese translations so the
zh locale shows localized labels and help text (translate the label "Enable
Normalization", the info text about normalizing audio volume, "Target Peak (dB)"
and the info about dB recommendations). Keep the keys intact and only update the
string values for those four keys.
In `@acestep/gradio_ui/interfaces/generation.py`:
- Line 307: The UI references a missing i18n key generation.advanced_dit_params
inside the gr.Accordion call (look for the
gr.Accordion(t("generation.advanced_dit_params"), open=False) usage) causing the
raw key to display; add the generation.advanced_dit_params entry to all locale
files (en/zh/ja) with appropriate translations or replace the reference with an
existing i18n key so the accordion label shows a proper localized string.
- Around line 433-446: Comments near the gr.Dropdown creation for vocal_language
use fullwidth punctuation characters (e.g., '：', '，') which triggers Ruff
RUF003; update the inline comments around the gr.Dropdown block (the lines
describing "中间：语言选择 (Dropdown)" and "右侧：格式化按钮 (Button)" and any other nearby
comments) to use ASCII punctuation (':' and ',') or rewrite the comments to
avoid fullwidth characters, keeping references to the gr.Dropdown instantiation
(vocal_language) and its parameters (choices=VALID_LANGUAGES, value="unknown",
label=t("generation.vocal_language_label"), allow_custom_value=True, scale=3)
intact so the code logic is unchanged.
- Around line 302-305: The checkbox and slider are using hardcoded defaults
instead of honoring pre-initialized settings; update the gr.Checkbox
initialization for enable_normalization and the gr.Slider for normalization_db
to pull values from init_params (e.g., use
init_params.get("enable_normalization", True) for enable_normalization and
init_params.get("normalization_db", -1.0) for normalization_db) so
pre-initialized runs show the correct UI state; ensure you coerce types as
needed (bool for enable_normalization, float for normalization_db) and leave the
info/label parameters unchanged.
- Around line 63-65: The visibility logic uses service_pre_initialized but
should hide the accordion when the app is running in service mode; change
accordion_visible to depend on the service-mode flag (e.g., set
accordion_visible = not service_mode or not is_service_mode) instead of
service_pre_initialized, ensuring the boolean used (service_mode /
is_service_mode) exists in scope and preserves accordion_open = not
service_pre_initialized for the open state.
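For the `init_params` item above, one way to keep the checkbox and slider in sync with pre-initialized settings is to resolve both values in a small helper before building the widgets. `normalization_ui_defaults` is a hypothetical name; the fallbacks (True, -1.0) are the hardcoded defaults quoted in the review, while the string parsing and the clamp to the slider's -10.0 to 0.0 range are added assumptions.

```python
from typing import Any, Mapping, Optional, Tuple


def normalization_ui_defaults(
    init_params: Optional[Mapping[str, Any]],
) -> Tuple[bool, float]:
    """Resolve (enable_normalization, normalization_db) for the UI widgets."""
    params = init_params or {}
    enabled = params.get("enable_normalization", True)
    if isinstance(enabled, str):
        # bool("false") is True, so parse common string spellings explicitly.
        enabled = enabled.strip().lower() in ("1", "true", "yes", "on")
    target_db = float(params.get("normalization_db", -1.0))
    # Keep the value inside the slider's range (minimum=-10.0, maximum=0.0).
    target_db = min(0.0, max(-10.0, target_db))
    return bool(enabled), target_db
```

The returned pair would then feed `gr.Checkbox(value=enabled, ...)` and `gr.Slider(value=target_db, ...)`, leaving the label and info arguments unchanged.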
🧹 Nitpick comments (4)
acestep/audio_utils.py (4)
69-79: Update docstring to include `wav32` format.
The docstring mentions only 'flac', 'wav', 'mp3' but the code also accepts 'wav32'.
📝 Proposed docstring fix
 """
 Initialize audio saver

 Args:
-    default_format: Default save format ('flac', 'wav', 'mp3')
+    default_format: Default save format ('flac', 'wav', 'mp3', 'wav32')
 """
107-110: Simplify `wav32` extension handling.
The current logic sets a `.wav32` suffix at line 110, then changes it back to `.wav` at line 152. Consider handling this mapping earlier for clarity.
♻️ Proposed simplification
 # Ensure output path has correct extension
 output_path = Path(output_path)
-if output_path.suffix.lower() not in ['.flac', '.wav', '.mp3']:
-    output_path = output_path.with_suffix(f'.{format}')
+ext = ".wav" if format == "wav32" else f".{format}"
+if output_path.suffix.lower() not in ['.flac', '.wav', '.mp3']:
+    output_path = output_path.with_suffix(ext)
174-181: Unreachable code block.
The `else` branch is unreachable because `format` is validated to be one of `["flac", "wav", "mp3", "wav32"]` at line 103, and all these formats are handled in the preceding `if`/`elif` branches.
🧹 Remove dead code
         backend='soundfile',
     )
-else:
-    # Other formats use default backend
-    torchaudio.save(
-        str(output_path),
-        audio_tensor,
-        sample_rate,
-        channels_first=True,
-    )
291-314: Consider using SHA-256 instead of MD5.
While MD5 is sufficient for caching/deduplication purposes (not cryptographic), using SHA-256 would be more consistent with `generate_uuid_from_params` (which already uses SHA-256) and would silence the static analysis warnings (S324).
♻️ Proposed change
 def get_audio_file_hash(audio_file) -> str:
     ...
     try:
         if isinstance(audio_file, str):
             if os.path.exists(audio_file):
                 with open(audio_file, 'rb') as f:
-                    return hashlib.md5(f.read()).hexdigest()
-            return hashlib.md5(audio_file.encode('utf-8')).hexdigest()
+                    return hashlib.sha256(f.read()).hexdigest()
+            return hashlib.sha256(audio_file.encode('utf-8')).hexdigest()
         elif hasattr(audio_file, 'name'):
-            return hashlib.md5(str(audio_file.name).encode('utf-8')).hexdigest()
-        return hashlib.md5(str(audio_file).encode('utf-8')).hexdigest()
+            return hashlib.sha256(str(audio_file.name).encode('utf-8')).hexdigest()
+        return hashlib.sha256(str(audio_file).encode('utf-8')).hexdigest()
     except Exception:
-        return hashlib.md5(str(audio_file).encode('utf-8')).hexdigest()
+        return hashlib.sha256(str(audio_file).encode('utf-8')).hexdigest()
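A sketch of the SHA-256 version with one extra change: the file is fed to `hashlib.sha256().update()` in 1 MiB chunks instead of a single `f.read()`, so a long 32-bit WAV is not loaded into memory just to be hashed. The chunking and the narrower `except OSError` (instead of a blind `except Exception`, which Ruff flags as BLE001) are assumptions, not part of the proposed change.

```python
import hashlib
import os


def get_audio_file_hash(audio_file) -> str:
    """SHA-256 hash of a file's bytes, or of its name/str() as a fallback (sketch)."""
    try:
        if isinstance(audio_file, str):
            if os.path.exists(audio_file):
                digest = hashlib.sha256()
                with open(audio_file, "rb") as f:
                    # Stream the file so large WAVs are never read in one piece.
                    for chunk in iter(lambda: f.read(1 << 20), b""):
                        digest.update(chunk)
                return digest.hexdigest()
            return hashlib.sha256(audio_file.encode("utf-8")).hexdigest()
        if hasattr(audio_file, "name"):
            return hashlib.sha256(str(audio_file.name).encode("utf-8")).hexdigest()
        return hashlib.sha256(str(audio_file).encode("utf-8")).hexdigest()
    except OSError:
        # Unreadable file: fall back to hashing the path string.
        return hashlib.sha256(str(audio_file).encode("utf-8")).hexdigest()
```

Because streaming produces the same digest as hashing the whole byte string, existing cache keys computed with `f.read()` stay valid.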
# Convert to torch tensor
if isinstance(audio_data, np.ndarray):
    if channels_first:
        # numpy [samples, channels] -> tensor [channels, samples]
        audio_tensor = torch.from_numpy(audio_data.T).float()
    else:
        # numpy [samples, channels] -> tensor [samples, channels] -> [channels, samples]
        audio_tensor = torch.from_numpy(audio_data).float()
        if audio_tensor.dim() == 2 and audio_tensor.shape[0] < audio_tensor.shape[1]:
            audio_tensor = audio_tensor.T
else:
    # torch tensor
    audio_tensor = audio_data.cpu().float()
    if not channels_first and audio_tensor.dim() == 2:
        # [samples, channels] -> [channels, samples]
        if audio_tensor.shape[0] > audio_tensor.shape[1]:
            audio_tensor = audio_tensor.T
The channels_first logic for numpy arrays appears inverted.
When channels_first=True, the input is expected to already be in [channels, samples] format. However, the code transposes it (line 116), which would incorrectly convert it to [samples, channels]. The torch tensor branch (lines 125-128) handles this correctly by only transposing when channels_first=False.
🐛 Proposed fix
 # Convert to torch tensor
 if isinstance(audio_data, np.ndarray):
-    if channels_first:
-        # numpy [samples, channels] -> tensor [channels, samples]
-        audio_tensor = torch.from_numpy(audio_data.T).float()
-    else:
-        # numpy [samples, channels] -> tensor [samples, channels] -> [channels, samples]
+    if not channels_first:
+        # numpy [samples, channels] -> tensor [channels, samples]
         audio_tensor = torch.from_numpy(audio_data).float()
         if audio_tensor.dim() == 2 and audio_tensor.shape[0] < audio_tensor.shape[1]:
             audio_tensor = audio_tensor.T
+    else:
+        # numpy already [channels, samples]
+        audio_tensor = torch.from_numpy(audio_data).float()
🤖 Prompt for AI Agents
In `@acestep/audio_utils.py` around lines 112 - 128, The numpy-to-torch conversion
in audio_utils.py inverts the channels_first logic: when channels_first is True
the code should treat audio_data as already [channels, samples] and not
transpose; when False it should convert [samples, channels] to [channels,
samples] only if needed. Update the numpy branch handling of audio_data and
audio_tensor so it matches the torch branch: for numpy, if channels_first is
True convert directly with torch.from_numpy(audio_data).float() (no .T), and if
channels_first is False convert with torch.from_numpy(audio_data).float() and
transpose to [channels, samples] only when audio_tensor.dim() == 2 and
audio_tensor.shape[0] < audio_tensor.shape[1]; keep the torch branch logic
(audio_data.cpu().float() and conditional transpose) unchanged.
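One detail worth checking: the proposed fix keeps the `shape[0] < shape[1]` test in the numpy `channels_first=False` branch, while the torch branch transposes when `shape[0] > shape[1]`. For a [samples, channels] array such as shape (48000, 2), only the torch-style condition triggers the transpose. The NumPy-only sketch below uses the torch branch's condition; `to_channels_first` is a hypothetical helper whose result could then be passed to `torch.from_numpy(...).float()`.

```python
import numpy as np


def to_channels_first(audio: np.ndarray, channels_first: bool) -> np.ndarray:
    """Return audio laid out as [channels, samples] (hypothetical helper)."""
    if channels_first:
        # Caller says the array is already [channels, samples]: no transpose.
        return audio
    if audio.ndim == 2 and audio.shape[0] > audio.shape[1]:
        # [samples, channels] -> [channels, samples], same condition as the torch branch.
        # ascontiguousarray copies the transposed view into a contiguous buffer.
        return np.ascontiguousarray(audio.T)
    # 1-D (mono) input, or 2-D input that already looks channels-first.
    return audio
```

Like the torch branch, the shape heuristic assumes there are more samples than channels, which holds for any realistic clip.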
except Exception as e:
    try:
        import soundfile as sf
        audio_np = audio_tensor.transpose(0, 1).numpy()  # -> [samples, channels]
        sf.write(str(output_path), audio_np, sample_rate, format=format.upper())
        logger.debug(f"[AudioSaver] Fallback soundfile Saved audio to {output_path} ({format}, {sample_rate}Hz)")
        return str(output_path)
    except Exception as e:
        logger.error(f"[AudioSaver] Failed to save audio: {e}")
        raise
Fix unused variable and broken wav32 fallback.
Two issues:
- Line 186: Variable `e` is captured but never used (per static analysis).
- Line 190: If `format` is `"wav32"`, `format.upper()` becomes `"WAV32"`, which is not a valid soundfile format. The fallback will fail for wav32.
🐛 Proposed fix
-except Exception as e:
+except Exception:
     try:
         import soundfile as sf
         audio_np = audio_tensor.transpose(0, 1).numpy()  # -> [samples, channels]
-        sf.write(str(output_path), audio_np, sample_rate, format=format.upper())
+        # soundfile has no "WAV32" format: 32-bit WAV is format="WAV" with subtype="PCM_32"
+        sf_format = "WAV" if format == "wav32" else format.upper()
+        sf_subtype = "PCM_32" if format == "wav32" else None
+        sf.write(str(output_path), audio_np, sample_rate, format=sf_format, subtype=sf_subtype)
         logger.debug(f"[AudioSaver] Fallback soundfile Saved audio to {output_path} ({format}, {sample_rate}Hz)")
         return str(output_path)
-    except Exception as e:
+    except Exception as inner_e:
-        logger.error(f"[AudioSaver] Failed to save audio: {e}")
+        logger.error(f"[AudioSaver] Failed to save audio: {inner_e}")
         raise
🧰 Tools
🪛 Ruff (0.14.14)
[warning] 186-186: Do not catch blind exception: Exception
(BLE001)
[error] 186-186: Local variable e is assigned to but never used
Remove assignment to unused variable e
(F841)
🤖 Prompt for AI Agents
In `@acestep/audio_utils.py` around lines 186 - 195, The fallback except block
captures an unused exception variable and incorrectly passes "wav32" as a
soundfile format; fix by changing the outer except to use a throwaway name
(e.g., except Exception:) or reference the variable in the log, and update the
soundfile fallback to map format "wav32" to format="WAV" with subtype="PCM_32"
(for other formats set format_upper = format.upper() and subtype=None), then
call sf.write(str(output_path), audio_np, sample_rate, format=format_upper,
subtype=subtype) and log the error appropriately using logger.error(e) when the
inner fallback fails; refer to the symbols audio_tensor, output_path,
sample_rate, format, logger and the fallback block in audio_utils.py.
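The fallback only needs a mapping from the saver's format names to soundfile's `format`/`subtype` arguments. A minimal sketch, assuming `wav32` means 32-bit integer PCM as the prompt above states (`PCM_32`); if 32-bit float output were intended instead, soundfile's `FLOAT` subtype would be the one to use. `soundfile_format_for` is a hypothetical helper name.

```python
from typing import Optional, Tuple


def soundfile_format_for(fmt: str) -> Tuple[str, Optional[str]]:
    """Translate an AudioSaver format name into soundfile (format, subtype) arguments."""
    fmt = fmt.lower()
    if fmt == "wav32":
        # soundfile has no "WAV32" major format; 32-bit WAV is WAV plus a 32-bit subtype.
        return "WAV", "PCM_32"
    # subtype=None lets soundfile pick its default subtype for the container.
    return fmt.upper(), None
```

Inside the fallback this would be used as `sf_format, subtype = soundfile_format_for(format)` followed by `sf.write(str(output_path), audio_np, sample_rate, format=sf_format, subtype=subtype)`.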
normalization_db = gr.Slider(label=t("gen.normalization_db"), minimum=-10.0, maximum=0.0, step=0.1, value=-1.0, info=t("gen.normalization_db_info"))

# Advanced DiT Parameters
with gr.Accordion(t("generation.advanced_dit_params"), open=False):
Add the missing i18n key for generation.advanced_dit_params.
Line 307 references a key that doesn’t exist in en/zh/ja, so the UI will show the raw key. Add the key to all locales or reuse an existing label.
🤖 Prompt for AI Agents
In `@acestep/gradio_ui/interfaces/generation.py` at line 307, The UI references a
missing i18n key generation.advanced_dit_params inside the gr.Accordion call
(look for the gr.Accordion(t("generation.advanced_dit_params"), open=False)
usage) causing the raw key to display; add the generation.advanced_dit_params
entry to all locale files (en/zh/ja) with appropriate translations or replace
the reference with an existing i18n key so the accordion label shows a proper
localized string.
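Missing keys like this one can be caught before they reach the UI by comparing the keys passed to `t(...)` against each locale file. The sketch below assumes the locale JSON files are nested objects addressed with dotted keys (for example `{"generation": {"advanced_dit_params": ...}}`); the page does not show how `t()` resolves keys, so `flatten_keys` would need adjusting if the files use flat dotted keys. Both helper names are hypothetical.

```python
from typing import Any, Dict, Iterable, Set


def flatten_keys(tree: Dict[str, Any], prefix: str = "") -> Set[str]:
    """Collect dotted paths ("generation.advanced_dit_params") for all leaf values."""
    keys: Set[str] = set()
    for name, value in tree.items():
        dotted = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            keys |= flatten_keys(value, dotted)
        else:
            keys.add(dotted)
    return keys


def missing_keys(locales: Dict[str, Dict[str, Any]], used: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each locale code to the used keys it does not define (complete locales omitted)."""
    wanted = set(used)
    report: Dict[str, Set[str]] = {}
    for lang, tree in locales.items():
        absent = wanted - flatten_keys(tree)
        if absent:
            report[lang] = absent
    return report
```

In practice the `locales` dict would be built with `json.load` from `acestep/gradio_ui/i18n/en.json`, `zh.json`, and `ja.json`.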
# 中间：语言选择 (Dropdown)
# 移除 gr.HTML hack，直接使用 label 参数，Gradio 会自动处理对齐
vocal_language = gr.Dropdown(
    choices=VALID_LANGUAGES,
    value="unknown",
    label=t("generation.vocal_language_label"),
    show_label=False,
    container=True,
    allow_custom_value=True,
    scale=3,
)

# 右侧：格式化按钮 (Button)
# 放在同一行最右侧，操作更顺手
Resolve Ruff RUF003 warnings in comments (fullwidth punctuation).
Lines 433-446 use fullwidth `：` and `，`, which Ruff flags.
🔧 Suggested cleanup
-# 中间：语言选择 (Dropdown)
-# 移除 gr.HTML hack，直接使用 label 参数，Gradio 会自动处理对齐
+# 中间: 语言选择 (Dropdown)
+# 移除 gr.HTML hack, 直接使用 label 参数, Gradio 会自动处理对齐
@@
-# 右侧：格式化按钮 (Button)
-# 放在同一行最右侧，操作更顺手
+# 右侧: 格式化按钮 (Button)
+# 放在同一行最右侧, 操作更顺手
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
# 中间: 语言选择 (Dropdown)
# 移除 gr.HTML hack, 直接使用 label 参数, Gradio 会自动处理对齐
vocal_language = gr.Dropdown(
    choices=VALID_LANGUAGES,
    value="unknown",
    label=t("generation.vocal_language_label"),
    show_label=False,
    container=True,
    allow_custom_value=True,
    scale=3,
)
# 右侧: 格式化按钮 (Button)
# 放在同一行最右侧, 操作更顺手
🧰 Tools
🪛 Ruff (0.14.14)
[warning] 433-433: Comment contains ambiguous ： (FULLWIDTH COLON). Did you mean : (COLON)?
(RUF003)
[warning] 434-434: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?
(RUF003)
[warning] 434-434: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?
(RUF003)
[warning] 445-445: Comment contains ambiguous ： (FULLWIDTH COLON). Did you mean : (COLON)?
(RUF003)
[warning] 446-446: Comment contains ambiguous ， (FULLWIDTH COMMA). Did you mean , (COMMA)?
(RUF003)
🤖 Prompt for AI Agents
In `@acestep/gradio_ui/interfaces/generation.py` around lines 433 - 446, Comments
near the gr.Dropdown creation for vocal_language use fullwidth punctuation
characters (e.g., '：', '，') which triggers Ruff RUF003; update the inline
comments around the gr.Dropdown block (the lines describing "中间：语言选择 (Dropdown)"
and "右侧：格式化按钮 (Button)" and any other nearby comments) to use ASCII punctuation
(':' and ',') or rewrite the comments to avoid fullwidth characters, keeping
references to the gr.Dropdown instantiation (vocal_language) and its parameters
(choices=VALID_LANGUAGES, value="unknown",
label=t("generation.vocal_language_label"), allow_custom_value=True, scale=3)
intact so the code logic is unchanged.
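The RUF003 cleanup is mechanical enough to script. This sketch rewrites the fullwidth colon (U+FF1A) and comma (U+FF0C) as ASCII followed by a space, matching the suggested diff, and only touches lines that are entirely comments. It is a naive line-based approach (trailing comments after code are left alone), and `fix_comment_line` is a hypothetical name. Alternatively, Ruff's `allowed-confusables` setting can whitelist specific characters instead of rewriting them.

```python
# Fullwidth colon (U+FF1A) and comma (U+FF0C), both flagged by Ruff RUF003,
# mapped to ASCII plus a space as in the suggested diff.
FULLWIDTH_TO_ASCII = str.maketrans({"\uff1a": ": ", "\uff0c": ", "})


def fix_comment_line(line: str) -> str:
    """Rewrite fullwidth punctuation in a whole-line comment; leave other lines alone."""
    if line.lstrip().startswith("#"):
        return line.translate(FULLWIDTH_TO_ASCII)
    return line
```

Restricting the rewrite to comment lines keeps string literals such as UI text untouched, since those may legitimately contain fullwidth punctuation.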
You’ve modified more than 5,000 lines of code for this small change, which makes testing and review very difficult.
Includes the fix for the NumPy problem and improved memory logic.
Includes the restored variable text2music_audio_codes_group and the new checkbox for normalization.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@acestep/audio_utils.py`:
- Around line 210-252: The convert_audio docstring is missing the newly
supported 'wav32' output format; update the docstring for convert_audio to list
'wav32' among supported formats (e.g., "flac, wav, wav32, mp3") and ensure any
mention of output_format in the function signature/comments matches this
addition; check save_audio usage and related docs/comments to keep them
consistent with the new 'wav32' option.
- Around line 107-118: When format == "wav32" we must force the output_path
extension to ".wav" regardless of any existing suffix; modify the current
conditional so the first branch checks if format == "wav32" and calls
output_path = output_path.with_suffix(".wav"), then otherwise fall through to
the existing validation (e.g., if output_path.suffix.lower() not in ['.flac',
'.wav', '.mp3'] then output_path = output_path.with_suffix(ext)). Update the
logic surrounding output_path, ext and the special-case handling (remove the
separate ".wav32" elif) so wav32 always becomes .wav while other formats still
respect or correct unknown extensions.
# Ensure output path has correct extension
output_path = Path(output_path)

# Determine extension based on format
ext = ".wav" if format == "wav32" else f".{format}"

if output_path.suffix.lower() not in ['.flac', '.wav', '.mp3']:
    output_path = output_path.with_suffix(ext)
elif format == "wav32" and output_path.suffix.lower() == ".wav32":
    # Explicitly fix .wav32 extension if present
    output_path = output_path.with_suffix(".wav")
Force .wav extension for wav32 even when a different suffix is supplied.
Right now, format="wav32" with an existing .flac/.mp3 suffix won’t be corrected, leading to a mismatched extension.
🐛 Proposed fix
-if output_path.suffix.lower() not in ['.flac', '.wav', '.mp3']:
-    output_path = output_path.with_suffix(ext)
-elif format == "wav32" and output_path.suffix.lower() == ".wav32":
-    # Explicitly fix .wav32 extension if present
-    output_path = output_path.with_suffix(".wav")
+if format == "wav32":
+    if output_path.suffix.lower() != ".wav":
+        # Explicitly use .wav extension for wav32
+        output_path = output_path.with_suffix(".wav")
+elif output_path.suffix.lower() not in ['.flac', '.wav', '.mp3']:
+    output_path = output_path.with_suffix(ext)
🤖 Prompt for AI Agents
In `@acestep/audio_utils.py` around lines 107 - 118, When format == "wav32" we
must force the output_path extension to ".wav" regardless of any existing
suffix; modify the current conditional so the first branch checks if format ==
"wav32" and calls output_path = output_path.with_suffix(".wav"), then otherwise
fall through to the existing validation (e.g., if output_path.suffix.lower() not
in ['.flac', '.wav', '.mp3'] then output_path = output_path.with_suffix(ext)).
Update the logic surrounding output_path, ext and the special-case handling
(remove the separate ".wav32" elif) so wav32 always becomes .wav while other
formats still respect or correct unknown extensions.
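A standalone version of the proposed extension logic makes the cases in this comment easy to check. `resolve_output_path` is a hypothetical name; as in the proposed fix, formats other than `wav32` keep any recognized extension, even one that differs from `fmt`.

```python
from pathlib import Path
from typing import Union


def resolve_output_path(output_path: Union[str, Path], fmt: str) -> Path:
    """Pick the on-disk suffix for a save request (mirrors the proposed fix)."""
    path = Path(output_path)
    if fmt == "wav32":
        # 32-bit WAV is still a .wav file: force the suffix whatever was supplied.
        return path.with_suffix(".wav")
    if path.suffix.lower() not in (".flac", ".wav", ".mp3"):
        # Unknown or missing extension: derive it from the format name.
        return path.with_suffix(f".{fmt}")
    return path
```

For example, `resolve_output_path("out/song.flac", "wav32")` yields `out/song.wav`, which is exactly the mismatch the comment says the current code leaves uncorrected.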
def convert_audio(
    self,
    input_path: Union[str, Path],
    output_path: Union[str, Path],
    output_format: str,
    remove_input: bool = False,
) -> str:
    """
    Convert audio format

    Args:
        input_path: Input audio file path
        output_path: Output audio file path
        output_format: Target format ('flac', 'wav', 'mp3')
        remove_input: Whether to delete input file

    Returns:
        Output file path
    """
    input_path = Path(input_path)
    output_path = Path(output_path)

    if not input_path.exists():
        raise FileNotFoundError(f"Input file not found: {input_path}")

    # Load audio
    audio_tensor, sample_rate = torchaudio.load(str(input_path))

    # Save as new format
    output_path = self.save_audio(
        audio_tensor,
        output_path,
        sample_rate=sample_rate,
        format=output_format,
        channels_first=True
    )

    # Delete input file if needed
    if remove_input:
        input_path.unlink()
        logger.debug(f"[AudioSaver] Removed input file: {input_path}")

    return output_path
Docstring should include wav32 as a supported output format.
Minor doc mismatch with the newly supported option.
📝 Proposed fix
-        output_format: Target format ('flac', 'wav', 'mp3')
+        output_format: Target format ('flac', 'wav', 'mp3', 'wav32')
📝 Committable suggestion
def convert_audio(
    self,
    input_path: Union[str, Path],
    output_path: Union[str, Path],
    output_format: str,
    remove_input: bool = False,
) -> str:
    """
    Convert audio format

    Args:
        input_path: Input audio file path
        output_path: Output audio file path
        output_format: Target format ('flac', 'wav', 'mp3', 'wav32')
        remove_input: Whether to delete input file

    Returns:
        Output file path
    """
    input_path = Path(input_path)
    output_path = Path(output_path)

    if not input_path.exists():
        raise FileNotFoundError(f"Input file not found: {input_path}")

    # Load audio
    audio_tensor, sample_rate = torchaudio.load(str(input_path))

    # Save as new format
    output_path = self.save_audio(
        audio_tensor,
        output_path,
        sample_rate=sample_rate,
        format=output_format,
        channels_first=True
    )

    # Delete input file if needed
    if remove_input:
        input_path.unlink()
        logger.debug(f"[AudioSaver] Removed input file: {input_path}")

    return output_path
🧰 Tools
🪛 Ruff (0.14.14)
[warning] 233-233: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
In `@acestep/audio_utils.py` around lines 210 - 252, The convert_audio docstring
is missing the newly supported 'wav32' output format; update the docstring for
convert_audio to list 'wav32' among supported formats (e.g., "flac, wav, wav32,
mp3") and ensure any mention of output_format in the function signature/comments
matches this addition; check save_audio usage and related docs/comments to keep
them consistent with the new 'wav32' option.
(These three contain the missing translation texts for the new features)
please solve conflicts
This update addresses critical audio quality issues. Previously, generated songs often suffered from severe clipping and lacked headroom (generated at 0.0 dBFS True Peak). These changes implement professional audio standards to ensure a clean workflow for musicians and engineers.
Key Enhancements:
CHANGES_SUMMARY.md
Summary by CodeRabbit
New Features
Bug Fixes
Localization & UI Updates