自定义音色后生成的wav文件是空的，请教下大佬们 #893

heason8 · 2025-02-13T08:11:12Z

我的代码：

import ChatTTS
import torch
import scipy
from typing import Optional
import torchaudio
from tools.audio import load_audio

chat = ChatTTS.Chat()
chat.load(compile=False)


def on_upload_sample(sample_audio_input: Optional[str]) -> str:
    if sample_audio_input is None:
        return ""
    sample_audio = torch.tensor(load_audio(sample_audio_input, 24000)).to('cpu')
    spk_smp = chat.sample_audio_speaker(sample_audio)
    del sample_audio
    return spk_smp


spk_smb = on_upload_sample("./output.wav")

texts = ["舞台上的光线巨大而苍白，仿佛白色的巨人向他压来。他屏住呼吸，等待着考官公布成绩。"]

params_refine_text = ChatTTS.Chat.RefineTextParams(
    prompt='[oral_2][laugh_0][break_6]',
)

reftext = chat.infer(texts[0], refine_text_only=True)

params_infer_code = ChatTTS.Chat.InferCodeParams(
    txt_smp=reftext,
    spk_smp=spk_smb,
    top_P=0.7,
    top_K=20
)
wavs = chat.infer(texts, params_infer_code=params_infer_code, params_refine_text=params_refine_text)

for i in range(len(wavs)):
    """
    In some versions of torchaudio, the first line works but in other versions, so does the second line.
    """
    try:
        torchaudio.save(f"out_{i}.wav", torch.from_numpy(wavs[i]).unsqueeze(0), 24000)
    except:
        torchaudio.save(f"out_{i}.wav", torch.from_numpy(wavs[i]), 24000)

这个合成的wav文件是空的，并且在运行过程中会报以下内容，没找到原因：

text:   0%|▏                                                                        | 1/384(max) [00:00,  6.77it/s]
We detected that you are passing `past_key_values` as a tuple of tuples. This is deprecated and will be removed in
v4.47. Please convert your cache or use an appropriate `Cache` class (https://huggingface.co/docs/transformers/kv_c
ache#legacy-cache-format)
text:   8%|█████▍                                                                  | 29/384(max) [00:00, 63.18it/s]
text:  11%|████████▎                                                               | 44/384(max) [00:00, 99.32it/s]
code:   0%|                                                                            | 0/2048(max) [00:00, ?it/s]
unexpected end at index [0]

regenerate in order to ensure non-empty
code:   0%|▏                                                                       | 4/2048(max) [00:00, 47.68it/s]

The text was updated successfully, but these errors were encountered:

fumiama · 2025-02-13T15:06:40Z

作为smp的语音必须有很高的质量，不能太长，且txt_smp要完全与其对应。

NiceDay608088 · 2025-02-18T02:14:33Z

@fumiama

“必须有很高的质量”，高质量的标准是什么？
“不能太长”，不能超过多久？

fumiama · 2025-02-18T02:44:04Z

你随便试几个就知道了，长度10秒就行。百分之百质量的标准就是直接把chat tts生成的语音导入回去。

fumiama added the documentation Improvements or additions to documentation label Feb 13, 2025

github-actions bot added the stale The topic has been ignored for a long time label Mar 20, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

自定义音色后生成的wav文件是空的，请教下大佬们 #893

自定义音色后生成的wav文件是空的，请教下大佬们 #893

heason8 commented Feb 13, 2025 •

edited by fumiama

Loading

fumiama commented Feb 13, 2025

NiceDay608088 commented Feb 18, 2025

fumiama commented Feb 18, 2025

自定义音色后生成的wav文件是空的，请教下大佬们 #893

自定义音色后生成的wav文件是空的，请教下大佬们 #893

Comments

heason8 commented Feb 13, 2025 • edited by fumiama Loading

fumiama commented Feb 13, 2025

NiceDay608088 commented Feb 18, 2025

fumiama commented Feb 18, 2025

heason8 commented Feb 13, 2025 •

edited by fumiama

Loading