Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

自定义音色后生成的wav文件是空的,请教下大佬们 #893

Open
heason8 opened this issue Feb 13, 2025 · 3 comments
Open

自定义音色后生成的wav文件是空的,请教下大佬们 #893

heason8 opened this issue Feb 13, 2025 · 3 comments
Labels
documentation Improvements or additions to documentation stale The topic has been ignored for a long time

Comments

@heason8
Copy link

heason8 commented Feb 13, 2025

我的代码:

import ChatTTS
import torch
import scipy
from typing import Optional
import torchaudio
from tools.audio import load_audio

chat = ChatTTS.Chat()
chat.load(compile=False)


def on_upload_sample(sample_audio_input: Optional[str]) -> str:
    if sample_audio_input is None:
        return ""
    sample_audio = torch.tensor(load_audio(sample_audio_input, 24000)).to('cpu')
    spk_smp = chat.sample_audio_speaker(sample_audio)
    del sample_audio
    return spk_smp


spk_smb = on_upload_sample("./output.wav")

texts = ["舞台上的光线巨大而苍白,仿佛白色的巨人向他压来。他屏住呼吸,等待着考官公布成绩。"]

params_refine_text = ChatTTS.Chat.RefineTextParams(
    prompt='[oral_2][laugh_0][break_6]',
)

reftext = chat.infer(texts[0], refine_text_only=True)

params_infer_code = ChatTTS.Chat.InferCodeParams(
    txt_smp=reftext,
    spk_smp=spk_smb,
    top_P=0.7,
    top_K=20
)
wavs = chat.infer(texts, params_infer_code=params_infer_code, params_refine_text=params_refine_text)

for i in range(len(wavs)):
    """
    In some versions of torchaudio, the first line works but in other versions, so does the second line.
    """
    try:
        torchaudio.save(f"out_{i}.wav", torch.from_numpy(wavs[i]).unsqueeze(0), 24000)
    except:
        torchaudio.save(f"out_{i}.wav", torch.from_numpy(wavs[i]), 24000)

这个合成的wav文件是空的,并且在运行过程中会报以下内容,没找到原因:

text:   0%|▏                                                                        | 1/384(max) [00:00,  6.77it/s]
We detected that you are passing `past_key_values` as a tuple of tuples. This is deprecated and will be removed in
v4.47. Please convert your cache or use an appropriate `Cache` class (https://huggingface.co/docs/transformers/kv_c
ache#legacy-cache-format)
text:   8%|█████▍                                                                  | 29/384(max) [00:00, 63.18it/s]
text:  11%|████████▎                                                               | 44/384(max) [00:00, 99.32it/s]
code:   0%|                                                                            | 0/2048(max) [00:00, ?it/s]
unexpected end at index [0]

regenerate in order to ensure non-empty
code:   0%|▏                                                                       | 4/2048(max) [00:00, 47.68it/s]
@fumiama
Copy link
Member

fumiama commented Feb 13, 2025

作为smp的语音必须有很高的质量,不能太长,且txt_smp要完全与其对应。

@fumiama fumiama added the documentation Improvements or additions to documentation label Feb 13, 2025
@NiceDay608088
Copy link

@fumiama

  • “必须有很高的质量”, 高质量的标准是什么?
  • “不能太长”, 不能超过多久?

@fumiama
Copy link
Member

fumiama commented Feb 18, 2025

你随便试几个就知道了,长度10秒就行。百分之百质量的标准就是直接把chat tts生成的语音导入回去。

@github-actions github-actions bot added the stale The topic has been ignored for a long time label Mar 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation stale The topic has been ignored for a long time
Projects
None yet
Development

No branches or pull requests

3 participants