

How to train the model at about 23 tokens/s, i.e. hopsize=1024? #35

Open · Liujingxiu23 opened this issue Sep 21, 2024 · 5 comments

@Liujingxiu23

I tried to train the model with hopsize=1024, which is about 23 tokens per second. I only changed upsample_rates to [8,8,4,4] and num_samples to 71680. Training is running now, but the results seem poor: the synthesized waveform is not intelligible.
What is a good config?

@jishengpeng
Owner

> I tried to train the model with hopsize=1024, which is about 23 tokens per second. I only changed upsample_rates to [8,8,4,4] and num_samples to 71680. Training is running now, but the results seem poor: the synthesized waveform is not intelligible. What is a good config?

There are three key considerations to note:

1. The downsampling process should adhere to sampling-rate constraints.

2. When modifying the downsampling rate, make the corresponding adjustments to the hop_length and n_fft parameters.

3. If minimizing the number of tokens is your objective, I recommend using audio with a sampling rate of 16 kHz.
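As a rough sanity check of points 1 and 2 (a sketch only; the variable names are illustrative, not taken from the repo's config), the hop length should equal the product of the upsample rates, and the token rate is just the sample rate divided by the hop length:

```python
# Sanity check for the original setup in this thread:
# 24 kHz audio, upsample_rates [8, 8, 4, 4] -> hop 1024 -> ~23 tokens/s.
sample_rate = 24000
upsample_rates = [8, 8, 4, 4]

hop_length = 1
for r in upsample_rates:
    hop_length *= r  # hop must equal the product of the upsample rates

tokens_per_second = sample_rate / hop_length

print(hop_length)                    # 1024
print(round(tokens_per_second, 1))   # 23.4
```

This matches the "about 23 tokens per second" figure quoted above; any new config should keep these three quantities consistent with each other.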

@Liujingxiu23
Author

Liujingxiu23 commented Sep 21, 2024

Thank you for your reply!
On the third point: yes, I just want to minimize the number of tokens to reduce the computation in the LLM part. Do you mean "sample_rate=16000, hopsize=600" may be a better choice?

@jishengpeng
Owner

> Thank you for your reply! On the third point: yes, I just want to minimize the number of tokens to reduce the computation in the LLM part. Do you mean "sample_rate=16000, hopsize=600" may be a better choice?

There are many options. You can try the following configuration and then adjust the parameters:

`downsamples=[8,5,4,4], sample_rate=16000, hop_length=640, n_fft=2560`
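A quick consistency check of this suggested config (a sketch; the variable names are illustrative):

```python
# Verify the suggested config is internally consistent.
downsamples = [8, 5, 4, 4]
sample_rate = 16000
hop_length = 640
n_fft = 2560

prod = 1
for d in downsamples:
    prod *= d
assert prod == hop_length        # 8 * 5 * 4 * 4 = 640

print(sample_rate / hop_length)  # 25.0 tokens per second
print(n_fft // hop_length)       # 4 -> n_fft is 4x the hop length here
```

So this config lands at 25 tokens/s, close to the original target of ~23.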

@Liujingxiu23
Author

Thanks a lot! I will try to train using this config!

@guanw-pku

> I tried to train the model with hopsize=1024, which is about 23 tokens per second. I only changed upsample_rates to [8,8,4,4] and num_samples to 71680. Training is running now, but the results seem poor: the synthesized waveform is not intelligible. What is a good config?
>
> There are three key considerations to note:
>
> 1. The downsampling process should adhere to sampling-rate constraints.
>
> 2. When modifying the downsampling rate, make the corresponding adjustments to the hop_length and n_fft parameters.
>
> 3. If minimizing the number of tokens is your objective, I recommend using audio with a sampling rate of 16 kHz.

Hi, I'm wondering how to tune the n_fft parameter when adjusting hop_length.
What is the relationship between them?
Looking forward to your reply. Thanks so much.
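One observation from the config suggested earlier in this thread (a heuristic inferred from those numbers, not a rule stated by the maintainer): n_fft is kept at 4x hop_length (2560 = 4 x 640), which gives adjacent STFT analysis windows 75% overlap:

```python
# Ratio implied by the suggested config: n_fft = 4 * hop_length.
hop_length = 640
n_fft = 4 * hop_length
overlap = 1 - hop_length / n_fft  # fraction of each window shared with the next

print(n_fft)    # 2560
print(overlap)  # 0.75
```

Scaling n_fft with hop_length in a fixed ratio like this keeps the window overlap constant as the hop changes.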
