I am currently training a model for Japanese language processing. To improve audio quality, I have raised the sampling rate to 48,000 Hz. When I then compared the YAML configurations for the 40s and 75s models, I found that they differ in the downsamples, n_fft, and hop_length parameters.
I would like to ask: is increasing the number of tokens the most straightforward way to improve the quality of the generated audio? Also, to raise the sampling rate to 48,000 Hz, are there any parameters other than the sampling rate itself that I need to adjust?
The following is the code I am using: train.py and EncodecFeatures.
The most effective way to improve audio quality is multi-layer quantization, although it has both advantages and disadvantages. Several other engineering techniques can also improve WavTokenizer's quality; these will be incorporated in later versions.
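To illustrate the trade-off, here is a minimal sketch that assumes "multi-layer quantization" means residual vector quantization (RVQ), where each layer quantizes whatever error the previous layer left behind. It is a toy NumPy example on random data, not WavTokenizer's code: the `kmeans`, `quantize`, and `rvq_errors` helpers, the codebook size, and the frame shape are all hypothetical choices made for illustration.

```python
import numpy as np

def kmeans(data, k, iters, rng):
    # Plain Lloyd's algorithm; initial centroids are random data points.
    centroids = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(1)
        for j in range(k):
            members = data[assign == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(0)
    return centroids

def quantize(data, centroids):
    # Replace every vector with its nearest codebook entry.
    dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return centroids[dists.argmin(1)]

def rvq_errors(data, num_layers, codebook_size=16, seed=0):
    # Each layer fits a codebook to the current residual and subtracts
    # the quantized value, so later layers only encode what is left over.
    rng = np.random.default_rng(seed)
    residual = data.copy()
    errors = []
    for _ in range(num_layers):
        codebook = kmeans(residual, codebook_size, iters=10, rng=rng)
        residual = residual - quantize(residual, codebook)
        errors.append(float((residual ** 2).mean()))
    return errors

# Hypothetical stand-in for encoder output: 512 frames of dimension 8.
frames = np.random.default_rng(0).normal(size=(512, 8))
errors = rvq_errors(frames, num_layers=4)
print(errors)  # mean squared error remaining after each layer
```

The reconstruction error shrinks as layers are added, which is the advantage. The disadvantage is that every frame now costs one token per layer, so the token count per second grows with the number of layers.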
When changing the sampling rate, you also need to update the related configuration parameters and the discriminator components that depend on it.
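As a rough guide to why those parameters are coupled, the sketch below assumes the usual codec relation: the token rate equals the sampling rate divided by the product of the encoder's downsampling strides, and hop_length has to match that product. The 24,000 Hz baseline and the `[8, 5, 4, 2]` strides are assumptions (EnCodec-style values), and both helper functions are hypothetical, so check them against your own YAML files.

```python
from math import prod

def tokens_per_second(sample_rate, downsamples):
    # Total hop in samples is the product of the encoder strides.
    hop_length = prod(downsamples)
    return sample_rate / hop_length

def hop_for_token_rate(sample_rate, token_rate):
    # Hop length needed to keep a given token rate at a new sampling rate.
    hop_length, remainder = divmod(sample_rate, token_rate)
    if remainder:
        raise ValueError("sample_rate must be divisible by token_rate")
    return hop_length

# Assumed baseline: 24 kHz audio with strides 8*5*4*2 = 320.
print(tokens_per_second(24000, [8, 5, 4, 2]))  # 75.0
# Same strides at 48 kHz: the token rate doubles.
print(tokens_per_second(48000, [8, 5, 4, 2]))  # 150.0
# Hop lengths that would keep 75 or 40 tokens per second at 48 kHz.
print(hop_for_token_rate(48000, 75))  # 640
print(hop_for_token_rate(48000, 40))  # 1200
```

Under these assumptions, moving to 48,000 Hz without touching downsamples and hop_length doubles the number of tokens per second. To keep the original token rate, the strides and hop_length would need to double as well, and n_fft would then likely need to grow with hop_length too.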
Thank you for your response to my question. I’m also excited to hear that a subsequent version will be released! When is the release of the next version expected?