A sentence to be converted to speech.
The generated speech is saved as a .wav file. The output path is defined by SAVE_WAV_PATH in tacotron2.py.
The ONNX and prototxt files are downloaded automatically on the first run, so an Internet connection is required during the download.
To synthesize the sample sentence, run:
python3 tacotron2.py
To specify the input sentence, pass the text after the --input option. Use the --savepath option to change the path of the output file.
python3 tacotron2.py --input "Hello world." --savepath SAVE_WAV_PATH
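Once the script has finished, the saved file can be checked quickly. The following is a minimal sketch using Python's standard `wave` module. The function name `describe_wav` and the file name `output.wav` are illustrative and not part of tacotron2.py, and the sketch assumes the output is uncompressed PCM, which is the format the `wave` module reads.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, duration_seconds) for a PCM .wav file."""
    with wave.open(path, "rb") as wav_file:
        channels = wav_file.getnchannels()
        sample_rate = wav_file.getframerate()
        frame_count = wav_file.getnframes()
    # Duration is the number of frames divided by frames per second.
    return channels, sample_rate, frame_count / sample_rate


# Example usage (hypothetical path; use the value you passed to --savepath):
#   channels, rate, seconds = describe_wav("output.wav")
#   print(f"{channels} channel(s), {rate} Hz, {seconds:.2f} s")
```

A duration close to zero usually means nothing was synthesized, so this is a cheap sanity check before listening to the file.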
Two models can generate speech from mel spectrograms in English. The default is the NVIDIA model, which uses WaveGlow for the conversion. With the hifi option, HiFi-GAN is used for speech generation instead.
python3 tacotron2.py -m hifi
Synthesizing Japanese speech requires converting the text into phonemes. This conversion requires OpenJTalk, which is installed through the pyopenjtalk package.
# for macOS, Linux
pip3 install pyopenjtalk
# for Windows
pip3 install pyopenjtalk-prebuilt
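After installing, it can help to confirm that the package is importable and to see what a phoneme conversion looks like. The sketch below is an illustration only: the helper name `phonemes_if_available` is hypothetical, it assumes both packages above expose a module named `pyopenjtalk`, and it uses that module's `g2p` function. How tacotron2.py itself calls the package is not shown here.

```python
import importlib.util


def phonemes_if_available(text):
    """Return the phoneme string for `text`, or None if pyopenjtalk is not installed."""
    if importlib.util.find_spec("pyopenjtalk") is None:
        return None
    import pyopenjtalk

    # g2p ("grapheme to phoneme") returns a space-separated phoneme string by default.
    return pyopenjtalk.g2p(text)


# Example usage:
#   result = phonemes_if_available("こんにちは")
#   print("pyopenjtalk is not installed" if result is None else result)
```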
Then run:
python3 tacotron2.py -i "こんにちは。" -m tsukuyomi
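When the script is driven from another Python program, the command lines above can be assembled programmatically. This is a minimal sketch that uses only the options shown in this document (`--input`, `--savepath`, `-m`); the helper name `build_command` is hypothetical, and it assumes tacotron2.py is in the current working directory.

```python
import sys


def build_command(sentence=None, savepath=None, model=None, script="tacotron2.py"):
    """Build an argument list for tacotron2.py from the options described above."""
    command = [sys.executable, script]
    if sentence is not None:
        command += ["--input", sentence]
    if savepath is not None:
        command += ["--savepath", savepath]
    if model is not None:
        # Values shown in this document: "hifi" and "tsukuyomi".
        # Omitting -m keeps the default model.
        command += ["-m", model]
    return command


# Example usage:
#   import subprocess
#   subprocess.run(build_command("こんにちは。", model="tsukuyomi"), check=True)
```

Passing the arguments as a list avoids shell quoting problems with sentences that contain spaces or punctuation.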
Tacotron2
ONNX Export
[HiFi-GAN](https://github.com/jik876/hifi-gan/tree/master)
PyTorch
ONNX opset = 11, 12