Voice cloning

This code was produced by fine-tuning a pre-trained model. Inference is done with the inference.ipynb notebook: load the fine-tuned model and the speaker embeddings, then run the cells in order.

The model was trained on clips of at most 2 seconds, so it can generate at most 2 seconds of speech. Longer outputs would only be possible if the training data contained similar, longer utterances. Because the model was trained to speak names only, it accepts a maximum of 2 words.

The generated audio, the original audio, and the checkpoints are available at the Drive link below:
https://drive.google.com/drive/folders/1EDsjN4UCCdSQLeDV9ZJAjbgVsa3WfwP9?usp=sharing
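The limits described above (at most 2 words of input, at most 2 seconds of audio) can be checked around the model call in the notebook. The sketch below is a hypothetical helper, not part of inference.ipynb: the function names are illustrative, and SAMPLE_RATE = 16000 is an assumption that should be changed to whatever rate the fine-tuned model actually produces.

```python
# Hypothetical helpers that enforce the limits stated in this README.
# MAX_WORDS and MAX_SECONDS come from the README; SAMPLE_RATE is an ASSUMPTION
# and must match the output rate of the fine-tuned model.
MAX_WORDS = 2
MAX_SECONDS = 2.0
SAMPLE_RATE = 16000


def check_input_text(text: str) -> str:
    """Normalize whitespace and reject inputs longer than MAX_WORDS words."""
    words = text.split()
    if not words:
        raise ValueError("input text is empty")
    if len(words) > MAX_WORDS:
        raise ValueError(
            f"model was trained on names only: got {len(words)} words, max is {MAX_WORDS}"
        )
    return " ".join(words)


def max_samples(sample_rate: int = SAMPLE_RATE, seconds: float = MAX_SECONDS) -> int:
    """Number of audio samples that corresponds to the maximum output duration."""
    return int(sample_rate * seconds)


def trim_to_max_duration(samples, sample_rate: int = SAMPLE_RATE):
    """Truncate a generated waveform (any sliceable sequence) to MAX_SECONDS."""
    return samples[: max_samples(sample_rate)]


if __name__ == "__main__":
    print(check_input_text("  Ada   Lovelace "))  # a two-word name passes
    print(max_samples())  # sample budget for 2 seconds at the assumed rate
```

Validating the text before generation avoids spending time on inputs the model was never trained for, and trimming afterwards keeps saved clips within the 2-second range the model handles reliably.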