PLEASE NOTE THIS IS A WORK IN PROGRESS. I HAVEN'T FIGURED OUT HOW TO TUNE THE MODEL FULLY YET :D
This project consists of three Python scripts that work together to process audio files, train a model using the processed data, and generate new songs based on the trained model.
`create_tfrecord.py` processes audio files in the specified directory, converts them to mel-spectrograms, and saves the spectrograms as JPEG images. It also creates a TFRecord file containing the processed audio data.

`train_model.py` trains an LSTM model using the data stored in the TFRecord file created by `create_tfrecord.py`. The model is saved to the specified directory.

`generate_songs.py` generates new songs based on the trained LSTM model. The generated songs are saved as WAV files in the specified directory.
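The mel-spectrogram conversion these scripts rely on starts from a short-time Fourier transform. A minimal numpy sketch of the magnitude STFT (the frame length and hop size here are hypothetical, not the scripts' actual settings):

```python
import numpy as np

def stft_magnitude(signal, frame_len=1024, hop=256):
    """Magnitude STFT via a simple framing loop (no audio-library deps)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # One spectrum per frame; rfft keeps only the non-negative frequencies.
    return np.abs(np.fft.rfft(frames, axis=1))

# A 440 Hz tone at a 22050 Hz sample rate should peak near frequency bin
# 440 / (22050 / 1024) ~ 20.4.
sr = 22050
t = np.arange(sr) / sr
spec = stft_magnitude(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (83, 513): 83 frames, 513 frequency bins
```

In the actual scripts this spectrogram would then be mapped through a mel filterbank (e.g. librosa's) before being chunked for training.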
- Run `create_tfrecord.py` to process the audio files and create the TFRecord file.
- Run `train_model.py` to train the LSTM model using the data from the TFRecord file.
- Run `generate_songs.py` to generate new songs based on the trained model.
- TensorFlow
- Keras
- Librosa
- SciPy
- Matplotlib
- Google Cloud Storage (optional, for saving and loading data to/from Google Cloud Storage)
- You may need to modify the script parameters to match your specific use case, such as input and output directories, audio file format, sample rate, and the number of generated songs.
- For training and generating songs, you can choose to use TPU, GPU, or CPU by adjusting the `use_tpu` parameter in `train_model.py` and `generate_songs.py`.
- You can also choose to read and write data from/to local storage or Google Cloud Storage by adjusting the `read_local` and `write_local` parameters in the respective scripts.
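The `use_tpu`, `read_local`, and `write_local` names above come from the scripts themselves; how they are set there is not shown. As a sketch, one way such switches could be exposed on the command line (the actual scripts may simply hard-code them):

```python
import argparse

def parse_args(argv=None):
    """Hypothetical CLI wiring for the three boolean switches."""
    p = argparse.ArgumentParser()
    p.add_argument("--use_tpu", action="store_true",
                   help="Train/generate on a TPU instead of GPU/CPU.")
    p.add_argument("--read_local", action="store_true",
                   help="Read the TFRecord from local disk instead of GCS.")
    p.add_argument("--write_local", action="store_true",
                   help="Write outputs to local disk instead of GCS.")
    return p.parse_args(argv)

args = parse_args(["--use_tpu", "--read_local"])
print(args.use_tpu, args.read_local, args.write_local)  # True True False
```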
`create_tfrecord.py` processes the audio files located in the "audio" directory. It loads each audio file, computes its Short-Time Fourier Transform (STFT), and converts it into a Mel-spectrogram. Then, it creates input-target pairs by taking a chunk of the Mel-spectrogram and its shifted version. The Mel-spectrogram is saved as a JPEG image in the "spectrograms" directory. Finally, it writes the input-target pairs to a TFRecord file, which is either saved locally or uploaded to Google Cloud Storage.
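The "chunk and its shifted version" pairing can be sketched in plain numpy. The one-frame shift and the chunk length here are assumptions for illustration, not the script's actual values:

```python
import numpy as np

def make_pairs(mel, chunk_len=64):
    """Slice a mel-spectrogram of shape (n_frames, n_mels) into
    (input, target) pairs, where the target is the same chunk shifted
    one frame ahead, so the model learns next-frame prediction."""
    pairs = []
    for start in range(0, mel.shape[0] - chunk_len - 1, chunk_len):
        x = mel[start : start + chunk_len]
        y = mel[start + 1 : start + chunk_len + 1]  # one-frame shift
        pairs.append((x, y))
    return pairs

mel = np.random.rand(300, 128)        # 300 frames, 128 mel bands
pairs = make_pairs(mel)
print(len(pairs), pairs[0][0].shape)  # 4 (64, 128)
```

Each pair would then be serialized into a `tf.train.Example` and written to the TFRecord file.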
A separate check script reads the created TFRecord file, parses the input-target pairs, and visualizes them as images in the "tfrecord_check" directory. It helps verify that the data has been correctly processed and stored in the TFRecord file.
`train_model.py` trains a deep learning model using the data stored in the TFRecord file. The model consists of several layers, including bidirectional LSTMs, layer normalizations, and an attention layer. The model is compiled with the Adam optimizer and the mean squared error loss function. The training and validation datasets are prepared using the input-target pairs from the TFRecord file. The model is trained on a TPU or GPU, and the weights are saved after each epoch. The final model is saved either locally or on Google Cloud Storage.
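The attention step over the LSTM outputs can be illustrated with a minimal scaled dot-product self-attention in numpy. This is a conceptual sketch, not the Keras layer the script actually uses:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(h):
    """h: (timesteps, features) sequence of LSTM hidden states.
    Each output step is a similarity-weighted mix of all states."""
    d = h.shape[-1]
    scores = h @ h.T / np.sqrt(d)       # (T, T) pairwise similarities
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ h                  # same shape as h

h = np.random.rand(10, 8)
out = self_attention(h)
print(out.shape)  # (10, 8)
```

In the real model this would sit between the bidirectional LSTM stack and the output layer, letting each predicted frame attend to the whole input chunk.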
`generate_songs.py` generates new audio samples based on the trained model. It first loads the model from local storage or Google Cloud Storage, depending on the specified parameters. It then generates new Mel-spectrograms by feeding the model with an initial seed and extending the prediction iteratively. The generated Mel-spectrogram is then converted back to an audio signal using an inverse STFT. The generated audio samples are saved as .wav files either locally or on Google Cloud Storage.
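The iterative "seed and extend" loop can be sketched as follows. `predict_next` stands in for the trained LSTM (a hypothetical interface mapping a `(chunk_len, n_mels)` window to one `(n_mels,)` frame); here a dummy function is used so the sketch runs on its own:

```python
import numpy as np

def generate(predict_next, seed, n_frames):
    """Autoregressive generation: start from a seed spectrogram chunk and
    repeatedly append the model's next-frame prediction, sliding the
    input window forward by one frame each step."""
    window = seed.copy()
    out = []
    for _ in range(n_frames):
        frame = predict_next(window)
        out.append(frame)
        window = np.vstack([window[1:], frame])  # drop oldest, append newest
    return np.stack(out)

# Dummy "model": predicts the mean of the current window.
seed = np.random.rand(64, 128)
song = generate(lambda w: w.mean(axis=0), seed, n_frames=200)
print(song.shape)  # (200, 128)
```

The generated spectrogram would then be inverted back to a waveform (e.g. via an inverse STFT with phase reconstruction) before being written out as a .wav file.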