This project explores music source separation for vocal isolation using Conv-TasNet and SepFormer built with the SpeechBrain toolkit.
The repository contains a notebook-driven course project and a refactored training layout for publishing on GitHub. The workflow covers:
- MUSDB18 data preparation
- metadata CSV creation
- model training with SpeechBrain
- evaluation and qualitative listening
.
├── configs/
│   ├── convtasnet.yaml
│   ├── sepformer.yaml
│   └── *_original.yaml
├── notebooks/
│   └── Project.ipynb
├── scripts/
│   └── prepare_musdb.py
├── src/
│   ├── train.py
│   └── train_original.py
├── data/
│   ├── raw/
│   └── processed/
├── results/
├── requirements.txt
└── README.md
    pip install -r requirements.txt

- Download the MUSDB18 dataset.
- Place the raw dataset under data/raw/.
- Generate musdb_train.csv, musdb_valid.csv, and musdb_test.csv from the preprocessing logic in notebooks/Project.ipynb.
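The manifest step can be sketched roughly as follows. This is not the logic from notebooks/Project.ipynb, which is not reproduced here; it is a minimal illustration under stated assumptions. It assumes each track lives in its own folder containing `mixture.wav` and `vocals.wav` (the layout of decoded stems), and the column names `mix_wav` and `vocals_wav` and the function name `build_manifest` are hypothetical. Only the `ID` column follows SpeechBrain's CSV convention.

```python
import csv
from pathlib import Path

# Hypothetical column names; only "ID" is a SpeechBrain CSV convention.
FIELDNAMES = ["ID", "mix_wav", "vocals_wav"]


def build_manifest(split_dir, csv_path):
    """Write one CSV row per track folder that holds mixture.wav and vocals.wav.

    Returns the number of rows written.
    """
    split_dir = Path(split_dir)
    rows = []
    # Sort so the manifest is reproducible across runs and filesystems.
    for track_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        mixture = track_dir / "mixture.wav"
        vocals = track_dir / "vocals.wav"
        # Skip incomplete folders instead of writing a row that points nowhere.
        if not (mixture.is_file() and vocals.is_file()):
            continue
        rows.append(
            {"ID": track_dir.name, "mix_wav": str(mixture), "vocals_wav": str(vocals)}
        )

    csv_path = Path(csv_path)
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

A call such as `build_manifest("data/raw/train", "data/processed/musdb_train.csv")` would then produce one manifest. Both paths are assumptions about where the split folders and CSV files live in this repository.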
Starter command:
    python scripts/prepare_musdb.py

Train Conv-TasNet:

    python src/train.py configs/convtasnet.yaml

Train SepFormer:

    python src/train.py configs/sepformer.yaml

- The original uploaded files are preserved as *_original.yaml and train_original.py.
- The refactored files are intended to make the repository easier to read and maintain.
- You may still need to adapt dataset loading hooks depending on how your CSV manifests are generated.
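Because the loading hooks depend on the manifest layout, it may help to check a CSV before starting a training run. The sketch below is one possible check, not part of this repository. The column names `mix_wav` and `vocals_wav` and the function name `check_manifest` are hypothetical and should be replaced with whatever your manifests actually contain.

```python
import csv
from pathlib import Path

# Hypothetical columns; adjust to match how your manifests were generated.
REQUIRED_COLUMNS = ("ID", "mix_wav", "vocals_wav")


def check_manifest(csv_path, required=REQUIRED_COLUMNS):
    """Return a list of human-readable problems found in a CSV manifest.

    An empty list means the header has the required columns, every ID is
    unique, and every referenced audio path exists on disk.
    """
    problems = []
    seen_ids = set()
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [column for column in required if column not in header]
        if missing:
            # Without the expected header there is nothing sensible to check per row.
            return ["missing columns: " + ", ".join(missing)]
        for row_no, row in enumerate(reader, start=1):
            if row["ID"] in seen_ids:
                problems.append(f"row {row_no}: duplicate ID {row['ID']!r}")
            seen_ids.add(row["ID"])
            for column in required:
                if column == "ID":
                    continue
                if not Path(row[column]).is_file():
                    problems.append(f"row {row_no}: {column} not found: {row[column]}")
    return problems
```

Running `check_manifest("musdb_train.csv")` and printing the returned list before training surfaces path or header mismatches early, rather than partway through the first epoch.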
This project builds on the SpeechBrain toolkit for speech and audio processing.
SpeechBrain repository:
- SpeechBrain GitHub: https://github.com/speechbrain/speechbrain
Please cite SpeechBrain if you use this toolkit:
@article{speechbrain2021,
title={SpeechBrain: A General-Purpose Speech Toolkit},
author={Ravanelli, Mirco and others},
journal={arXiv preprint arXiv:2106.04624},
year={2021}
}