This is an official implementation of eCMU: An Efficient Phase-aware Framework for Music Source Separation with Conformer (IEEE RIVF 2023).
Our implementation was developed on top of the sdx23-aimless framework.
🎼 You can remix songs and enjoy them here 📻
From our baseline Open-Unmix (UMX), we:
- Build an affordable model that solves the music source separation (MSS) task in the spectral domain with limited computing resources.
- Apply a differentiable multi-channel Wiener filter (MWF) to a mask-based prediction model to estimate the complex spectrogram of each source end-to-end.
- Optimize the model with a multi-domain loss function on the public MUSDB18-HQ dataset.
- Leverage Conformer blocks to capture both local and global feature dependencies along the time and frequency axes.
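The phase-aware filtering step can be illustrated with a simplified, single-channel Wiener-style mask. Note this is only a sketch: the paper's differentiable MWF additionally models spatial covariances across channels, and `wiener_mask` with its shapes is an illustrative assumption, not the repository's API.

```python
import numpy as np

def wiener_mask(mag_estimates, mix_stft, eps=1e-12):
    """Simplified single-channel Wiener-style filtering.

    mag_estimates: (n_sources, F, T) non-negative magnitude estimates.
    mix_stft:      (F, T) complex mixture spectrogram.
    Returns:       (n_sources, F, T) complex source estimates.
    """
    power = mag_estimates ** 2                     # per-source power spectra
    mask = power / (power.sum(axis=0) + eps)       # soft Wiener gains in [0, 1]
    return mask * mix_stft                         # reuses the mixture's phase

# Toy check: the complex source estimates sum back to the mixture.
rng = np.random.default_rng(0)
mix = rng.standard_normal((5, 4)) + 1j * rng.standard_normal((5, 4))
mags = rng.random((2, 5, 4))
est = wiener_mask(mags, mix)
print(np.allclose(est.sum(axis=0), mix))
```

Because the gains sum to one across sources, the filter is conservative (the estimates add up to the mixture), and since every operation is differentiable it can sit inside an end-to-end training loop.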
- python 3.8+
- pytorch-lightning
- pytorch
pip install -r requirements.txt
Note: eCMU is a single-target model, i.e., each stem is separated by a dedicated model, so there are four models in total.
- Download model weights here.
- To separate all sources on GPU:
python -m core.models.separator \
assets/samples/22_TaylorSwift.mp3 \
--model_ckpt eCMU_checkpoints/small
- To separate all sources on CPU:
python -m core.models.separator \
assets/samples/22_TaylorSwift.mp3 \
--model_ckpt eCMU_checkpoints/small \
--no-gpu
- Or, if you only want to separate a subset of stems (e.g., only vocals and drums), you can run:
python -m core.models.separator \
assets/samples/22_TaylorSwift.mp3 \
--targets vocals drums \
--model_ckpt eCMU_checkpoints/small
Other audio formats (.wav, .m4a, .aac) are also supported.
If you want to separate audio from a YouTube URL, download the audio file first:
python -m scripts.download <url>
Then, run the above inference commands with the new audio input.
Download the MUSDB18-HQ dataset and uncompress it into musdb/:
musdb/
|____ train/
|____ test/
Note: Remember to replace the data root path in the .yaml files before training.
python main.py fit --config cfg/small/vocals.yaml
# python main.py fit --config cfg/small/drums.yaml
# python main.py fit --config cfg/small/bass.yaml
# python main.py fit --config cfg/small/other.yaml
# python main.py fit --config cfg/large/vocals.yaml
Look into the .yaml files if you want to modify hyper-parameters, training arguments, the data pipeline, etc.
SDR: median of the chunk-level SDR. This is a standard evaluation metric proposed in SiSEC18 and implemented in museval.
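As a rough sketch of how this metric behaves: museval's BSS-Eval v4 SDR additionally fits a distortion filter before taking the energy ratio, whereas the simplified version below uses the plain per-chunk (one-second) energy ratio and then the median. `median_chunk_sdr` is a hypothetical helper, not part of this repository.

```python
import numpy as np

def median_chunk_sdr(reference, estimate, sr=44100):
    """Median over one-second chunks of a simplified SDR.

    reference, estimate: 1-D float arrays of equal length.
    """
    sdrs = []
    for start in range(0, len(reference) - sr + 1, sr):
        ref = reference[start:start + sr]
        err = ref - estimate[start:start + sr]
        # Ratio of reference energy to error energy, in dB.
        sdrs.append(10 * np.log10((ref ** 2).sum() / ((err ** 2).sum() + 1e-12) + 1e-12))
    return float(np.median(sdrs))

# An estimate corrupted by 10% additive noise scores roughly 20 dB,
# since 10 * log10(1 / 0.01) = 20.
rng = np.random.default_rng(0)
ref = rng.standard_normal(44100 * 3)
noisy = ref + 0.1 * rng.standard_normal(len(ref))
print(round(median_chunk_sdr(ref, noisy), 1))
```

Taking the median over chunks (rather than the mean) makes the score robust to short silent or near-silent segments, which is why SiSEC18-style reporting uses it.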
Method | #params (M) | extra data? | vocals | drums | bass | other | all |
---|---|---|---|---|---|---|---|
UMX (h=512) | 8.9 | no | 6.25 | 6.04 | 5.07 | 4.28 | 5.41 |
UMXL (h=1024) | 28.2 | yes | 7.21 | 7.15 | 6.02 | 4.89 | 6.32 |
X-UMX | 35.6 | no | 6.61 | 6.47 | 5.43 | 4.46 | 5.79 |
Spleeter | 9.8 | yes | 6.86 | 6.71 | 5.51 | 4.02 | 5.91 |
Hybrid-Demucs | 83.6 | no | 8.13 | 8.24 | 8.67 | 5.59 | 7.68 |
Ours (small, h=256) | 3.8 | no | 6.56 | 6.68 | 5.34 | 4.57 | 5.79 |
Ours (large, h=1024) | 37.0 | no | 7.59 | 7.09 | 5.91 | 5.50 | 6.52 |
- To evaluate all sources from our public weights:
python evaluate.py --all --model_ckpt eCMU_checkpoints/small --data_root musdb/
# python evaluate.py --all --model_ckpt eCMU_checkpoints/small --data_root musdb/ --targets vocals drums
- To evaluate a single source after training a model, remember to replace ckpt_path in the .yaml config file, then run:
python evaluate.py --config cfg/small/vocals.yaml --data_root musdb/
If you find our eCMU useful, please consider citing as below:
@INPROCEEDINGS{dungtham2023eCMU,
author={Tham, Quoc Dung and Nguyen, Duc Dung},
booktitle={2023 RIVF International Conference on Computing and Communication Technologies (RIVF)},
title={eCMU: An Efficient Phase-aware Framework for Music Source Separation with Conformer},
year={2023},
pages={447-451},
doi={10.1109/RIVF60135.2023.10471783}
}