# eCMU

This is the official implementation of *eCMU: An Efficient Phase-aware Framework for Music Source Separation with Conformer* (IEEE RIVF 2023).

*Figure: our pipeline.*

Our implementation is built on the sdx23-aimless framework.

## Demo Page

🎼 You can remix songs and enjoy them here 📻

## Abstract

Starting from our baseline Open-Unmix (UMX), we:

- Build an affordable model for the music source separation (MSS) task in the spectral domain with limited computing resources.
- Apply a differentiable Multi-channel Wiener Filter (MWF) to a mask-based prediction model to estimate the complex spectrogram of each source end-to-end.
- Optimize the model with a multi-domain loss function on the public MUSDB18-HQ dataset (a minimal sketch of such a loss follows this list).
- Leverage the ability of Conformer blocks to capture both local and global feature dependencies along the time and frequency axes.
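
For illustration, here is a minimal PyTorch sketch of a multi-domain loss that combines a frequency-domain term on complex spectrograms with a time-domain term on waveforms. The distance functions and weighting are assumptions for the sketch and may differ from the exact configuration used in the paper.

```python
import torch
import torch.nn.functional as F

def multi_domain_loss(est_spec, ref_spec, est_wave, ref_wave, alpha=0.5):
    """Illustrative multi-domain loss (terms and weights are assumptions).

    est_spec / ref_spec: complex spectrograms, e.g. (batch, channels, freq, frames)
    est_wave / ref_wave: time-domain signals,  e.g. (batch, channels, samples)
    """
    # Frequency-domain term on the real/imaginary parts of the complex spectrograms.
    freq_loss = F.mse_loss(torch.view_as_real(est_spec), torch.view_as_real(ref_spec))
    # Time-domain term on the waveforms (e.g. after an inverse STFT of the estimate).
    time_loss = F.l1_loss(est_wave, ref_wave)
    return alpha * freq_loss + (1.0 - alpha) * time_loss
```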

## Installation

- python 3.8+
- pytorch-lightning
- pytorch

```bash
pip install -r requirements.txt
```
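
As a quick, optional sanity check that the core dependencies import correctly and that a GPU is visible (relevant for the `--no-gpu` flag used below):

```python
import torch
import pytorch_lightning as pl

print("torch:", torch.__version__)
print("pytorch-lightning:", pl.__version__)
print("CUDA available:", torch.cuda.is_available())
```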

## Getting Started

Note: eCMU is a single-target model: each stem is separated by its own dedicated model, so there are four models in total.

- Download model weights here.
- To separate all sources on GPU:

```bash
python -m core.models.separator \
    assets/samples/22_TaylorSwift.mp3 \
    --model_ckpt eCMU_checkpoints/small
```

- To separate all sources on CPU:

```bash
python -m core.models.separator \
    assets/samples/22_TaylorSwift.mp3 \
    --model_ckpt eCMU_checkpoints/small \
    --no-gpu
```

- To separate only a subset of stems (e.g., only vocals and drums):

```bash
python -m core.models.separator \
    assets/samples/22_TaylorSwift.mp3 \
    --targets vocals drums \
    --model_ckpt eCMU_checkpoints/small
```

Other audio formats (.wav, .m4a, .aac) are also supported.
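
If you prefer to pre-process an input yourself (for example, converting it to stereo 44.1 kHz, the MUSDB convention), here is a minimal optional sketch using torchaudio; it is not part of the separator and assumes torchaudio is installed:

```python
import torchaudio
import torchaudio.functional as AF

# Load any supported file: returns a (channels, samples) tensor and its sample rate.
waveform, sr = torchaudio.load("assets/samples/22_TaylorSwift.mp3")
if waveform.shape[0] == 1:      # duplicate a mono channel to obtain stereo
    waveform = waveform.repeat(2, 1)
if sr != 44100:                 # resample to the MUSDB sample rate
    waveform = AF.resample(waveform, orig_freq=sr, new_freq=44100)
torchaudio.save("input_44k.wav", waveform, 44100)
```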

## Download audio from YouTube

To separate audio from a YouTube URL, first download the audio file:

```bash
python -m scripts.download <url>
```

Then, run the above inference commands with the new audio input.

## Training

Download the MUSDB18-HQ dataset and uncompress it into `musdb/`:

```
musdb/
|____ train/
|____ test/
```

Note: remember to update the dataset root path in the .yaml config files before training.
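
Before training, you can sanity-check the dataset layout with the musdb package (the reference loader for MUSDB18); this sketch assumes musdb is installed and that the uncompressed HQ (wav) version is used:

```python
import musdb

# is_wav=True selects the uncompressed MUSDB18-HQ layout.
mus = musdb.DB(root="musdb/", is_wav=True, subsets="train")
print(len(mus), "training tracks")        # 100 expected
track = mus[0]
print(track.name, track.audio.shape)      # stereo mixture: (num_samples, 2)
print(track.targets["vocals"].audio.shape)
```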

```bash
python main.py fit --config cfg/small/vocals.yaml
# python main.py fit --config cfg/small/drums.yaml
# python main.py fit --config cfg/small/bass.yaml
# python main.py fit --config cfg/small/other.yaml

# python main.py fit --config cfg/large/vocals.yaml
```

Look into the .yaml files if you want to modify hyper-parameters, training arguments, the data pipeline, etc.

## Evaluation

SDR (dB): the median over tracks of the chunk-level SDR. This is the standard evaluation metric proposed in SiSEC18 and implemented in museval.
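
For reference, the metric can be reproduced directly with museval. A minimal sketch (the estimates dict would normally hold the model's separated stems; here the ground-truth stems stand in as placeholders):

```python
import musdb
import museval

mus = musdb.DB(root="musdb/", is_wav=True, subsets="test")
track = mus[0]

# Placeholder estimates: replace these with the model's separated stems.
estimates = {
    "vocals": track.targets["vocals"].audio,
    "accompaniment": track.targets["accompaniment"].audio,
}
scores = museval.eval_mus_track(track, estimates)
print(scores)  # per-target SDR/ISR/SIR/SAR over one-second chunks
```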

| Method | #params (M) | Extra data? | vocals | drums | bass | other | all |
|---|---|---|---|---|---|---|---|
| UMX (h=512) | 8.9 | no | 6.25 | 6.04 | 5.07 | 4.28 | 5.41 |
| UMXL (h=1024) | 28.2 | yes | 7.21 | 7.15 | 6.02 | 4.89 | 6.32 |
| X-UMX | 35.6 | no | 6.61 | 6.47 | 5.43 | 4.46 | 5.79 |
| Spleeter | 9.8 | yes | 6.86 | 6.71 | 5.51 | 4.02 | 5.91 |
| Hybrid-Demucs | 83.6 | no | 8.13 | 8.24 | 8.67 | 5.59 | 7.68 |
| Ours (small, h=256) | 3.8 | no | 6.56 | 6.68 | 5.34 | 4.57 | 5.79 |
| Ours (large, h=1024) | 37.0 | no | 7.59 | 7.09 | 5.91 | 5.50 | 6.52 |
- To evaluate all sources with our public weights:

```bash
python evaluate.py --all --model_ckpt eCMU_checkpoints/small --data_root musdb/
# python evaluate.py --all --model_ckpt eCMU_checkpoints/small --data_root musdb/ --targets vocals drums
```

- To evaluate a single source after training your own model, remember to set `ckpt_path` in the .yaml config file first:

```bash
python evaluate.py --config cfg/small/vocals.yaml --data_root musdb/
```

## Citations

If you find eCMU useful, please consider citing:

```bibtex
@INPROCEEDINGS{dungtham2023eCMU,
  author={Tham, Quoc Dung and Nguyen, Duc Dung},
  booktitle={2023 RIVF International Conference on Computing and Communication Technologies (RIVF)},
  title={eCMU: An Efficient Phase-aware Framework for Music Source Separation with Conformer},
  year={2023},
  pages={447--451},
  doi={10.1109/RIVF60135.2023.10471783}
}
```