Angle-conditioned audio source separation using a DCCRN architecture for IEEE Signal Processing Cup 2026.
SP_CUP_Phase_2/
├── Dataset Generation/ # MATLAB scripts for synthetic dataset creation
│   ├── train_anechoic.m # Training data (150k samples, RT60=0.0)
│   ├── train_reverb.m # Training data (150k samples, RT60=0.5)
│   ├── test_anechoic.m # Test data (5k samples, fixed 90°/40° angles)
│   └── test_reverb.m # Test data (5k samples, fixed 90°/40° angles)
│
├── Model Inference/ # Python training, testing, and inference
│   ├── train_Conformer.py # Training script
│   ├── test_Conformer.py # Evaluation script (SI-SDR, STOI, PESQ)
│   ├── inference_Conformer.py # Single-file inference
│   ├── anechoic_Conformer.pth # Trained model (anechoic)
│   ├── reverb_Conformer.pth # Trained model (reverberant)
│   ├── evaluation_anechoic/ # Evaluation outputs
│   └── evaluation_reverb/ # Evaluation outputs
│
├── Submission/ # Self-contained competition submission
│   ├── Task1_Anechoic/
│   │   ├── Task1_Anechoic_5dB.mat
│   │   ├── anechoic_Conformer.pth
│   │   ├── process_task1.py
│   │   └── [audio files]
│   └── Task2_Reverberant/
│       ├── Task2_Reverberant_5dB.mat
│       ├── reverb_Conformer.pth
│       ├── process_task2.py
│       └── [audio files]
│
├── prepare_submission.m # Generates submission folder from evaluation
├── requirements.txt # Python dependencies
└── README.md
pip install -r requirements.txt

| Package | Version | Purpose |
|---|---|---|
| torch | 2.6.0 | Deep learning |
| torchaudio | 2.6.0 | Audio I/O |
| torchmetrics | 1.8.2 | PESQ, STOI, SI-SDR |
| soundfile | latest | Audio backend |
| pesq, pystoi | latest | Metrics |
- MATLAB R2020b+
- Signal Processing Toolbox
- Parallel Computing Toolbox
- rir_generator MEX function
The rir_generator MEX function needs to be compiled before running dataset generation:
% 1. Navigate to RIR_gen folder
cd RIR_gen
% 2. Configure MEX compiler for C++
mex -setup
% Select a C++ compiler (MinGW-w64, MSVC, etc. must be installed)
% 3. Compile the RIR generator
mex rir_generator.cpp rir_generator_core.cpp

Note: On Windows, install MinGW-w64 or Visual Studio with C++ build tools. On Linux/macOS, ensure g++ or clang++ is available.
Download the raw audio files needed for dataset generation:
- Download Dataset_raw.zip from Google Drive
- Place the zip file in the project root (SP_CUP_Phase_2/)
- Extract it so the folder structure looks like:
SP_CUP_Phase_2/
└── Dataset_raw/
    ├── Male/ # Male speech files (.wav/.flac)
    ├── Female/ # Female speech files (.wav/.flac)
    ├── Noise/ # Noise files (.wav/.flac)
    └── Music/ # Music files (.wav/.flac)
Note: Dataset_raw/ and Dataset_raw.zip are gitignored and will not be committed.
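
Before running dataset generation, it can help to confirm that the extracted folder matches the layout above. The following is a minimal sketch, not part of the repository: the function name `count_raw_audio` is hypothetical, and it assumes only the four category folders and the `.wav`/`.flac` extensions listed above.

```python
from pathlib import Path

# Category folders and audio extensions as listed in the layout above.
CATEGORIES = ("Male", "Female", "Noise", "Music")
AUDIO_SUFFIXES = {".wav", ".flac"}


def count_raw_audio(raw_root):
    """Return {category: number of audio files} for a Dataset_raw folder.

    Hypothetical helper: raises FileNotFoundError if a category folder is missing.
    """
    counts = {}
    for category in CATEGORIES:
        folder = Path(raw_root) / category
        if not folder.is_dir():
            raise FileNotFoundError(f"missing folder: {folder}")
        counts[category] = sum(
            1
            for path in folder.rglob("*")
            if path.is_file() and path.suffix.lower() in AUDIO_SUFFIXES
        )
    return counts
```

Calling `count_raw_audio("Dataset_raw")` from the project root returns one count per category, so an empty or misplaced folder shows up before the MATLAB scripts are started.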
cd "Dataset Generation"
train_anechoic % 150k samples, RT60=0.0, random angles
train_reverb % 150k samples, RT60=0.5, random angles
test_anechoic % 5k samples, RT60=0.0, fixed angles (90°/40°)
test_reverb % 5k samples, RT60=0.5, fixed angles (90°/40°)

Output per sample:
sample_XXXXX/
├── mixture.wav # Stereo (target + interferer + noise)
├── target.wav # Ground truth
├── interference.wav # Interference
└── meta.json # {target_angle, interf_angle, rt60, ...}
Settings: SIR=0dB, SNR=5dB, 16kHz, 4s duration
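
The per-sample layout above can be walked from Python using only the standard library. This is a minimal sketch under stated assumptions: the helper names `load_samples` and `missing_files` are hypothetical, and only the file names and the `meta.json` keys shown above (`target_angle`, `interf_angle`, `rt60`) come from this README.

```python
import json
from pathlib import Path

# File names taken from the per-sample layout above.
EXPECTED_FILES = ("mixture.wav", "target.wav", "interference.wav", "meta.json")


def load_samples(dataset_root):
    """Yield (sample_dir, metadata) for each sample_* folder that has a meta.json."""
    for sample_dir in sorted(Path(dataset_root).glob("sample_*")):
        meta_path = sample_dir / "meta.json"
        if not meta_path.is_file():
            continue  # skip folders without metadata
        with meta_path.open(encoding="utf-8") as f:
            yield sample_dir, json.load(f)


def missing_files(sample_dir):
    """Return the expected per-sample files that are absent from sample_dir."""
    return [name for name in EXPECTED_FILES if not (Path(sample_dir) / name).is_file()]
```

For example, `[d for d, _ in load_samples(root) if missing_files(d)]` lists incomplete samples after a generation run.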
cd "Model Inference"
python train_Conformer.py

Edit config in script:
DATASET_ROOT = r"../Train_Dataset/reverb" # or anechoic
RESUME_FROM = "reverb_Conformer.pth" # or None

cd "Model Inference"
python test_Conformer.py

Edit config:
MODEL_PATH = "anechoic_Conformer.pth"
TEST_DATASET_ROOT = r"../Test_Dataset/anechoic"
OUTPUT_DIR = "evaluation_anechoic"

Outputs: Best samples by category (Overall, Male+Female, Male+Music, Male+Noise)
python inference_Conformer.py -i input.wav -a 90 -o output.wav -m reverb_Conformer.pth -d cuda

| Arg | Description |
|---|---|
| -i | Input stereo audio |
| -a | Target angle (0-180°) |
| -o | Output file |
| -m | Model checkpoint |
| -d | Device (cpu/cuda) |
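
To illustrate how the flags above fit together, here is a hedged `argparse` sketch. It is not the parser used in inference_Conformer.py: the `dest` names, the defaults for `-m` and `-d`, and the range check on `-a` are assumptions made for this example.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags in the table above."""
    parser = argparse.ArgumentParser(description="Angle-conditioned source separation")
    parser.add_argument("-i", dest="input", required=True, help="Input stereo audio")
    parser.add_argument("-a", dest="angle", type=float, required=True,
                        help="Target angle in degrees (0-180)")
    parser.add_argument("-o", dest="output", required=True, help="Output file")
    # Defaults below are assumptions, not documented behaviour of the script.
    parser.add_argument("-m", dest="model", default=None, help="Model checkpoint")
    parser.add_argument("-d", dest="device", choices=("cpu", "cuda"), default="cpu",
                        help="Device")
    return parser


def parse_cli(argv):
    """Parse argv and reject angles outside the documented 0-180 degree range."""
    args = build_parser().parse_args(argv)
    if not 0.0 <= args.angle <= 180.0:
        raise ValueError(f"target angle must be within 0-180 degrees, got {args.angle}")
    return args
```

Validating the angle at parse time keeps out-of-range conditioning values from reaching the model.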
matlab -batch "run('generate_rir_data.m')"

Generates rir_data.mat containing Room Impulse Responses for:
- Anechoic (RT60 = 0.0)
- Reverberant (RT60 = 0.5)
matlab -batch "run('prepare_submission.m')"

Creates self-contained Submission/ folder ready for competition.
DCCRNConformer (~10M parameters)
| Component | Details |
|---|---|
| Encoder | Complex Conv2d: 2→48→96→192→256 |
| Bottleneck | Dual-Path Conformer (3 blocks, 4 heads) |
| Decoder | Complex ConvTranspose2d with skip connections |
| Conditioning | Angle MLP injection at bottleneck |
Audio: 16kHz, STFT n_fft=512, hop=128, 3s fixed input
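
The audio settings above determine the size of the spectrogram the network sees. The sketch below works out that shape with plain arithmetic. It assumes a one-sided STFT and, by default, centered frames with `n_fft // 2` samples of padding on each side (the `torch.stft` default); whether the training script uses those options is not stated here.

```python
SAMPLE_RATE = 16_000  # Hz
N_FFT = 512
HOP = 128
DURATION_S = 3        # fixed input length in seconds


def stft_shape(num_samples, n_fft=N_FFT, hop=HOP, center=True):
    """Return (frequency_bins, time_frames) of a one-sided STFT.

    center=True assumes n_fft // 2 samples of padding on both sides.
    """
    freq_bins = n_fft // 2 + 1
    if center:
        frames = 1 + num_samples // hop
    else:
        frames = 1 + (num_samples - n_fft) // hop
    return freq_bins, frames


num_samples = SAMPLE_RATE * DURATION_S          # 48000 samples
print(stft_shape(num_samples))                  # (257, 376)
print(stft_shape(num_samples, center=False))    # (257, 372)
```

At 16 kHz, a hop of 128 samples corresponds to an 8 ms frame step and a 512-sample window to 32 ms.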
| Category | SI-SDR (dB) | STOI | PESQ |
|---|---|---|---|
| Best Overall | 16.91 | 0.950 | 2.64 |
| Male + Noise | 16.91 | 0.950 | 2.64 |
| Male + Music | 13.46 | 0.956 | 2.54 |
| Male + Female | 12.96 | 0.959 | 2.64 |
Inference: 50.6ms avg (59x real-time for 3s audio)
| Category | SI-SDR (dB) | STOI | PESQ |
|---|---|---|---|
| Best Overall | 12.49 | 0.942 | 2.48 |
| Male + Noise | 12.62 | 0.850 | 2.00 |
| Male + Music | 11.58 | 0.886 | 2.27 |
| Male + Female | 12.49 | 0.942 | 2.48 |
Inference: 50.5ms avg (59x real-time for 3s audio)
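
The real-time factors quoted with the inference timings follow from dividing the audio duration by the processing time. A small sketch of that arithmetic, with a hypothetical helper name:

```python
def real_time_factor(audio_seconds, inference_ms):
    """Seconds of audio processed per second of compute."""
    return audio_seconds / (inference_ms / 1000.0)


# 3 s of audio at the two reported average latencies.
print(round(real_time_factor(3.0, 50.6)))  # 59
print(round(real_time_factor(3.0, 50.5)))  # 59
```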
| Metric | Description |
|---|---|
| SI-SDR | Scale-Invariant Signal-to-Distortion Ratio (dB) |
| STOI | Short-Time Objective Intelligibility (0-1) |
| PESQ | Perceptual Evaluation of Speech Quality (-0.5 to 4.5) |
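
For reference, SI-SDR projects the estimate onto the reference and compares the energy of that projection with the energy of the residual. The sketch below is a dependency-free illustration of the zero-mean variant, not the evaluation code: test_Conformer.py may compute the metric differently (for example through torchmetrics), and the `eps` term is an assumption added for numerical safety.

```python
import math


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB for two equal-length sample sequences."""
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    n = len(reference)
    # Remove the mean of each signal (zero-mean variant).
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    # Optimal scaling of the reference: alpha = <est, ref> / ||ref||^2.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    alpha = dot / (ref_energy + eps)
    target = [alpha * r for r in ref]
    target_energy = sum(t * t for t in target)
    noise_energy = sum((e - t) ** 2 for e, t in zip(est, target))
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))
```

Because the reference is rescaled before the comparison, multiplying the estimate by a constant gain leaves the score essentially unchanged, which is what makes the metric suitable for separation outputs with arbitrary level.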
# Setup
cd SP_CUP_Phase_2
pip install -r requirements.txt
# Inference
cd "Model Inference"
python inference_Conformer.py -i audio.wav -a 90 -o out.wav -d cuda
# Evaluate
python test_Conformer.py
# Generate submission
cd ..
matlab -batch "run('prepare_submission.m')"

Developed for the IEEE Signal Processing Cup 2026 competition.