Xavaitron/SP_CUP_Phase_2

SP CUP Phase 2 - Audio Source Separation with DCCRN

Angle-conditioned audio source separation using a DCCRN architecture for the IEEE Signal Processing Cup 2026.


📁 Project Structure

SP_CUP_Phase_2/
├── Dataset Generation/              # MATLAB scripts for synthetic dataset creation
│   ├── train_anechoic.m             # Training data (150k samples, RT60=0.0)
│   ├── train_reverb.m               # Training data (150k samples, RT60=0.5)
│   ├── test_anechoic.m              # Test data (5k samples, fixed 90°/40° angles)
│   └── test_reverb.m                # Test data (5k samples, fixed 90°/40° angles)
│
├── Model Inference/                 # Python training, testing, and inference
│   ├── train_Conformer.py           # Training script
│   ├── test_Conformer.py            # Evaluation script (SI-SDR, STOI, PESQ)
│   ├── inference_Conformer.py       # Single-file inference
│   ├── anechoic_Conformer.pth       # Trained model (anechoic)
│   ├── reverb_Conformer.pth         # Trained model (reverberant)
│   ├── evaluation_anechoic/         # Evaluation outputs
│   └── evaluation_reverb/           # Evaluation outputs
│
├── Submission/                      # Self-contained competition submission
│   ├── Task1_Anechoic/
│   │   ├── Task1_Anechoic_5dB.mat
│   │   ├── anechoic_Conformer.pth
│   │   ├── process_task1.py
│   │   └── [audio files]
│   └── Task2_Reverberant/
│       ├── Task2_Reverberant_5dB.mat
│       ├── reverb_Conformer.pth
│       ├── process_task2.py
│       └── [audio files]
│
├── prepare_submission.m             # Generates submission folder from evaluation
├── requirements.txt                 # Python dependencies
└── README.md

🔧 Requirements

Python

pip install -r requirements.txt

| Package | Version | Purpose |
|---|---|---|
| torch | 2.6.0 | Deep learning |
| torchaudio | 2.6.0 | Audio I/O |
| torchmetrics | 1.8.2 | PESQ, STOI, SI-SDR |
| soundfile | latest | Audio backend |
| pesq, pystoi | latest | Metrics |

MATLAB

  • MATLAB R2020b+
  • Signal Processing Toolbox
  • Parallel Computing Toolbox
  • rir_generator MEX function

RIR Generator Setup

The rir_generator MEX function needs to be compiled before running dataset generation:

% 1. Navigate to RIR_gen folder
cd RIR_gen

% 2. Configure MEX compiler for C++
mex -setup

% Select a C++ compiler (MinGW-w64, MSVC, etc. must be installed)

% 3. Compile the RIR generator
mex rir_generator.cpp rir_generator_core.cpp

Note: On Windows, install MinGW-w64 or Visual Studio with C++ build tools. On Linux/macOS, ensure g++ or clang++ is available.


🚀 Pipeline

1. Download Raw Dataset

Download the raw audio files needed for dataset generation:

  1. Download Dataset_raw.zip from Google Drive
  2. Place the zip file in the project root (SP_CUP_Phase_2/)
  3. Extract it so the folder structure looks like:
SP_CUP_Phase_2/
└── Dataset_raw/
    ├── Male/       # Male speech files (.wav/.flac)
    ├── Female/     # Female speech files (.wav/.flac)
    ├── Noise/      # Noise files (.wav/.flac)
    └── Music/      # Music files (.wav/.flac)

Note: Dataset_raw/ and Dataset_raw.zip are gitignored and will not be committed.
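
Before running dataset generation, it can help to confirm that the archive was extracted into the expected layout. The sketch below is not part of the repository: `check_dataset_raw` is a hypothetical helper that counts `.wav`/`.flac` files in each of the four folders listed above and raises if a folder is missing.

```python
from pathlib import Path

# Folder names come from the layout above; the helper itself is hypothetical.
EXPECTED_DIRS = ("Male", "Female", "Noise", "Music")
AUDIO_SUFFIXES = {".wav", ".flac"}


def check_dataset_raw(root):
    """Return {folder: audio file count}; raise if an expected folder is missing."""
    root = Path(root)
    counts = {}
    for name in EXPECTED_DIRS:
        folder = root / name
        if not folder.is_dir():
            raise FileNotFoundError(f"Missing folder: {folder}")
        # Search recursively in case the audio is nested in subfolders.
        counts[name] = sum(
            1
            for p in folder.rglob("*")
            if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
        )
    return counts
```

Calling `check_dataset_raw("Dataset_raw")` from the project root returns a dictionary of per-folder counts, so an empty or misplaced folder is easy to spot.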

2. Dataset Generation (MATLAB)

cd "Dataset Generation"
train_anechoic   % 150k samples, RT60=0.0, random angles
train_reverb     % 150k samples, RT60=0.5, random angles
test_anechoic    % 5k samples, RT60=0.0, fixed angles (90°/40°)
test_reverb      % 5k samples, RT60=0.5, fixed angles (90°/40°)

Output per sample:

sample_XXXXX/
├── mixture.wav      # Stereo (target + interferer + noise)
├── target.wav       # Ground truth
├── interference.wav # Interference
└── meta.json        # {target_angle, interf_angle, rt60, ...}
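
A generated sample can be inspected from Python without loading the full training stack. The sketch below is a hypothetical helper, not repository code. It assumes `mixture.wav` is stored as PCM WAV, which is the only format the standard-library `wave` module reads, and it treats `meta.json` as an opaque dictionary.

```python
import json
import wave
from pathlib import Path


def load_sample_info(sample_dir):
    """Read meta.json and the header of mixture.wav from one sample folder."""
    sample_dir = Path(sample_dir)
    with open(sample_dir / "meta.json", encoding="utf-8") as f:
        meta = json.load(f)
    # Assumption: mixture.wav is PCM WAV, so the stdlib wave module can open it.
    with wave.open(str(sample_dir / "mixture.wav"), "rb") as w:
        info = {
            "channels": w.getnchannels(),
            "sample_rate": w.getframerate(),
            "duration_s": w.getnframes() / w.getframerate(),
        }
    return meta, info
```

For a sample generated with the settings in this README, `info` would be expected to report 2 channels at 16000 Hz.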

Settings: SIR = 0 dB, SNR = 5 dB, 16 kHz sample rate, 4 s duration
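
To make the SIR and SNR settings concrete, the following sketch scales an interferer and a noise signal against a target so that the power ratios match the requested values. It is illustrative only: the MATLAB generation scripts are not reproduced here, the function names are hypothetical, and it assumes single-channel signals with both ratios measured against the target power.

```python
import math


def power(x):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def gain_for_ratio_db(reference, signal, ratio_db):
    """Gain g such that 10*log10(P(reference) / P(g*signal)) equals ratio_db."""
    return math.sqrt(power(reference) / (power(signal) * 10 ** (ratio_db / 10)))


def mix(target, interferer, noise, sir_db=0.0, snr_db=5.0):
    """Scale interferer and noise against the target, then sum sample by sample."""
    g_i = gain_for_ratio_db(target, interferer, sir_db)  # SIR: target vs interferer
    g_n = gain_for_ratio_db(target, noise, snr_db)       # SNR: target vs noise
    return [t + g_i * i + g_n * n for t, i, n in zip(target, interferer, noise)]
```

With SIR = 0 dB the scaled interferer has the same power as the target, while SNR = 5 dB places the noise 5 dB below it.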


3. Training

cd "Model Inference"
python train_Conformer.py

Edit config in script:

DATASET_ROOT = r"../Train_Dataset/reverb"  # or anechoic
RESUME_FROM = "reverb_Conformer.pth"       # or None

4. Evaluation

cd "Model Inference"
python test_Conformer.py

Edit config:

MODEL_PATH = "anechoic_Conformer.pth"
TEST_DATASET_ROOT = r"../Test_Dataset/anechoic"
OUTPUT_DIR = "evaluation_anechoic"

Outputs: Best samples by category (Overall, Male+Female, Male+Music, Male+Noise)


5. Single-File Inference

python inference_Conformer.py -i input.wav -a 90 -o output.wav -m reverb_Conformer.pth -d cuda

| Arg | Description |
|---|---|
| -i | Input stereo audio |
| -a | Target angle (0-180°) |
| -o | Output file |
| -m | Model checkpoint |
| -d | Device (cpu/cuda) |
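
For reference, a command-line interface with these five flags could be declared with `argparse` roughly as follows. This is a sketch, not the parser in `inference_Conformer.py`: the long option names, the defaults, and the range check on the angle are all assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented short flags."""
    p = argparse.ArgumentParser(description="Angle-conditioned single-file inference")
    p.add_argument("-i", "--input", required=True, help="Input stereo audio")
    p.add_argument("-a", "--angle", type=float, required=True,
                   help="Target angle in degrees (0-180)")
    p.add_argument("-o", "--output", required=True, help="Output file")
    # Assumed default: no checkpoint unless one is passed explicitly.
    p.add_argument("-m", "--model", default=None, help="Model checkpoint")
    p.add_argument("-d", "--device", choices=["cpu", "cuda"], default="cpu",
                   help="Device")
    return p


def parse_args(argv=None):
    """Parse arguments and reject angles outside the documented 0-180 range."""
    args = build_parser().parse_args(argv)
    if not 0.0 <= args.angle <= 180.0:
        raise ValueError(f"angle must be within 0-180 degrees, got {args.angle}")
    return args
```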

6. Generate RIR Data for Submission

matlab -batch "run('generate_rir_data.m')"

Generates rir_data.mat containing Room Impulse Responses for:

  • Anechoic (RT60 = 0.0)
  • Reverberant (RT60 = 0.5)

7. Generate Submission

matlab -batch "run('prepare_submission.m')"

Creates self-contained Submission/ folder ready for competition.


πŸ—οΈ Model Architecture

DCCRNConformer (~10M parameters)

| Component | Details |
|---|---|
| Encoder | Complex Conv2d: 2→48→96→192→256 |
| Bottleneck | Dual-Path Conformer (3 blocks, 4 heads) |
| Decoder | Complex ConvTranspose2d with skip connections |
| Conditioning | Angle MLP injection at bottleneck |

Audio: 16kHz, STFT n_fft=512, hop=128, 3s fixed input
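
The audio settings fix the size of the spectrogram the encoder receives. The short calculation below derives it. Whether the model's STFT uses centered (padded) frames is an assumption here, so both cases are shown; `stft_shape` is a hypothetical helper.

```python
def stft_shape(num_samples, n_fft=512, hop=128, center=True):
    """Return (frequency_bins, frames) of a one-sided STFT."""
    freq_bins = n_fft // 2 + 1
    if center:
        # Centered frames pad the signal by n_fft // 2 on both sides.
        frames = 1 + num_samples // hop
    else:
        frames = 1 + (num_samples - n_fft) // hop
    return freq_bins, frames


SAMPLE_RATE = 16_000
num_samples = 3 * SAMPLE_RATE                  # 3 s fixed input -> 48000 samples
print(stft_shape(num_samples))                 # (257, 376)
print(stft_shape(num_samples, center=False))   # (257, 372)
```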


Evaluation Results

Anechoic Condition (5,000 samples)

| Category | SI-SDR (dB) | STOI | PESQ |
|---|---|---|---|
| Best Overall | 16.91 | 0.950 | 2.64 |
| Male + Noise | 16.91 | 0.950 | 2.64 |
| Male + Music | 13.46 | 0.956 | 2.54 |
| Male + Female | 12.96 | 0.959 | 2.64 |

Inference: 50.6ms avg (59x real-time for 3s audio)
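
The real-time figure follows directly from the clip length and the average latency, as this small check shows (the helper name is illustrative):

```python
def real_time_factor(audio_seconds, inference_ms):
    """How many times faster than real time the processing runs."""
    return audio_seconds / (inference_ms / 1000.0)


print(round(real_time_factor(3.0, 50.6), 1))  # 59.3
```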

Reverberant Condition (5,000 samples)

| Category | SI-SDR (dB) | STOI | PESQ |
|---|---|---|---|
| Best Overall | 12.49 | 0.942 | 2.48 |
| Male + Noise | 12.62 | 0.850 | 2.00 |
| Male + Music | 11.58 | 0.886 | 2.27 |
| Male + Female | 12.49 | 0.942 | 2.48 |

Inference: 50.5ms avg (59x real-time for 3s audio)


📊 Metrics

| Metric | Description |
|---|---|
| SI-SDR | Scale-Invariant Signal-to-Distortion Ratio (dB) |
| STOI | Short-Time Objective Intelligibility (0-1) |
| PESQ | Perceptual Evaluation of Speech Quality (-0.5 to 4.5) |
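
As a reminder of what SI-SDR measures, here is a dependency-free sketch of its standard definition: the estimate is projected onto the reference, and the energy of that projection is compared with the energy of the residual. The requirements table lists torchmetrics for the reported metrics, so this function only illustrates the formula. It applies no mean removal and does not guard against a zero residual.

```python
import math


def si_sdr(estimate, reference):
    """SI-SDR in dB for two equal-length sample sequences (no mean removal)."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    alpha = dot / ref_energy                     # optimal scaling of the reference
    target = [alpha * r for r in reference]      # projection of estimate onto reference
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    # A perfect estimate gives residual_energy == 0 and would raise here.
    return 10.0 * math.log10(target_energy / residual_energy)
```

Because the reference is rescaled to best match the estimate, multiplying the estimate by any non-zero constant leaves the score unchanged, which is what "scale-invariant" refers to.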

📋 Quick Start

# Setup
cd SP_CUP_Phase_2
pip install -r requirements.txt

# Inference
cd "Model Inference"
python inference_Conformer.py -i audio.wav -a 90 -o out.wav -d cuda

# Evaluate
python test_Conformer.py

# Generate submission
cd ..
matlab -batch "run('prepare_submission.m')"

License

Developed for the IEEE Signal Processing Cup 2026 competition.
