---
title: Denoising
emoji: 🤗
colorFrom: red
colorTo: orange
sdk: gradio
sdk_version: 3.28.1
app_file: app.py
pinned: false
---

This repo implements the DEMUCS model proposed in *Real Time Speech Enhancement in the Waveform Domain* from scratch in PyTorch. The model is based on an encoder-decoder architecture with skip connections and is optimized in both the time and frequency domains, using multiple loss functions. The web interface for this project is available on Hugging Face: you can record your voice in noisy conditions and get a denoised version produced by the DEMUCS model.
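
The Space is a Gradio app (see the `sdk` and `app_file` fields in the front matter above). As a rough, hypothetical sketch of such a microphone-in/audio-out demo with the Gradio 3.x API — the `denoise` body and model wiring below are placeholders, not the repo's actual `app.py`:

```python
# Minimal sketch of a record-and-denoise Gradio demo (Gradio 3.x API).
# The denoising logic itself is a placeholder for this repo's model code.
import gradio as gr
import numpy as np

def denoise(audio):
    sr, wav = audio                        # gr.Audio(type="numpy") yields (sample_rate, samples)
    wav = wav.astype(np.float32) / 32768.0 # int16 -> float32 in [-1, 1]
    # ... run the DEMUCS model on `wav` here and use its output instead ...
    return sr, (wav * 32768.0).astype(np.int16)

demo = gr.Interface(
    fn=denoise,
    inputs=gr.Audio(source="microphone", type="numpy"),
    outputs=gr.Audio(type="numpy"),
    title="DEMUCS denoising",
)

if __name__ == "__main__":
    demo.launch()
```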

## Potential uses

- Real-time denoising in communication systems (such as Skype)
- Improving speech assistants (the ASR component)

## Data

In the scope of this project, the Valentini dataset is used. It is a parallel clean/noisy speech database designed for training and testing speech enhancement methods that operate at 48 kHz. It contains 56 speakers and roughly 10 GB of speech data.
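
For a quick sanity check, one parallel pair can be inspected with torchaudio; the directory and file names below are illustrative examples of the dataset layout, not paths taken from this repo:

```python
# Load one parallel (noisy, clean) pair and confirm the 48 kHz sample rate.
# Paths are illustrative examples of the Valentini dataset layout.
import torchaudio

noisy, sr_noisy = torchaudio.load("noisy_trainset_56spk_wav/p226_001.wav")
clean, sr_clean = torchaudio.load("clean_trainset_56spk_wav/p226_001.wav")

assert sr_noisy == sr_clean == 48000
print(noisy.shape, clean.shape)  # (channels, num_samples); equal lengths for a parallel pair
```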

For further model improvement, it is possible to use the much bigger training set from the DNS challenge.

## Training

The training process is implemented in PyTorch. The data consists of (noisy speech, clean speech) pairs that are loaded as 2-second samples, randomly cropped from the audio and padded if necessary (see the sketch below). The model is optimized with SGD. Two loss functions are used: an L1 loss on the waveform and a multi-resolution STFT loss, which is the sum of STFT losses computed over different window sizes, hop sizes, and FFT sizes.
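
A minimal sketch of such a paired dataset with random, aligned 2-second crops (the file lists and the 16 kHz sample rate are assumptions for illustration, not the repo's exact pipeline):

```python
# Sketch of a paired dataset returning random, aligned 2-second crops,
# zero-padded on the right when a file is shorter than the crop length.
# File lists and the 16 kHz sample rate are assumptions for illustration.
import random
import torch
import torchaudio
from torch.utils.data import Dataset

class NoisyCleanDataset(Dataset):
    def __init__(self, noisy_paths, clean_paths, sample_rate=16000, crop_seconds=2.0):
        self.noisy_paths, self.clean_paths = noisy_paths, clean_paths
        self.crop_len = int(sample_rate * crop_seconds)

    def __len__(self):
        return len(self.noisy_paths)

    def __getitem__(self, idx):
        noisy, _ = torchaudio.load(self.noisy_paths[idx])
        clean, _ = torchaudio.load(self.clean_paths[idx])
        if noisy.shape[-1] < self.crop_len:  # pad short files so a full crop fits
            pad = self.crop_len - noisy.shape[-1]
            noisy = torch.nn.functional.pad(noisy, (0, pad))
            clean = torch.nn.functional.pad(clean, (0, pad))
        start = random.randint(0, noisy.shape[-1] - self.crop_len)  # same offset for both signals
        return (noisy[..., start:start + self.crop_len],
                clean[..., start:start + self.crop_len])
```

Each single-resolution STFT loss combines a spectral convergence term and a log-magnitude term: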

$$L_{STFT}= L_{sc} + L_{mag}$$

$$L_{sc} = \frac{\left\| \, |STFT(\tilde{x})| - |STFT(x)| \, \right\|_{F}}{\left\| \, |STFT(x)| \, \right\|_{F}}$$

$$L_{mag} = \frac{1}{T} \left\| \, \log|STFT(\tilde{x})| - \log|STFT(x)| \, \right\|_{1}$$

where $\tilde{x}$ is the enhanced (denoised) signal, $x$ is the clean target, $T$ is the number of time points in the waveform, and $\|\cdot\|_{F}$ and $\|\cdot\|_{1}$ denote the Frobenius and L1 norms.
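
A minimal PyTorch sketch of these two terms summed over several STFT resolutions; the window, hop, and FFT sizes below are common defaults, not necessarily the ones used in this repo:

```python
# Sketch of the multi-resolution STFT loss: (L_sc + L_mag) summed over
# several (fft_size, hop_size, win_length) settings. The resolutions are
# illustrative defaults rather than this repo's exact configuration.
import torch
import torch.nn.functional as F

def stft_magnitude(x, fft_size, hop_size, win_length):
    window = torch.hann_window(win_length, device=x.device)
    spec = torch.stft(x, fft_size, hop_size, win_length, window, return_complex=True)
    return spec.abs().clamp(min=1e-7)  # avoid log(0)

def stft_loss(pred, target, fft_size, hop_size, win_length):
    p = stft_magnitude(pred, fft_size, hop_size, win_length)
    t = stft_magnitude(target, fft_size, hop_size, win_length)
    l_sc = torch.norm(t - p, p="fro") / torch.norm(t, p="fro")  # spectral convergence
    l_mag = F.l1_loss(torch.log(p), torch.log(t))               # log-magnitude L1
    return l_sc + l_mag

def multi_resolution_stft_loss(pred, target,
                               resolutions=((1024, 120, 600),
                                            (2048, 240, 1200),
                                            (512, 50, 240))):
    return sum(stft_loss(pred, target, *r) for r in resolutions)
```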

## Metrics

- Perceptual Evaluation of Speech Quality (PESQ)
- Short-Time Objective Intelligibility (STOI)

The PESQ metric is used to estimate the overall speech quality after denoising, while STOI is used to estimate speech intelligibility after denoising; the STOI measure is highly correlated with the intelligibility of degraded speech signals.
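
Both metrics are available as Python packages (`pesq` and `pystoi`). A sketch of computing them on a single reference/denoised pair, assuming 1-D 16 kHz float arrays (the signals below are synthetic placeholders):

```python
# Compute wideband PESQ and STOI for one utterance.
# `clean` and `denoised` are synthetic placeholders; in practice they are
# the clean reference and the model output at 16 kHz.
import numpy as np
from pesq import pesq    # pip install pesq
from pystoi import stoi  # pip install pystoi

sr = 16000
clean = np.random.randn(2 * sr).astype(np.float32)
denoised = clean + 0.01 * np.random.randn(2 * sr).astype(np.float32)

pesq_score = pesq(sr, clean, denoised, "wb")            # wideband PESQ, roughly 1.0-4.5
stoi_score = stoi(clean, denoised, sr, extended=False)  # STOI in [0, 1]
print(f"PESQ: {pesq_score:.3f}  STOI: {stoi_score:.3f}")
```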

## Experiments

Experiments are tracked with a locally hosted Weights & Biases server. Configs for the different experiments are managed with Hydra, which makes it easy to track configurations and override parameters.
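
A minimal sketch of how a training entry point can combine the two (the config path, config name, and project name are hypothetical):

```python
# Sketch of a Hydra-managed entry point that logs to a (possibly self-hosted)
# Weights & Biases server. Config path/name and project name are hypothetical.
import hydra
import wandb
from omegaconf import DictConfig, OmegaConf

@hydra.main(version_base=None, config_path="conf", config_name="config")
def main(cfg: DictConfig) -> None:
    wandb.init(project="denoising", config=OmegaConf.to_container(cfg, resolve=True))
    # ... build the model, dataloaders, and optimizer from cfg and train ...
    wandb.finish()

if __name__ == "__main__":
    main()
```

Any config value can then be overridden from the command line, e.g. `python train.py optimizer.lr=3e-4` (parameter names depend on the actual config).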

| Experiment | Description | Result |
|---|---|---|
| Baseline | Initial experiment with L1 loss | Poor quality |
| Baseline_L1_Multi_STFT_loss | Changed the loss to multi-resolution STFT + L1 loss | Better performance |
| L1_Multi_STFT_no_resample | Trained without resampling | No improvement, probably because of the ReLU on the last layer |
| Updated_DEMUCS | Removed the ReLU from the last layer | Significant improvement |
| wav_normalization | Normalized waveforms by their standard deviation during training | Small improvement |
| original_sr | Trained with the original sample rate | Significant improvement |
| increased_L | Increased the number of encoder-decoder pairs from 3 to 5 | Performance comparable with original_sr |
| double_sr | Trained with double the sample rate | Small improvement |
| replicate paper | Lowered the learning rate and fixed a bug in the dataloader | Massive improvement |

![img.png](img.png)

## Final model

```yaml
H: 64
L: 5
encoder:
  conv1:
    kernel_size: 8
    stride: 2
  conv2:
    kernel_size: 1
    stride: 1

decoder:
  conv1:
    kernel_size: 1
    stride: 1
  conv2:
    kernel_size: 8
    stride: 2
```
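
With H = 64 hidden channels and L = 5 layers, each encoder layer pairs a strided convolution with a 1x1 convolution, following the DEMUCS layer pattern (strided conv, ReLU, 1x1 conv, GLU). A simplified sketch of one encoder/decoder pair under the config above (skip connections and the full stack are omitted, so this is not the repo's exact module code):

```python
# Simplified sketch of one DEMUCS-style encoder/decoder pair matching the
# config above (kernel_size=8, stride=2 strided convs; 1x1 convs with GLU).
# Skip connections and the full L=5 stack are omitted for brevity.
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    def __init__(self, ch_in, ch_out):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(ch_in, ch_out, kernel_size=8, stride=2),  # conv1: downsample in time
            nn.ReLU(),
            nn.Conv1d(ch_out, 2 * ch_out, kernel_size=1),       # conv2: 1x1
            nn.GLU(dim=1),                                      # halves channels back to ch_out
        )

    def forward(self, x):
        return self.net(x)

class DecoderLayer(nn.Module):
    def __init__(self, ch_in, ch_out):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(ch_in, 2 * ch_in, kernel_size=1),                  # conv1: 1x1
            nn.GLU(dim=1),
            nn.ConvTranspose1d(ch_in, ch_out, kernel_size=8, stride=2),  # conv2: upsample in time
        )

    def forward(self, x):
        return self.net(x)

x = torch.randn(1, 1, 16000)  # (batch, channels, samples)
enc = EncoderLayer(1, 64)     # H = 64 hidden channels
dec = DecoderLayer(64, 1)
print(dec(enc(x)).shape)      # torch.Size([1, 1, 16000])
```

Note that there is no activation after the final decoder convolution, consistent with the Updated_DEMUCS experiment above.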

## Testing

| Model | valentini_PESQ | valentini_STOI |
|---|---|---|
| Spectral Gating | 1.7433 | 0.8844 |
| DEMUCS (this repo) | 2.4838 | 0.9192 |
| DEMUCS (Facebook) | 3.07 | 0.95 |

## About

ODS project
