How to evaluate SIR and SDR for mono wav file #13

Closed
Shin-ichi-Takayama opened this issue Jun 28, 2022 · 2 comments

Comments

@Shin-ichi-Takayama

Hello.

I have a question about how to evaluate SIR and SDR for mono wav files.

I have the following mono wav files.

  • Mixed voice and noise audio
  • Voice audio (ref.wav)
  • Noise audio
  • Inference file (est.wav)

Each wav file is 4 seconds long and sampled at 16 kHz.
When I calculated the SIR of the mono wav file, it was Inf.
As I mentioned in Issue #12, the following code gives an SIR of Inf.

from scipy.io import wavfile
import numpy as np
import fast_bss_eval

_, ref = wavfile.read("./data/ref.wav")
_, est = wavfile.read("./data/est.wav")

ref = ref[None, ...]
est = est[None, ...]

# compute the metrics
sdr, sir, sar = fast_bss_eval.bss_eval_sources(ref, est, compute_permutation=False)

print('sdr:', sdr)
print('sir:', sir)
print('sar:', sar)

sdr: 14.188884277900977
sir: inf
sar: 14.18888427790095
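
The infinite SIR follows from how the metric is defined. BSS-eval style metrics split the estimate into a target part, an interference part (the part explained by the other references), and an artifact part. With a single reference there are no other references, so the interference part is empty. The following is a minimal sketch of that decomposition, assuming plain single-tap projections instead of the distortion filters that `bss_eval_sources` allows, so the numbers will not match `fast_bss_eval`. The names `project` and `single_tap_metrics` and the synthetic signals are made up for illustration.

```python
import numpy as np


def project(basis, x):
    """Least-squares projection of x onto the rows of basis."""
    gram = basis @ basis.T
    coef = np.linalg.solve(gram, basis @ x)
    return coef @ basis


def single_tap_metrics(refs, est):
    """Simplified SDR/SIR/SAR (in dB) of est with respect to refs[0].

    refs: (n_src, n_samples), est: (n_samples,).
    Assumption: single-tap projections only, unlike the real bss_eval.
    """
    s_target = project(refs[:1], est)  # part explained by the target alone
    p_all = project(refs, est)         # part explained by all references
    e_interf = p_all - s_target        # part explained by the other references
    e_artif = est - p_all              # everything else

    def ratio_db(num, den):
        # a zero denominator gives +inf, which is the case of interest here
        with np.errstate(divide="ignore"):
            return 10 * np.log10(np.sum(num**2) / np.sum(den**2))

    sdr = ratio_db(s_target, e_interf + e_artif)
    sir = ratio_db(s_target, e_interf)
    sar = ratio_db(s_target + e_interf, e_artif)
    return sdr, sir, sar


# synthetic stand-ins for the wav files (1 second at 16 kHz)
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
noise = rng.standard_normal(16000)
est = speech + 0.1 * noise + 0.05 * rng.standard_normal(16000)

# only the speech is given as a reference, as in the code above
sdr, sir, sar = single_tap_metrics(speech[None, :], est)
print("sir with one reference:", sir)  # prints inf
```

With a single reference, `e_interf` is exactly zero, so the SIR is infinite and the SDR equals the SAR. That is the same pattern as in the output above.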

However, I would like to evaluate the SIR with a mono wav file.
To avoid an infinite SIR, I split the wav file into 4 parts. Does the following code evaluate SIR and SDR correctly?

from scipy.io import wavfile
import numpy as np
import fast_bss_eval

# treat the four 1-second parts as four channels
ref = np.zeros((4, 16000))
est = np.zeros((4, 16000))

for i in range(4):
    _, ref_temp = wavfile.read(f"./data/ref{i + 1}.wav")
    _, est_temp = wavfile.read(f"./data/est{i + 1}.wav")
    ref[i] = ref_temp
    est[i] = est_temp

# compute the metrics
sdr, sir, sar = fast_bss_eval.bss_eval_sources(ref, est, compute_permutation=False)

print('sdr:', sdr.mean())
print('sir:', sir.mean())
print('sar:', sar.mean())

sdr: 16.156123610321156
sir: 28.957842593289392
sar: 16.444840346137177

What signals are needed for each channel of ref and est?
Best regards.

@fakufaku
Owner

This is indeed a good question! I don't think splitting the file is the correct way to do it.

In your case, you have access to both the clean speech and the noise, so the best approach is to use both as references.

from scipy.io import wavfile
import numpy as np
import fast_bss_eval

# assume all files are mono
_, speech_ref = wavfile.read("./data/ref.wav")
_, noise_ref = wavfile.read("./data/noise.wav")
_, est = wavfile.read("./data/est.wav")

ref = np.stack([speech_ref, noise_ref], axis=0)
# I think `est[None, ...]` should also work, but to be safe, give est
# the same number of channels as ref
est = np.stack([est, est], axis=0)

# compute the metrics
sdr, sir, sar = fast_bss_eval.bss_eval_sources(ref, est, compute_permutation=False)

# index 0 is the speech source (the first row of ref)
print('sdr:', sdr[0])
print('sir:', sir[0])
print('sar:', sar[0])
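
Before stacking, it can help to check that the three signals have the same length and a floating-point dtype, since `wavfile.read` returns integer samples for integer PCM files. The following is a minimal sketch. `prepare_bss_inputs` is a hypothetical helper, not part of `fast_bss_eval`, and converting to float and trimming to the shortest signal are precautions assumed here rather than requirements stated in this thread. Synthetic arrays stand in for the wav files.

```python
import numpy as np


def prepare_bss_inputs(speech_ref, noise_ref, est):
    """Hypothetical helper: build (2, n_samples) float arrays for ref and est."""
    # convert to float so later arithmetic is not done on integer samples
    signals = [np.asarray(x, dtype=np.float64) for x in (speech_ref, noise_ref, est)]
    for x in signals:
        if x.ndim != 1:
            raise ValueError("expected mono (1-D) signals")

    # trim everything to the shortest signal so the arrays can be stacked
    n = min(len(x) for x in signals)
    speech_ref, noise_ref, est = (x[:n] for x in signals)

    # row 0 is the speech, row 1 is the noise
    ref = np.stack([speech_ref, noise_ref], axis=0)
    # repeat the single estimate so est has as many channels as ref
    est = np.stack([est, est], axis=0)
    return ref, est


# synthetic int16 stand-ins for 4-second mono files at 16 kHz
rng = np.random.default_rng(0)
n_samples = 4 * 16000
speech = rng.integers(-2000, 2000, size=n_samples).astype(np.int16)
noise = rng.integers(-200, 200, size=n_samples).astype(np.int16)
est_raw = (speech + noise)[:-10]  # deliberately a few samples shorter

ref, est = prepare_bss_inputs(speech, noise, est_raw)
print(ref.shape, est.shape, ref.dtype)
```

The resulting `ref` and `est` can be passed to `fast_bss_eval.bss_eval_sources(ref, est, compute_permutation=False)` as in the code above.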

@Shin-ichi-Takayama
Author

Thank you for your response.
I was able to evaluate the SIR and SDR with a mono wav file.
