Issue With Score Computation #1

Open
davidma17 opened this issue Jun 11, 2019 · 1 comment
Comments

@davidma17
Hello,

In testing this implementation on some real-life recordings I took, I got very negative scores for the file named "female2.wav". I'm wondering how this happened and, more specifically, how the scoring algorithm works: the documentation for the score function appears to say that it returns a log probability, yet some of the values are positive. Any indication as to how this one .wav file could have produced negative scores while other, similar files produced positive ones would be greatly appreciated.

[Screenshot attached: Screen Shot 2019-06-11 at 12 19 38 PM]

@SuperKogito
Owner

Hello,

Unlike probabilities, log-likelihood (log-probability) scores can be negative. To read more about this, you can refer to this link: log-probability. That said, I can already see that some of your scores do not conform to the theory in the link.
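A possible explanation for the positive values, offered as a sketch rather than a diagnosis of this project: when a model is fitted to continuous features, the score is the log of a probability density, not of a probability. A density can exceed 1, so its log can be positive. The function name `gaussian_log_density` and the numbers below are illustrative assumptions, not taken from the repository:

```python
import math

def gaussian_log_density(x, mean, std):
    """Log of the normal probability density at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

# Narrow Gaussian: the density at the mean exceeds 1, so its log is positive.
narrow = gaussian_log_density(0.0, mean=0.0, std=0.1)
# Wide Gaussian: the density is below 1 everywhere, so its log is negative.
wide = gaussian_log_density(0.0, mean=0.0, std=10.0)
print(narrow > 0, wide < 0)
```

Under that reading, both positive and negative scores are legitimate, and the sign alone does not indicate an error.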

The scoring logic is based on the Reynolds paper, but you can also refer to this shorter reproduction/summary paper. The papers are about speaker verification/recognition, but you can drop the UBM and apply the same logic to gender recognition. I have also written a small blog post on this that you can find here.
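The decision rule can be sketched roughly as follows, assuming one model per gender and per-frame log-likelihoods summed over the recording, with the recording assigned to the model that gives the higher total. The single 1-D Gaussians, their parameters, and the names `total_log_likelihood` and `classify` are hypothetical stand-ins for trained mixture models, not the project's code:

```python
import math

def total_log_likelihood(frames, mean, std):
    """Sum of per-frame Gaussian log-densities for a 1-D feature sequence."""
    return sum(
        -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)
        for x in frames
    )

def classify(frames, models):
    """Return the label with the highest total log-likelihood, plus all scores."""
    scores = {
        label: total_log_likelihood(frames, mean, std)
        for label, (mean, std) in models.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical (mean, std) per class; a real system would use trained models.
models = {"female": (2.0, 1.0), "male": (-2.0, 1.0)}
label, scores = classify([1.8, 2.3, 1.5], models)
print(label, scores)
```

Only the relative order of the two totals drives the decision in this sketch, so a recording can be classified correctly even when both scores are strongly negative.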

Concerning your recordings, they should all have the same characteristics (sample rate, number of channels, etc.), which is why, for example, recordings made with different microphones can be challenging in recognition problems like this one. The database used in the project is normalized: all files have the same sample rate and are all mono. You can verify this using `ffmpeg -i filename.wav`, which should report something like `..., 16000 Hz, mono, s16, 256 kb/s`. If your recordings do not have the same characteristics as the ones in SLR45, use ffmpeg to convert them.
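As an alternative to reading the ffmpeg output by eye, the same check can be scripted. This is a minimal sketch using Python's standard `wave` module; the helper names `wav_properties` and `matches_target` are made up for illustration, and the default targets (16000 Hz, mono, 16-bit) are assumed from the example output above:

```python
import wave

def wav_properties(path):
    """Return (sample_rate_hz, channels, sample_width_bytes) of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth()

def matches_target(path, rate=16000, channels=1, width=2):
    """True if the file has the assumed target format (16 kHz, mono, 16-bit)."""
    return wav_properties(path) == (rate, channels, width)
```

A file that fails the check could then be converted with something like `ffmpeg -i input.wav -ar 16000 -ac 1 output.wav`, where `-ar` sets the sample rate and `-ac` the channel count; the file names here are placeholders.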

Please let me know how this turns out ;)
