Pyannote-audio: Speaker Diarization

Input

Audio file (.wav format).

Example
input: data/demo.wav

(Wav file from https://github.com/pyannote/pyannote-audio/tree/develop/pyannote/audio/sample)

Output

Who spoke and when, one line per speech segment:

[ 00:00:06.714 -->  00:00:07.003] A speaker91
[ 00:00:07.003 -->  00:00:07.173] B speaker90
[ 00:00:07.580 -->  00:00:08.310] C speaker91
[ 00:00:08.310 -->  00:00:09.923] D speaker90
[ 00:00:09.923 -->  00:00:10.976] E speaker91
[ 00:00:10.466 -->  00:00:14.745] F speaker90
[ 00:00:14.303 -->  00:00:17.886] G speaker91
[ 00:00:18.022 -->  00:00:21.502] H speaker90
[ 00:00:18.157 -->  00:00:18.446] I speaker91
[ 00:00:21.774 -->  00:00:28.531] J speaker91
[ 00:00:27.886 -->  00:00:29.991] K speaker90
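
The output lines above follow a regular pattern, so they can be post-processed with a short script. The following is a minimal sketch, not part of this repository: it assumes each line has the form `[ start --> end] LABEL SPEAKER` exactly as printed above, and the helper names `parse_line` and `speaking_time` are hypothetical. Note that segments may overlap in time, so per-speaker totals can add up to more than the audio length.

```python
import re
from collections import defaultdict

# Matches one output line, e.g.
# "[ 00:00:06.714 -->  00:00:07.003] A speaker91"
LINE_RE = re.compile(
    r"\[\s*(\d+):(\d+):(\d+\.\d+)\s*-->\s*(\d+):(\d+):(\d+\.\d+)\]\s+(\S+)\s+(\S+)"
)


def to_seconds(hours, minutes, seconds):
    """Convert an HH, MM, SS.mmm triple (strings) to seconds."""
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_line(line):
    """Return (start_sec, end_sec, label, speaker), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    h1, m1, s1, h2, m2, s2, label, speaker = match.groups()
    return to_seconds(h1, m1, s1), to_seconds(h2, m2, s2), label, speaker


def speaking_time(lines):
    """Sum segment durations per speaker (hypothetical helper)."""
    totals = defaultdict(float)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            start, end, _label, speaker = parsed
            totals[speaker] += end - start
    return dict(totals)


# First two lines of the sample output shown above
sample = [
    "[ 00:00:06.714 -->  00:00:07.003] A speaker91",
    "[ 00:00:07.003 -->  00:00:07.173] B speaker90",
]
print(speaking_time(sample))
```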

Requirements

This model requires additional modules.

$ pip3 install -r requirements.txt

Usage

The onnx and prototxt files are downloaded automatically on the first run, so an Internet connection is required at that time.

To run on the sample audio:

$ python pyannote-audio.py -i ./data/sample.wav

To run on the sample audio and plot the result:

$ python pyannote-audio.py -i ./data/sample.wav --plt

To run on the sample audio and verify the result against a ground truth file:

$ python pyannote-audio.py -i ./data/sample.wav -g ./data/sample.rttm
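
The ground truth file in the command above uses the `.rttm` extension. As a minimal sketch, assuming the file follows the common RTTM layout (space-separated `SPEAKER` records with the onset in field 4, the duration in field 5, and the speaker name in field 8), it could be read as shown below. The contents of `./data/sample.rttm` are not reproduced in this README, so the example records are made up for illustration, and `parse_rttm` is a hypothetical helper rather than a function of this script.

```python
def parse_rttm(text):
    """Parse RTTM text into a list of (start_sec, end_sec, speaker) tuples."""
    turns = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and any record that is not a SPEAKER turn
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        onset = float(fields[3])      # turn onset in seconds
        duration = float(fields[4])   # turn duration in seconds
        speaker = fields[7]           # speaker name
        turns.append((onset, onset + duration, speaker))
    return turns


# Hypothetical RTTM records, for illustration only
example = """\
SPEAKER sample 1 6.714 0.289 <NA> <NA> speaker91 <NA> <NA>
SPEAKER sample 1 7.003 0.170 <NA> <NA> speaker90 <NA> <NA>
"""

for start, end, speaker in parse_rttm(example):
    print(f"{start:.3f} {end:.3f} {speaker}")
```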

To specify the input audio, put the file path after the --i or -input option.

$ python pyannote-audio.py --i FILE_PATH

To specify the ground truth file, put the file path after the --ig or -input_ground option.

$ python pyannote-audio.py --ig FILE_PATH

To specify the output file, put the file path after the --o or -output option.

$ python pyannote-audio.py --o FILE_PATH

To specify the output ground truth file, put the file path after the --og or -output_ground option.

$ python pyannote-audio.py --og FILE_PATH

If you know the number of speakers, put the number after the --num or -num_speaker option.

$ python pyannote-audio.py --num 2

If you know the maximum number of speakers, put the number after the --max or -max_speaker option.

$ python pyannote-audio.py --max 4

If you know the minimum number of speakers, put the number after the --min or -min_speaker option.

$ python pyannote-audio.py --min 2

By giving the --e or -error option, you can get the diarization error rate.

$ python pyannote-audio.py --e
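
For readers unfamiliar with the metric, the diarization error rate is commonly defined as the sum of false alarm, missed detection, and speaker confusion durations divided by the total duration of reference speech. The sketch below only illustrates that common definition. It is an assumption that the script follows it, and details such as collars or overlap handling are not covered. The function name and the numbers in the example are hypothetical.

```python
def diarization_error_rate(false_alarm, missed_detection, confusion, total):
    """Return (false alarm + missed detection + confusion) / total reference speech.

    All arguments are durations in seconds.
    """
    if total <= 0:
        raise ValueError("total reference speech duration must be positive")
    return (false_alarm + missed_detection + confusion) / total


# Made-up durations: 0.5 s false alarm, 1.0 s missed, 1.5 s confusion
# over 20 s of reference speech
print(diarization_error_rate(0.5, 1.0, 1.5, 20.0))
```

A lower value is better, and the rate can exceed 1.0 when the system output contains a lot of false alarm speech.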

By giving the --plt option, you can visualize the results.

$ python pyannote-audio.py --plt

By giving the --use_onnx option, you can run the model with ONNX.

$ python pyannote-audio.py --use_onnx

By giving the --embed option, you can get the embedding vector for the input file.

$ python pyannote-audio.py --embed

Reference

Framework

PyTorch

Model Format

ONNX opset=14,17

Netron