Glossary
In this context, synonymous with STT.
Combination of speaker segmentation and speaker clustering. The first aims to find speaker change points in an audio stream. The second aims to group speech segments together on the basis of speaker characteristics.
The process or tool for comparing two text files and presenting the deletions, insertions and replacements. Diff is a processing step in determining WER. Diff tools may use different algorithms and so produce different results.
In the context of STT, a high-accuracy transcript against which the results of the STT provider are compared. Usually prepared manually.
The automatically-generated transcript returned by the provider/vendor. Compared against the reference to apply metrics in the analysis.
A measurement applied to the final transcript returned by the provider or to the process of transcription. Represents a dimension of difference between providers.
A system, service or tool that provides speech-to-text capability.
Synonym for Ground Truth. Compared against the hypothesis to apply metrics in the analysis.
The final outcome of the toolkit: the results of the analysis of the automatically generated transcripts. Presented as a matrix of metrics per provider.
Recognising a real-world speaker from their voice.
Speech-to-text. This is the loose term for automatic transcription systems. Other terms may describe specific technical functions.
In the context of STT, audio-visual files with a corresponding reference transcript against which the results of STT providers are evaluated.
A commercial speech-to-text provider.
Word Error Rate. A commonly used (but coarse) metric to evaluate the accuracy of machine transcription.
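
To make the relationship between diff, reference, hypothesis and WER concrete, here is a minimal sketch of the standard definition, WER = (substitutions + deletions + insertions) / number of words in the reference. It is an illustration only, not the toolkit's own implementation: the function name `word_error_rate` is hypothetical, words are split on whitespace, and no normalisation of case or punctuation is applied. A real diff tool may align words differently and report a different figure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Hypothetical helper for illustration; assumes whitespace tokenisation
    and no text normalisation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"
    print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

In this example the hypothesis has one replacement ("sat" becomes "sit") and one deletion (the second "the"), so two errors are counted against six reference words. Because insertions are counted but the denominator is the reference length, WER can exceed 1.0 when the hypothesis is much longer than the reference.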