Glossary

ASR

Automatic speech recognition. In this context, synonymous with STT.

Diarisation / speaker diarisation

The combination of speaker segmentation and speaker clustering. Speaker segmentation finds the points in an audio stream where the speaker changes. Speaker clustering groups the resulting speech segments by speaker characteristics.
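To make the clustering half of this definition concrete, here is a toy sketch. It assumes segmentation has already been done and that each segment carries a speaker embedding vector produced by some model; the `Segment` type, the `cluster_speakers` function and the 0.8 threshold are all hypothetical. It uses greedy leader clustering on cosine similarity, chosen for brevity rather than as the method production diarisation systems use.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    start: float            # segment start, in seconds
    end: float              # segment end, in seconds
    embedding: list[float]  # hypothetical precomputed speaker embedding


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero vectors")
    return dot / (norm_a * norm_b)


def cluster_speakers(segments: list[Segment], threshold: float = 0.8) -> list[int]:
    """Assign a speaker label to each segment, in order.

    Each cluster keeps the running sum of its members' embeddings; cosine
    similarity ignores vector length, so comparing against the sum is the
    same as comparing against the mean.
    """
    sums: list[list[float]] = []
    labels: list[int] = []
    for seg in segments:
        best, best_sim = None, -1.0
        for k, total in enumerate(sums):
            sim = cosine(seg.embedding, total)
            if sim > best_sim:
                best, best_sim = k, sim
        if best is None or best_sim < threshold:
            # Not similar enough to any known speaker: start a new cluster.
            sums.append(list(seg.embedding))
            labels.append(len(sums) - 1)
        else:
            # Join the most similar cluster and update its running sum.
            sums[best] = [s + e for s, e in zip(sums[best], seg.embedding)]
            labels.append(best)
    return labels


if __name__ == "__main__":
    demo = [
        Segment(0.0, 2.0, [1.0, 0.0]),
        Segment(2.0, 4.5, [0.0, 1.0]),
        Segment(4.5, 6.0, [0.9, 0.1]),
        Segment(6.0, 8.0, [0.1, 0.9]),
    ]
    print(cluster_speakers(demo))
```

The threshold trades off splitting one speaker into several labels (too high) against merging different speakers (too low).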

Diff / Diff tool

The process, or a tool, for comparing two text files and presenting the deletions, insertions and replacements between them. Diffing the hypothesis against the reference is a step in calculating WER. Diff tools may use different algorithms and can therefore produce different results for the same pair of files.
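As an illustration of a word-level diff, the sketch below uses Python's standard-library `difflib.SequenceMatcher` to list the replacements, deletions and insertions between a reference and a hypothesis. It assumes transcripts are compared word by word after splitting on whitespace, with no case or punctuation normalisation; the function name `word_diff` is hypothetical. `SequenceMatcher` does not guarantee a minimal edit sequence, which is one concrete way in which diff tools can disagree.

```python
import difflib


def word_diff(reference: str, hypothesis: str) -> list[tuple[str, list[str], list[str]]]:
    """Return (tag, reference_words, hypothesis_words) for every non-matching span.

    Tags follow difflib's opcodes: 'replace', 'delete' or 'insert'.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    # autojunk=False stops difflib from ignoring very frequent words in long texts.
    matcher = difflib.SequenceMatcher(a=ref_words, b=hyp_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, ref_words[i1:i2], hyp_words[j1:j2]))
    return changes


if __name__ == "__main__":
    for change in word_diff("the cat sat on the mat", "the cat sat on a mat today"):
        print(change)
```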

Ground truth

In the context of STT, a high-accuracy transcript against which the output of an STT provider is compared. Usually prepared manually.

Hypothesis

The automatically generated transcript returned by the provider or vendor. It is compared against the reference to calculate the metrics used in the analysis.

Metric

A measurement applied either to the final transcript returned by the provider or to the transcription process itself. Each metric captures one dimension along which providers can differ.

Provider / STT provider

A system, service or tool that provides speech-to-text capability.

Reference

Synonym for ground truth. It is compared against the hypothesis to calculate the metrics used in the analysis.

Results

The final output of the toolkit: the analysis of the automatically generated transcripts, presented as a matrix of metrics per provider.

Speaker recognition

Identifying a real-world speaker from their voice.

STT

Speech-to-text. A loose, general term for automatic transcription systems. Other terms, such as diarisation or speaker recognition, describe more specific technical functions.

Test set / test data

In the context of STT, audio-visual files, each with a corresponding reference transcript, used to evaluate the output of STT providers.

Vendor / STT vendor

A commercial speech-to-text provider.

WER

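Word Error Rate. A commonly used (but coarse) metric for evaluating the accuracy of machine transcription: the number of word substitutions, deletions and insertions needed to turn the hypothesis into the reference, divided by the number of words in the reference.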
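A minimal sketch of the WER calculation, assuming whitespace tokenisation and no text normalisation (real toolkits usually lowercase and strip punctuation first, and those choices change the score). It finds the minimum number of substitutions (S), deletions (D) and insertions (I) with a word-level Levenshtein distance and divides by the number of reference words N, i.e. WER = (S + D + I) / N. The function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the minimum edits turning ref[:i-1] into hyp[:j];
    # the first row is j insertions from an empty reference prefix.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions to reach an empty hypothesis
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("the" -> "a") and one insertion ("today") over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat today"))
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference. Every error also counts equally, whether it is a dropped filler word or a misrecognised name, which is one reason the metric is described as coarse.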
