Metric: Named entity recognition

EyalLavi edited this page Sep 4, 2019 · 1 revision
Status: DRAFT

This is a working document to discuss the implementation of this metric. The authoritative documentation will be included in the main repo code.

Use case and Scope

The aim of this metric is to complement the standard WER. WER is a blunt metric, in that it does not take into account the relative importance of the words. People and place names are important in many scenarios, and this metric will provide an indication of how well an engine recognises them.

This metric measures the ability of an ASR engine to correctly transcribe already-known named entities. It does not evaluate the identification or semantic tagging of named entities; that is a separate task and out of scope. We start from the assumption that a list of named entities already exists: it can be provided manually by the user, by an NER ML engine, or by any external source such as a database.

Inputs

To simplify the implementation, the input will be a CSV of named entities. These can contain multiple tokens.

Following the same logic as for normalization files:

  • Multiple files will be supported.
  • The CSV format will be identical to the normalization rules CSV.
  • The path to the NE CSV file(s) will be included in the configuration file, under a new section [named entities].
  • Where multiple files are provided, they will be merged with no further processing such as de-duplication; e.g. the following will all be treated as separate entities: Acme Corp., Acme Corporation, Acme-Corp.
  • A new metric(s) command-line parameter will be added (details TBC).
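For illustration, the configuration could look something like the sketch below. The [named entities] section name comes from this page; the key names and file paths are assumptions, and the exact CSV layout would follow whatever the normalization rules CSV uses.

```ini
; hypothetical configuration fragment — the section name is from this page,
; the key names and paths are illustrative assumptions
[named entities]
people = people.csv
places = places.csv
```

Each CSV row would hold one named entity, which may consist of multiple tokens (e.g. a row containing Acme Corporation).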

Outputs

A few options are under consideration:

  • Simple percentage of named-entity instances in the reference that were exactly matched in the hypothesis (both position and value).
  • A weighted WER for the entire hypothesis, where correctly transcribed named entities receive a higher score than other matched words. What weight should we use? BBC R&D used 0 and 1 multipliers, but this skews the WER completely.
  • NE-only WER, where the WER is calculated over just the named entities in the reference.
  • Simple count of instances of named entities that were identified in the transcript. This does not require a reference - is there a use case for this?
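The first option above can be sketched as follows. This is a minimal illustration, not the tool's implementation: the function names are invented, alignment is done with the standard library's difflib rather than the project's own differ, and tokenization is a plain whitespace split.

```python
# Sketch of the "simple percentage" option: the share of named-entity
# occurrences in the reference whose tokens also survive, in place, in the
# hypothesis. All names here are illustrative.
from difflib import SequenceMatcher


def find_entity_spans(tokens, entities):
    """Return (start, end) spans in `tokens` covered by a known entity,
    preferring the longest entity at each position."""
    spans = []
    i = 0
    while i < len(tokens):
        matched = None
        for ent in entities:
            ent_toks = ent.split()
            if tokens[i:i + len(ent_toks)] == ent_toks:
                if matched is None or len(ent_toks) > matched:
                    matched = len(ent_toks)
        if matched:
            spans.append((i, i + matched))
            i += matched
        else:
            i += 1
    return spans


def entity_match_rate(reference, hypothesis, entities):
    """Fraction of reference NE occurrences fully matched in the hypothesis."""
    ref = reference.split()
    hyp = hypothesis.split()
    spans = find_entity_spans(ref, entities)
    if not spans:
        return None
    # Positions of reference tokens matched verbatim in the hypothesis
    matched_ref = set()
    for block in SequenceMatcher(None, ref, hyp).get_matching_blocks():
        matched_ref.update(range(block.a, block.a + block.size))
    hits = sum(1 for s, e in spans if all(p in matched_ref for p in range(s, e)))
    return hits / len(spans)


rate = entity_match_rate(
    "the acme corporation hired jacinda ardern yesterday",
    "the acne corporation hired jacinda ardern yesterday",
    ["acme corporation", "jacinda ardern"],
)
print(rate)  # → 0.5 ("acme corporation" missed, "jacinda ardern" matched)
```

A real implementation would additionally need to reconcile this with normalization and with the diff the tool already performs, rather than aligning twice.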

Implementation

  • Read the NEs from the files referenced in the config.
  • If normalization rules exist, apply them to the named entities as well as the reference and transcript.
  • In the reference, tag named entities. Note that we currently only deal with single words; words that belong to a single NE need to be kept together.
  • Perform a diff between the reference and the transcript.
  • Analyse the diff results using the tagged NEs to produce the requested metric(s).
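The tagging step — collapsing multi-token entities so the diff keeps them together — could look roughly like this. It is a sketch under assumptions: the function name and the (token, is_entity) tuple representation are invented here, and the actual tool may represent tagged tokens differently.

```python
# Sketch of the tagging step: each known multi-token entity in the reference
# becomes a single unit, so a later diff cannot split it. Longer entities are
# tried first so "Acme Corporation Ltd" wins over "Acme Corporation".


def tag_entities(tokens, entities):
    """Return a list of (text, is_entity) units built from `tokens`."""
    entity_token_lists = sorted(
        (ent.split() for ent in entities), key=len, reverse=True
    )
    out = []
    i = 0
    while i < len(tokens):
        for ent_toks in entity_token_lists:
            if tokens[i:i + len(ent_toks)] == ent_toks:
                out.append((" ".join(ent_toks), True))  # tagged as one NE unit
                i += len(ent_toks)
                break
        else:
            out.append((tokens[i], False))
            i += 1
    return out


print(tag_entities("the acme corporation opened".split(), ["acme corporation"]))
# → [('the', False), ('acme corporation', True), ('opened', False)]
```

The diff would then operate on these units, and the final analysis step can count matches, substitutions, and deletions for the units flagged True to produce whichever metric was requested.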