Phrase-boosting feature for Parakeet v2 #14500

@hoavt-54

Thanks @andrusenkoau for the recent phrase-boosting features in ASR. I've been trying them for a while, but I have only been able to boost phrases with AED decoding (Canary-1b). Boosting does not seem to work for Parakeet v2, or I am missing something: WER gets worse as boosting_tree_alpha increases.

import os

# cb_list_file is defined earlier in the notebook (path to the key-phrases file).
os.environ["MODEL_NAME"] = "/home/jovyan/.cache/huggingface/hub/models--nvidia--parakeet-tdt-0.6b-v2/snapshots/f1cd6697a2ec38060af8a36ce206a4ad9c4467bc/parakeet-tdt-0.6b-v2.nemo"
os.environ["BATCH_SIZE"] = "1"
os.environ["BEAM_SIZE"] = "5"
os.environ["KEY_WORDS_LIST"] = cb_list_file
os.environ["HYDRA_FULL_ERROR"] = "1"
# Other strategies tried: greedy, greedy_batch.
# Disabled override: rnnt_decoding.beam.beam_size=${BEAM_SIZE}
# Comments are kept out of the command itself: text after a trailing "\" or a
# commented-out line in the middle ends the shell line continuation early.
!python examples/asr/speech_to_text_eval.py \
        model_path=${MODEL_NAME} \
        dataset_manifest=./context_biasing_data/gtc_data_subset_10f.json \
        batch_size=${BATCH_SIZE} \
        output_filename=boosted_output.txt \
        text_processing.do_lowercase=true \
        text_processing.rm_punctuation=true \
        rnnt_decoding.strategy="malsd_batch" \
        rnnt_decoding.beam.boosting_tree.key_phrases_file=${KEY_WORDS_LIST} \
        rnnt_decoding.beam.boosting_tree.context_score=1.0 \
        rnnt_decoding.beam.boosting_tree.depth_scaling=2.0 \
        rnnt_decoding.beam.boosting_tree_alpha=0.6
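
For anyone trying to reproduce the sweep over boosting_tree_alpha, here is a minimal sketch that builds one command per alpha value as an argument list, which sidesteps shell quoting and line-continuation issues. It only reuses the overrides shown in the command above; the helper name build_command, the placeholder model path, and the per-run output file name are my own assumptions, not part of NeMo.

```python
import shlex

# Overrides copied from the command above; alpha is added per run.
BASE_OVERRIDES = [
    "dataset_manifest=./context_biasing_data/gtc_data_subset_10f.json",
    "batch_size=1",
    "text_processing.do_lowercase=true",
    "text_processing.rm_punctuation=true",
    "rnnt_decoding.strategy=malsd_batch",
    "rnnt_decoding.beam.boosting_tree.context_score=1.0",
    "rnnt_decoding.beam.boosting_tree.depth_scaling=2.0",
]


def build_command(model_path, key_phrases_file, alpha):
    """Return the argv list for one evaluation run at the given alpha."""
    return [
        "python",
        "examples/asr/speech_to_text_eval.py",
        f"model_path={model_path}",
        f"rnnt_decoding.beam.boosting_tree.key_phrases_file={key_phrases_file}",
        f"rnnt_decoding.beam.boosting_tree_alpha={alpha}",
        # Hypothetical per-run file name so that runs do not overwrite each other.
        f"output_filename=boosted_output_alpha_{alpha}.txt",
        *BASE_OVERRIDES,
    ]


if __name__ == "__main__":
    for alpha in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
        # "path/to/..." is a placeholder for the local .nemo checkpoint.
        cmd = build_command("path/to/parakeet-tdt-0.6b-v2.nemo", "boost_keywords.txt", alpha)
        print(shlex.join(cmd))
        # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Printing the commands first makes it easy to confirm that every boosting override really reaches the script before spending GPU time on the runs.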

Phrase-boosting test data: https://asr-tutorial-data.s3.eu-north-1.amazonaws.com/context_biasing_data.gz, taken from ASR_Context_Biasing.ipynb. By the way, it would be great to have a tutorial for phrase boosting similar to the current ASR_Context_Biasing.ipynb.

boost_keywords.txt:

gpu
nvidia
nvidia's
nvlink
omniverse
cunumeric
numpy
dgx
dgxs
dlss
cpu
tsmc
culitho
xlabs
tensorrt
tensorflow
pytorch
aws
chatgpt
pcie
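
Since dataset-level WER can hide whether the boosted phrases themselves are recognized, a per-keyword recall check may help narrow this down. The following is a rough sketch, not a NeMo utility: keyword_recall is a hypothetical helper, it assumes lowercase, punctuation-free reference and hypothesis strings (matching the text_processing settings above), and it only handles single-token phrases like the ones in this list.

```python
from collections import Counter


def keyword_recall(pairs, keywords):
    """Count how often each keyword in the references also appears in the hypotheses.

    pairs: iterable of (reference, hypothesis) strings, one pair per utterance.
    Returns {keyword: (hits, total)}, where total is the number of occurrences
    in the references and hits is how many of those were matched in the paired
    hypothesis (capped per utterance).
    """
    stats = {kw: [0, 0] for kw in keywords}
    for ref, hyp in pairs:
        ref_counts = Counter(ref.split())
        hyp_counts = Counter(hyp.split())
        for kw in keywords:
            total = ref_counts[kw]
            stats[kw][0] += min(total, hyp_counts[kw])
            stats[kw][1] += total
    return {kw: tuple(counts) for kw, counts in stats.items()}
```

Comparing these counts between an unboosted run and a boosted run would show whether boosting recovers the listed phrases even when overall WER does not improve.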

WER and CER results:

boosting_tree_alpha     Dataset WER (%)     Dataset CER (%)
0.0                     12.06               4.96
0.1                     13.07               5.05
0.2                     13.07               5.05
0.3                     12.06               4.96
0.4                     12.06               4.96
0.5                     13.07               6.03
0.6                     13.07               6.03
0.7                     14.07               6.12
0.8                     14.07               6.12
1.0                     14.07               6.38
2.0                     22.61               14.98
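
To make the trend easier to read, here is a small sketch that expresses each row as a relative change against the alpha = 0.0 baseline. The numbers are copied from the table above; the function name relative_change is just for illustration.

```python
# (boosting_tree_alpha, WER %, CER %) copied from the table above.
RESULTS = [
    (0.0, 12.06, 4.96),
    (0.1, 13.07, 5.05),
    (0.2, 13.07, 5.05),
    (0.3, 12.06, 4.96),
    (0.4, 12.06, 4.96),
    (0.5, 13.07, 6.03),
    (0.6, 13.07, 6.03),
    (0.7, 14.07, 6.12),
    (0.8, 14.07, 6.12),
    (1.0, 14.07, 6.38),
    (2.0, 22.61, 14.98),
]


def relative_change(results, baseline_alpha=0.0):
    """Return (alpha, relative WER change %, relative CER change %) per row."""
    _, base_wer, base_cer = next(r for r in results if r[0] == baseline_alpha)
    return [
        (alpha, (wer - base_wer) / base_wer * 100.0, (cer - base_cer) / base_cer * 100.0)
        for alpha, wer, cer in results
    ]


if __name__ == "__main__":
    for alpha, d_wer, d_cer in relative_change(RESULTS):
        print(f"alpha={alpha:<4} WER {d_wer:+6.1f}%  CER {d_cer:+6.1f}%")
```

None of the tested alpha values improves on the unboosted baseline: WER and CER are at best unchanged (alpha 0.3 and 0.4) and otherwise higher.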

Environment details

If NVIDIA docker image is used you don't need to specify these.
Otherwise, please provide:

  • OS version: Linux jupyterserver-6f8d46c9c4-6s6nj 4.18.0-553.40.1.el8_10.x86_64
  • PyTorch version: '2.7.1+cu126'
  • Python version: 3.11
  • GPU: L4
  • branch: main
