Canary outputs English for Arabic Speech #11826

BenoitWang · 2025-01-11T18:05:15Z

Describe the bug

Hello, I am trying to run inference with Canary 1b for Arabic ASR using Riva quick start 2.18.0. According to the description, it already supports Arabic, but it outputs English tokens instead of Arabic ones.

Here's a partial extract of my config.sh

service_enabled_asr=true
service_enabled_nlp=false
service_enabled_tts=false
service_enabled_nmt=false

asr_language_code=("ar-AR")
asr_acoustic_model=("canary")
asr_acoustic_model_variant=("1b")
use_existing_rmirs=false

When I use the same config to run inference with Parakeet 1.1b_unified_ml_cs_concat and Parakeet 1.1b_unified_ml_cs_universal, they do output Arabic tokens, so I guess the issue lies in the Canary model.

Any ideas, please?


tbartley94 · 2025-01-13T17:16:05Z

@BenoitWang Can you try ar-SA? We may have just assigned Arabic language under a Saudi dialect tag for that round.

BenoitWang · 2025-01-13T18:24:17Z

Hi @tbartley94, just tried but still got English outputs.

tbartley94 · 2025-01-13T18:27:23Z

hmm, okay will look into it

myungjongk · 2025-01-14T06:32:54Z

@BenoitWang It looks like you're using the Riva client to run the Canary-1b model. If so, you need to pass the language code on the client side. For example, riva_asr_client --language_code=ar-AR --audio_file=ar-AR_sample.wav. Could you try with the --language_code param?
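A minimal sketch of the client-side invocation described above, written in Python so it can be scripted over many audio files. It uses only the two flags quoted in the comment (--language_code and --audio_file). The helper name build_riva_asr_command is hypothetical, and actually running the command assumes riva_asr_client is on PATH and a Riva server is already up.

```python
import shlex


def build_riva_asr_command(audio_file: str, language_code: str) -> list:
    """Build the riva_asr_client argument list with an explicit language code.

    Hypothetical helper: it only assembles the two flags quoted in the thread.
    """
    return [
        "riva_asr_client",
        f"--language_code={language_code}",
        f"--audio_file={audio_file}",
    ]


cmd = build_riva_asr_command("ar-AR_sample.wav", "ar-AR")
print(shlex.join(cmd))

# To run it for real (assumes the binary is installed and the server is running):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when file names contain spaces.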

BenoitWang · 2025-01-16T11:09:42Z

Hi @myungjongk, thank you, that works much better. However, when I look at the transcriptions and compare them with parakeet-ctc-1.1b-concat (the 1st image), Canary still generates English tokens very often (the 2nd image), which degrades both WER and CER considerably. The 20 samples are from CommonVoice 18.0. Am I still missing something?

In fact, we're running this Arabic ASR leaderboard, and we find that Canary performs badly compared to the other models, but we do wish to include it if this gets fixed. Thanks for your help, @tbartley94 @myungjongk.
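To put numbers on the effect described here, the following sketch computes WER and CER with a plain Levenshtein distance and estimates how many hypothesis tokens contain Latin-script characters. It is not the leaderboard's evaluation code. The function names are hypothetical, no text normalization is applied, and CER is computed with spaces removed, which is only one of several conventions.

```python
import unicodedata


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level edit distance over reference length."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / max(len(ref_words), 1)


def cer(ref: str, hyp: str) -> float:
    """Character error rate with spaces removed (an assumed convention)."""
    ref_chars = list(ref.replace(" ", ""))
    hyp_chars = list(hyp.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / max(len(ref_chars), 1)


def latin_token_fraction(text: str) -> float:
    """Fraction of whitespace-separated tokens containing a Latin-script character."""
    tokens = text.split()
    if not tokens:
        return 0.0

    def has_latin(token: str) -> bool:
        return any("LATIN" in unicodedata.name(ch, "") for ch in token)

    return sum(has_latin(t) for t in tokens) / len(tokens)


reference = "مرحبا بكم"
hypothesis = "مرحبا hello"
print(wer(reference, hypothesis), latin_token_fraction(hypothesis))
```

A high Latin-token fraction on Arabic-only references is a quick way to flag the utterances where the model switched to English, before looking at WER and CER per sample.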

BenoitWang · 2025-01-16T14:58:51Z

Another observation is that both Canary 1b and 0.6b-turbo hallucinate quite often on Arabic speech, generating repeated tokens, which significantly increases CER.
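One way to flag the repeated-token hallucinations mentioned here is to measure the longest run of back-to-back repeats of any token n-gram in a transcript. This is a rough heuristic sketch, not something taken from Riva or NeMo. The function names and the threshold of 4 are assumptions chosen for illustration.

```python
def longest_repeat_run(text: str, n: int = 1) -> int:
    """Largest number of consecutive repeats of any n-gram of tokens."""
    tokens = text.split()
    best = 1 if len(tokens) >= n else 0
    for start in range(len(tokens) - n + 1):
        gram = tokens[start:start + n]
        run = 1
        pos = start + n
        # Count how many times the same n-gram follows itself immediately.
        while tokens[pos:pos + n] == gram:
            run += 1
            pos += n
        best = max(best, run)
    return best


def looks_repetitive(text: str, max_n: int = 3, threshold: int = 4) -> bool:
    """Flag a transcript if any n-gram (n <= max_n) repeats `threshold` times in a row.

    The threshold is an assumed value; tune it on real transcripts.
    """
    return any(longest_repeat_run(text, n) >= threshold for n in range(1, max_n + 1))


print(longest_repeat_run("a b b b c"))          # single-token repeats
print(looks_repetitive("نعم نعم نعم نعم نعم"))  # same word five times in a row
```

Because each repeated token is counted as an insertion against the reference, long runs like these inflate CER quickly, so filtering or reporting flagged utterances separately can help isolate the problem.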

tbartley94 · 2025-01-16T18:54:07Z

Ooh, this is a good catch. Thanks for catching this.

@myungjongk This may be a deployment issue. I'll evaluate with the NeMo model on my end to see if there's something that didn't show up in our evaluations.

BenoitWang added the "bug" (Something isn't working) label Jan 11, 2025

BenoitWang mentioned this issue Jan 13, 2025

Nvidia Model Requests Natural-Language-Processing-Elm/open_universal_arabic_asr_leaderboard#1

Open
