Canary outputs English for Arabic Speech #11826

Open
BenoitWang opened this issue Jan 11, 2025 · 7 comments
Labels
bug Something isn't working

Comments

@BenoitWang

Describe the bug

Hello, I am trying to run inference with Canary 1b for Arabic ASR using Riva Quick Start 2.18.0. According to the description, the model already supports Arabic, but it outputs English instead of Arabic tokens.

Here's a partial extract of my config.sh:

service_enabled_asr=true
service_enabled_nlp=false
service_enabled_tts=false
service_enabled_nmt=false

asr_language_code=("ar-AR")
asr_acoustic_model=("canary")
asr_acoustic_model_variant=("1b")
use_existing_rmirs=false

When I use the same config to run inference with Parakeet 1.1b_unified_ml_cs_concat and Parakeet 1.1b_unified_ml_cs_universal, they do output Arabic tokens, so I guess the issue lies within the Canary model.

Any ideas?

@tbartley94
Collaborator

@BenoitWang Can you try ar-SA? We may have just assigned the Arabic language to a Saudi dialect tag for that round.

@BenoitWang
Author

Hi @tbartley94, I just tried it but still got English output.

@tbartley94
Collaborator

Hmm, okay, I will look into it.

@myungjongk

@BenoitWang It looks like you're using the Riva client to run the Canary-1b model. If so, you need to pass the language code on the client side, for example: riva_asr_client --language_code=ar-AR --audio_file=ar-AR_sample.wav. Could you try with the --language_code parameter?
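For anyone who wants to run this over a batch of files, here is a minimal Python sketch that builds one riva_asr_client command per audio file. It uses only the two flags mentioned above (--language_code and --audio_file); the helper name build_riva_asr_commands is hypothetical, and the sketch only prints the commands rather than running them.

```python
import shlex


def build_riva_asr_commands(audio_files, language_code="ar-AR"):
    """Return one argv list per audio file for riva_asr_client.

    Only the flags quoted in this thread are used; the helper itself is
    a hypothetical convenience, not part of Riva.
    """
    return [
        [
            "riva_asr_client",
            f"--language_code={language_code}",
            f"--audio_file={path}",
        ]
        for path in audio_files
    ]


if __name__ == "__main__":
    # Print shell-safe command lines; pass each argv list to
    # subprocess.run(...) instead if you want to execute them.
    for argv in build_riva_asr_commands(["ar-AR_sample.wav"]):
        print(shlex.join(argv))
```

Passing the language code explicitly for every call makes it harder to fall back to a default language by accident.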

@BenoitWang
Author

BenoitWang commented Jan 16, 2025

Hi @myungjongk, thank you, that works much better. However, when I looked into the transcriptions, Canary (the 2nd image) still generates English tokens very often compared with parakeet-ctc-1.1b-concat (the 1st image), which degrades both the WER and CER quite a lot. The 20 samples are from CommonVoice 18.0. Am I still missing something?

Image Image
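To put a number on the English-token problem, a rough sketch like the one below could be run over the hypotheses. It is not the scoring code used for the results above: the function names are hypothetical, a token is counted as "Latin" if any of its characters has a Unicode name containing LATIN, and CER is a plain Levenshtein distance over characters with no text normalization.

```python
import unicodedata


def latin_token_ratio(transcript):
    """Fraction of whitespace-separated tokens containing a Latin letter."""
    tokens = transcript.split()
    if not tokens:
        return 0.0

    def has_latin(token):
        return any("LATIN" in unicodedata.name(ch, "") for ch in token)

    return sum(has_latin(t) for t in tokens) / len(tokens)


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate; the reference must be non-empty."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate over whitespace-separated tokens."""
    ref_tokens = reference.split()
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hypothesis.split()) / len(ref_tokens)
```

Comparing latin_token_ratio between the two models on the same samples would show how much of the WER/CER gap comes from English output rather than ordinary recognition errors.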

In fact, we're running this Arabic ASR leaderboard, and we find that Canary performs badly compared to the other models. We would like to include Canary if this gets fixed. Thanks for your help, @tbartley94 @myungjongk.

@BenoitWang
Author

Another observation is that both Canary 1b and 0.6b-turbo hallucinate quite often on Arabic speech, generating repeated tokens, which significantly increases CER.

Image
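One simple way to flag this kind of repetition automatically is to measure the longest run of back-to-back repeats of the same token or n-gram in each hypothesis. The sketch below is only an illustration under that assumption; the function name and any flagging threshold are hypothetical and not something used for the results above.

```python
def longest_repeat_run(transcript, n=1):
    """Longest run of consecutive repetitions of any n-gram of tokens.

    Returns 0 when the transcript has fewer than n tokens, and 1 when
    no n-gram is immediately repeated.
    """
    tokens = transcript.split()
    best = 1 if len(tokens) >= n else 0
    for start in range(len(tokens) - n + 1):
        gram = tokens[start:start + n]
        run = 1
        pos = start + n
        # Count how many times the same n-gram follows itself directly.
        while tokens[pos:pos + n] == gram:
            run += 1
            pos += n
        best = max(best, run)
    return best
```

A hypothesis where longest_repeat_run is much larger than in the reference transcript is a likely candidate for the repeated-token hallucination described above.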

@tbartley94
Collaborator

Ooh, this is a good catch. Thanks for reporting it.

@myungjongk This may be a deployment issue. I'll evaluate with the NeMo model on my end to see whether there's something that didn't show up in our evaluations.
