AI-174: Evaluate multilingual support for Whisper Small variants #109
ibhoomi16 wants to merge 1 commit into
4dabc36 to 937f32b
The WER looks a little too accurate. Can you elaborate on which dataset you used and whether any post-processing is involved?
CLA Check = Passed
The dataset was created by translating common banking commands into 5 languages and generating their audio using TTS. There’s no heavy post-processing: the reference text is taken from the audio filename, underscores are replaced with spaces, the result is lowercased, and it is then compared with Whisper’s output using jiwer.
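The reference-text steps described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the helper names and the example filename are hypothetical, and `word_error_rate` is a pure-Python stand-in for the word-level edit distance that `jiwer.wer(reference, hypothesis)` reports, so the snippet runs without extra dependencies.

```python
from pathlib import Path


def reference_from_filename(path):
    """Recover the reference transcript from an audio filename.

    Mirrors the steps described in the thread: take the filename stem,
    replace underscores with spaces, and lowercase the result.
    """
    return Path(path).stem.replace("_", " ").lower()


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length.

    Stand-in for jiwer.wer; assumes whitespace-separated words.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution, or match if equal
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical filename, for illustration only.
reference = reference_from_filename("audio/Check_My_Balance.wav")
wer_one_substitution = word_error_rate(reference, "check the balance")
```

Note that only the reference is normalized in this sketch. If the model output keeps capitalization or punctuation (for example "Check my balance."), those words count as substitutions; whether the hypothesis is normalized in the same way is not stated in the thread.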
    return []

    results = []
    pid = psutil.Process(os.getpid())
Since our final goal is to deploy the model in the Mifos mobile app, we should consider the memory constraints of a mobile device rather than those of the machine we are currently using, as they differ significantly.
@itsPronay as of now, we haven't decided whether we want to host the model on the client side or not.
@staru09, the ticket mentions that for local models we should measure metrics like memory usage and latency. Could you please clarify which device we should base these measurements on?
If the intention is to run these models locally on a mobile device, the measurements would differ significantly compared to running them on a server (self-hosted). The approach to evaluating memory usage and latency would vary depending on the deployment environment.
So, when we talk about 'Benchmark local-models (memory and latency)', what are we evaluating against?
- A server computer?
- A physical mobile device?
- Or is it not decided yet?
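Whichever target is chosen, one way to keep the numbers comparable is to record the host alongside every measurement. The following is a rough sketch under stated assumptions, not the PR's benchmarking code: `benchmark` and `current_rss_mb` are hypothetical helpers, `fn` stands in for a transcription call, and psutil is treated as optional so the snippet still runs where it is not installed. Process RSS measured this way reflects the machine running the script and says nothing about a phone.

```python
import os
import platform
import time

try:
    import psutil  # third-party; the diff under review uses psutil.Process(os.getpid())
except ImportError:  # keep the sketch runnable without psutil
    psutil = None


def current_rss_mb():
    """Resident set size of this process in MiB, or None if psutil is missing."""
    if psutil is None:
        return None
    return psutil.Process(os.getpid()).memory_info().rss / (1024 * 1024)


def benchmark(fn, *args, runs=3):
    """Call fn(*args) `runs` times; report mean latency, RSS growth, and the host."""
    rss_before = current_rss_mb()
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        latencies.append(time.perf_counter() - start)
    rss_after = current_rss_mb()
    return {
        # Host details make it explicit which environment the numbers describe.
        "system": platform.system(),
        "machine": platform.machine(),
        "runs": runs,
        "mean_latency_s": sum(latencies) / len(latencies),
        "rss_delta_mb": None if rss_before is None else rss_after - rss_before,
    }


# Placeholder workload standing in for a model call.
result = benchmark(lambda n: sum(range(n)), 100_000, runs=2)
```

Keeping `system` and `machine` in each result row means server and on-device runs can later be logged side by side without being mistaken for one another.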
This PR introduces the benchmarking setup to evaluate the performance of openai/whisper-small models across multiple languages.
- openai/whisper-small and whisper-small.en base models.
- benchmarking_whisper/ directory that successfully calculates and logs: