AI-174: Evaluate multilingual support for Whisper Small variants by ibhoomi16 · Pull Request #109 · openMF/community-ai

ibhoomi16 · 2026-03-23T04:29:43Z

This PR introduces the benchmarking setup to evaluate the performance of openai/whisper-small models across multiple languages.

Built an automated evaluation pipeline to test the openai/whisper-small and whisper-small.en base models.
Tested the models across five target languages: English (en), Hindi (hi), Spanish (es), French (fr), and Portuguese (pt).
Added a metric tracking system in the benchmarking_whisper/ directory that calculates and logs:
- Word Error Rate (WER) (for measuring transcription accuracy).
- Latency / TTFA (for measuring STT speed and responsiveness).
- Memory Footprint (for checking on-device low-end mobile constraints).
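
A minimal sketch of how the latency and memory metrics listed above could be captured around a single transcription call. The names `measure_call`, `transcribe`, and `read_rss` are hypothetical and are not taken from the PR; the only part grounded in the diff is reading process memory through `psutil.Process(os.getpid())`. The sketch times the whole call, which is only a rough proxy for TTFA, and it reports the change in resident memory of the host process, not a mobile-device figure.

```python
import os
import time
from typing import Callable


def _process_rss_bytes() -> int:
    """Resident set size of the current process, read via psutil."""
    import psutil  # third-party; imported lazily so a custom reader can replace it

    return psutil.Process(os.getpid()).memory_info().rss


def measure_call(
    transcribe: Callable[[str], str],  # hypothetical: any function mapping an audio path to text
    audio_path: str,
    read_rss: Callable[[], int] = _process_rss_bytes,
) -> dict:
    """Run one transcription and record wall-clock latency and RSS growth."""
    rss_before = read_rss()
    start = time.perf_counter()
    text = transcribe(audio_path)
    latency_s = time.perf_counter() - start
    rss_after = read_rss()
    return {
        "text": text,
        "latency_s": latency_s,
        "rss_delta_mb": (rss_after - rss_before) / (1024 * 1024),
    }
```

Passing `read_rss` as a parameter is a design choice for this sketch: it lets the same harness be driven by a different memory probe if the measurement target changes.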

staru09 · 2026-03-25T06:08:02Z

The WER looks a little too accurate. Can you elaborate on which dataset you used and whether any post-processing is involved?

DavidH-1 · 2026-03-27T21:08:17Z

CLA Check = Passed

ibhoomi16 · 2026-03-30T04:44:20Z

The dataset was created by translating common banking commands into 5 languages and generating their audio using TTS. There is no heavy post-processing: the reference text is taken from the audio filename, underscores are replaced with spaces, the text is converted to lowercase, and the result is compared with Whisper's output using jiwer.
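
The steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: the function names and the example filename are hypothetical, and the word-level edit distance below is a dependency-free stand-in for the WER that jiwer computes. The thread does not say whether Whisper's output is normalized before comparison, so the sketch leaves the hypothesis untouched.

```python
from pathlib import Path


def reference_from_filename(audio_path: str) -> str:
    """Derive the reference text: filename stem, underscores to spaces, lowercase."""
    return Path(audio_path).stem.replace("_", " ").lower()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical filename, for illustration only.
reference = reference_from_filename("audio/Check_My_Balance.wav")
print(reference)
print(word_error_rate(reference, "check my balance"))
print(word_error_rate(reference, "Check my balance."))
```

In this sketch, a hypothesis that differs from the reference only in capitalization or trailing punctuation still counts as word errors, because only the reference side is lowercased. With jiwer installed, `jiwer.wer(reference, hypothesis)` would take the place of `word_error_rate`.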

itsPronay · 2026-04-02T11:47:18Z

+        return []
+
+    results = []
+    pid = psutil.Process(os.getpid())


Since our final goal is to deploy the model in the Mifos mobile app, we should consider the memory constraints of a mobile device rather than those of the machine we are currently using, as they differ significantly.

The same applies to latency.

@itsPronay As of now, we haven't decided whether we want to host a model on the client side or not.

@staru09, the ticket mentions that for local models we should measure metrics such as memory usage and latency. Could you please clarify which device we should base these measurements on?

If the intention is to run these models locally on a mobile device, the measurements would differ significantly compared to running them on a server (self-hosted). The approach to evaluating memory usage and latency would vary depending on the deployment environment.

So, when we talk about 'Benchmark local-models (memory and latency)', what are we evaluating against?

A server computer?

A physical mobile device?

Or is it not decided yet?
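
One way to keep results interpretable while the target is still open is to store the measurement environment next to each number. The sketch below is only an illustration: `BenchmarkRecord`, `make_record`, and the `target` labels are hypothetical names, not part of the PR or the ticket.

```python
import platform
from dataclasses import asdict, dataclass

# Hypothetical labels mirroring the options raised in the question above.
ALLOWED_TARGETS = {"server", "mobile", "undecided"}


@dataclass
class BenchmarkRecord:
    model: str
    language: str
    latency_s: float
    rss_mb: float
    target: str   # which deployment environment the numbers are meant to represent
    system: str   # operating system of the machine that produced the numbers
    machine: str  # CPU architecture of that machine


def make_record(
    model: str,
    language: str,
    latency_s: float,
    rss_mb: float,
    target: str = "undecided",
) -> BenchmarkRecord:
    """Attach host details to a measurement so server and mobile runs are not mixed up."""
    if target not in ALLOWED_TARGETS:
        raise ValueError(f"unknown target: {target!r}")
    return BenchmarkRecord(
        model=model,
        language=language,
        latency_s=latency_s,
        rss_mb=rss_mb,
        target=target,
        system=platform.system(),
        machine=platform.machine(),
    )
```

Calling `asdict()` on a record gives a plain dictionary that can be written to whatever log format the benchmark already uses.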

ibhoomi16 requested a review from a team March 23, 2026 04:29

AI-174: Evaluate multilingual support for Whisper Small variants

937f32b

ibhoomi16 force-pushed the feature/whisper-benchmark branch from 4dabc36 to 937f32b March 23, 2026 04:31

itsPronay suggested changes Apr 2, 2026

ibhoomi16 wants to merge 1 commit into openMF:dev from ibhoomi16:feature/whisper-benchmark

4 participants
