@akshaykalkunte commented Sep 21, 2025

📌 Description

  • Adds support for evaluating Gemini models hosted on GCP and Vertex AI through the OpenAI-compatible Chat Completions API
  • Cleans up sample_config.json
  • Adds code to populate environment-variable placeholders in run configs
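The placeholder-population change can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: it assumes a shell-style `${VAR}` / `${VAR:-default}` syntax, and `expand_env`, the config field names (`inference_type`, `url`, `auth_token`), and the environment-variable names are all hypothetical. The URL follows the shape of Vertex AI's OpenAI-compatible endpoint and the `google/gemini-2.5-flash` model ID is an assumption; check both against the current Vertex AI documentation.

```python
import json
import os
import re

# Matches ${VAR} or ${VAR:-default}. This syntax is an assumption; the PR
# does not say which placeholder form its run configs use.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand_env(value, env=None):
    """Hypothetical helper: recursively replace ${VAR} placeholders in
    strings, lists, and dicts using `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    if isinstance(value, str):
        def _sub(match):
            name, default = match.group(1), match.group(2)
            if name in env:
                return env[name]
            if default is not None:
                return default
            # Fail loudly instead of silently sending an empty credential.
            raise KeyError(f"environment variable {name!r} is not set")
        return _PLACEHOLDER.sub(_sub, value)
    if isinstance(value, list):
        return [expand_env(v, env) for v in value]
    if isinstance(value, dict):
        return {k: expand_env(v, env) for k, v in value.items()}
    # Numbers, booleans, and None pass through unchanged.
    return value


# Hypothetical run-config entry for Gemini on Vertex AI's OpenAI-compatible
# endpoint; field names are illustrative, not the repository's schema.
SAMPLE_CONFIG = """
{
  "models": [
    {
      "name": "gemini-2.5-flash",
      "inference_type": "openai",
      "url": "https://${GCP_REGION:-us-central1}-aiplatform.googleapis.com/v1/projects/${GCP_PROJECT_ID}/locations/${GCP_REGION:-us-central1}/endpoints/openapi",
      "model": "google/gemini-2.5-flash",
      "auth_token": "${GCP_ACCESS_TOKEN}"
    }
  ]
}
"""

if __name__ == "__main__":
    # Use an explicit mapping so the example does not depend on the real shell.
    fake_env = {"GCP_PROJECT_ID": "my-project", "GCP_ACCESS_TOKEN": "token-123"}
    config = expand_env(json.loads(SAMPLE_CONFIG), env=fake_env)
    print(config["models"][0]["url"])
```

Expanding after `json.loads` rather than on the raw file text means substituted values (for example, tokens containing quotes) never need JSON escaping.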

🔗 Related Issue(s)

NA

🛠️ Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality including new tasks)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactor / Code cleanup
  • Maintenance / Chore / Task
  • Other (please describe):

✅ How Has This Been Tested?

The Gemini-2.5-Flash numbers look reasonable. I also tested an OpenAI model and a vLLM-served model to confirm that existing functionality is not broken.

| Task Category | Task Name | Dataset/Benchmark | Metric | Gemini-2.5-Flash |
|---|---|---|---|---|
| Speech Recognition | ASR | Librispeech | WER | 2.75 |
| Paralinguistics | Emotion | MELD | llm_judge_binary | 30 |
| Paralinguistics | Gender | IEMOCAP | llm_judge_binary | 85.5 |
| Paralinguistics | Accent | VoxCeleb | llm_judge_binary | 55.7 |
| Paralinguistics | Speaker Recognition | mmau_mini | llm_judge_binary | 61.5 |
| Paralinguistics | Speaker Diarization | CallHome | WDER | 41.83 |
| Spoken Language Understanding | Spoken QA | public_sg_speech_qa_test | llm_judge_detailed | 74.34 |
| Spoken Language Understanding | Spoken Query QA | BigBench Audio | llm_judge_big_bench_audio | 90.4 |
| Spoken Language Understanding | Speech Translation | Covost2 (zh-CN->EN) | BLEU | 27.1 |
| Spoken Language Understanding | Spoken Dialogue Summarization | mnsc_sds (P3) | llm_judge_detailed | 62.8 |
| Spoken Language Understanding | Intent Classification | SLURP | llm_judge_binary | 79 |
| Audio Understanding | Scene Understanding | audiocaps_qa | llm_judge_detailed | 34.82 |
| Audio Understanding | Music Understanding | mu_chomusic_test | llm_judge_binary | 72.9 |
| Spoken Language Reasoning | Speech Instruction Following | IFEval | instruction_following_score | 86.28 |
| Spoken Language Reasoning | Speech Instruction Following | MTBench | llm_judge_mt_bench | 69.88 |
| Spoken Language Reasoning | Speech-to-Coding | Spider | sql_score (EM) | 77.12 |
| Spoken Language Reasoning | Speech Function Calling | BFCL | bfcl_match_score | 93.23 |
| Safety and Security | Safety | advbench | redteaming_judge | 97.5 |
| Safety and Security | Spoofing | avspoof | llm_judge_binary | 80.5 |
| Aggregate | NA | NA | NA | 69.35 |

  • Unit tests
  • Integration tests
  • Manual testing

Test Results / Screenshots (if applicable):

📸 Screenshots / Demos

NA

📋 Checklist

  • Code follows project style guidelines
  • Tests have been added/updated (if applicable)
  • Documentation has been updated (if applicable)
  • Linked relevant issue(s)
  • Self-reviewed my code

🙌 Additional Notes

NA

@akshaykalkunte changed the title from "feat: Add Gemini support" to "[FEAT] Add Gemini support" on Sep 21, 2025

@nhhoang96 left a comment

LGTM

Can you also provide additional instructions on how to get started with Gemini endpoint setup in the landing-page README?

Please also update the list of supported endpoints in the landing-page README (see the attached screenshot below for the specific section).

Screenshot 2025-09-22 at 9 04 02 AM
