Vanilla JS web interface for the Gemini 2.0 flash-exp Multimodal API, with text, audio, camera, and screen inputs, audio responses, and function calling

ViaAnthroposBenevolentia/gemini-2-live-api-demo

Gemini 2.0 Flash Multimodal Live API Client

A lightweight vanilla JavaScript client for the Gemini 2.0 Flash Multimodal Live API. It supports real-time interaction with the model through text, audio, webcam video, and screen sharing.

This is a simplified version of Google's original React implementation, created in response to this issue.

Live Demo

Key Features

  • Real-time chat with Gemini 2.0 Flash Multimodal Live API
  • Real-time audio responses from the model
  • Real-time audio input from the user, allowing interruptions
  • Real-time video streaming from the user's webcam
  • Real-time screen sharing from the user's screen
  • Function calling
  • Built with vanilla JavaScript (no dependencies)
  • Mobile-friendly
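The real-time features above all run over a single WebSocket connection to the Live API. The following is a minimal sketch of how such a connection could be opened, not this project's actual code. The endpoint path, the model name `models/gemini-2.0-flash-exp`, and the `setup` message fields are assumptions based on Google's public Live API documentation and may have changed. `buildLiveApiUrl`, `buildSetupMessage`, and `connect` are hypothetical helper names.

```javascript
// Assumption: host, path, and message shape follow Google's public Live API
// documentation; this repository's own code may differ.
const LIVE_API_HOST = "generativelanguage.googleapis.com";
const LIVE_API_PATH =
  "/ws/google.ai.generativelanguage.v1alpha.GenerativeService.BidiGenerateContent";

// Build the WebSocket URL, passing the API key as a query parameter.
function buildLiveApiUrl(apiKey) {
  if (!apiKey) {
    throw new Error("An API key from Google AI Studio is required");
  }
  return `wss://${LIVE_API_HOST}${LIVE_API_PATH}?key=${encodeURIComponent(apiKey)}`;
}

// Build the first message sent after the socket opens. It selects the model
// and asks for audio responses (assumed field names).
function buildSetupMessage(model = "models/gemini-2.0-flash-exp") {
  return {
    setup: {
      model,
      generationConfig: { responseModalities: ["AUDIO"] },
    },
  };
}

// Hypothetical helper: open the socket, send the setup message, and forward
// every raw server message to the caller. Needs a global WebSocket (browser).
function connect(apiKey, onMessage) {
  const socket = new WebSocket(buildLiveApiUrl(apiKey));
  socket.addEventListener("open", () => {
    socket.send(JSON.stringify(buildSetupMessage()));
  });
  socket.addEventListener("message", (event) => onMessage(event.data));
  return socket;
}
```

In a browser, `connect(apiKey, (data) => console.log(data))` would log the raw server messages. Decoding audio chunks and handling function calls would sit on top of that callback.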

Prerequisites

  • Modern web browser with WebRTC, WebSocket, and Web Audio API support
  • Google AI Studio API key
  • A local static file server to serve index.html, for example python -m http.server, npx http-server, or the Live Server extension for VS Code
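To check the browser requirements before starting, feature detection along these lines may help. This is a sketch, not part of the project: `checkBrowserSupport` and `missingFeatures` are hypothetical names, and it assumes that "WebRTC support" here means the media capture APIs `getUserMedia` (microphone and camera) and `getDisplayMedia` (screen sharing).

```javascript
// Report which of the required browser APIs are present.
// `env` defaults to the global object so the function can be tested
// with a fake environment outside the browser.
function checkBrowserSupport(env = globalThis) {
  const nav = env.navigator;
  const media = nav && nav.mediaDevices;
  return {
    webSocket: typeof env.WebSocket === "function",
    // Older Safari versions expose the prefixed constructor only.
    webAudio:
      typeof (env.AudioContext || env.webkitAudioContext) === "function",
    microphoneAndCamera: Boolean(
      media && typeof media.getUserMedia === "function"
    ),
    screenShare: Boolean(
      media && typeof media.getDisplayMedia === "function"
    ),
  };
}

// List the names of the features that are not available.
function missingFeatures(support) {
  return Object.keys(support).filter((name) => !support[name]);
}
```

Calling `missingFeatures(checkBrowserSupport())` in the browser console returns an empty array when everything is available. Note that many mobile browsers do not implement `getDisplayMedia`, so screen sharing may be reported as missing there even when the other inputs work.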

Quick Start

  1. Get your API key from Google AI Studio

  2. Clone the repository

    git clone https://github.com/ViaAnthroposBenevolentia/gemini-2-live-api-demo.git

  3. Start a local server from the repository root (change the port if 8000 is already in use):

    cd gemini-2-live-api-demo
    python -m http.server 8000
    # or: npx http-server -p 8000
    # or: use "Open with Live Server" from the Live Server extension for VS Code

  4. Open http://localhost:8000 in your browser

  5. Open the settings at the top right, paste your API key, and click "Save"
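This README does not say where the saved key is kept. One common approach for a client-only page is the browser's `localStorage`, sketched below under that assumption. The storage name `geminiApiKey` and the helpers `saveApiKey` and `loadApiKey` are hypothetical and are not taken from this project.

```javascript
// Hypothetical storage name; the real application may use a different one.
const API_KEY_STORAGE_NAME = "geminiApiKey";

// Trim and persist the key. `storage` is any object with the Web Storage
// interface (setItem/getItem), for example window.localStorage.
function saveApiKey(storage, rawKey) {
  const key = String(rawKey).trim();
  if (key === "") {
    throw new Error("API key must not be empty");
  }
  storage.setItem(API_KEY_STORAGE_NAME, key);
  return key;
}

// Return the stored key, or null when nothing has been saved yet.
function loadApiKey(storage) {
  return storage.getItem(API_KEY_STORAGE_NAME);
}
```

In the browser this would be called as `saveApiKey(window.localStorage, inputElement.value)`. A key stored this way is readable by any script running on the same origin, so it suits local experiments better than shared deployments.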

Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.

License

This project is licensed under the MIT License.
