86 changes: 86 additions & 0 deletions docs/devpost-submission.md
@@ -0,0 +1,86 @@
# missless — DevPost Submission Text

> For the Gemini Live Agent Challenge 2026 (Creative Storyteller Track)

---

## Inspiration

We all carry someone in our hearts — a grandparent who passed away before we could say goodbye, a parent we lost touch with, a friend who moved to the other side of the world. We built **missless** because we believe AI can do more than answer questions — it can help people heal.

The idea came from a simple but powerful question: *What if you could hear your loved one's voice again, not from a recording, but in a real conversation?* Not a chatbot pretending to be them, but an AI that truly understood how they spoke, what they cared about, and the warmth they carried.

## What it does

**missless** creates a virtual reunion experience. Users provide a YouTube video of someone they miss — a family vlog, an interview, a memorial video — and our AI analyzes the person's speech patterns, personality traits, and emotional expressions. It then builds a realistic persona and initiates a real-time voice conversation where users can talk to their loved one again.

During the reunion, the AI generates contextual scene images (a kitchen where you used to cook together, the park where you played as a child) and plays background music that matches the emotional tone of the conversation. The entire experience culminates in a shareable album — a keepsake of the reunion.

### Key Features
- **Voice-first interaction** — 100% voice-based, no text input required
- **YouTube video analysis** — Gemini 2.5 Pro analyzes personality directly from video URLs (zero-download)
- **30 HD preset voices** — Automatic voice matching based on analyzed characteristics
- **Progressive image generation** — Flash preview in 1-3 seconds, Imagen 4 HD final in 8-12 seconds
- **Real-time BGM** — Background music that adapts to conversation mood
- **Shareable album** — Reunion scenes compiled into a keepsake
Comment on lines +20 to +25

medium

The bullet points under "Key Features" use an inconsistent style. Consider using sentence case for all entries for better readability.

For example, change "- Voice-first interaction — 100% voice-based, no text input required" to "- Voice-first interaction — 100% voice-based, no text input required."


## How we built it

### Architecture
The browser (Next.js 15 PWA) serves as a pure renderer — all AI orchestration happens server-side in Go on Cloud Run. A persistent WebSocket carries bidirectional PCM audio between the browser and our Go backend, which proxies to Gemini's Live API.
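
A minimal sketch of the proxy shape described above, assuming nothing about the real implementation: Go channels stand in for the two WebSocket connections, and `Frame`, `pump`, `proxy`, and `runDemo` are hypothetical names chosen for illustration. Each direction runs in its own goroutine, and a shared context cancels the other direction as soon as one side ends.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Frame is a hypothetical unit of PCM audio; the real wire format is not
// described in this document.
type Frame []byte

// pump forwards frames from src to dst until src is closed or ctx is cancelled.
func pump(ctx context.Context, src <-chan Frame, dst chan<- Frame) {
	for {
		select {
		case <-ctx.Done():
			return
		case f, ok := <-src:
			if !ok {
				return // source closed cleanly
			}
			select {
			case dst <- f:
			case <-ctx.Done():
				return
			}
		}
	}
}

// proxy runs the two directions independently. When either direction ends,
// cancel() stops the other, so both goroutines always exit together.
func proxy(ctx context.Context, fromBrowser <-chan Frame, toModel chan<- Frame,
	fromModel <-chan Frame, toBrowser chan<- Frame) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); defer cancel(); pump(ctx, fromBrowser, toModel) }()
	go func() { defer wg.Done(); defer cancel(); pump(ctx, fromModel, toBrowser) }()
	wg.Wait()
}

// runDemo sends three frames upstream, then closes the browser side.
// It returns how many frames reached the model side.
func runDemo() int {
	fromBrowser := make(chan Frame, 3)
	fromModel := make(chan Frame) // stays open: the model side never hangs up
	toModel := make(chan Frame, 3)
	toBrowser := make(chan Frame, 3)

	for i := 0; i < 3; i++ {
		fromBrowser <- Frame{byte(i)}
	}
	close(fromBrowser)

	proxy(context.Background(), fromBrowser, toModel, fromModel, toBrowser)
	return len(toModel)
}

func main() {
	fmt.Println("frames forwarded upstream:", runDemo())
}
```

In this sketch the browser side closing is enough to unblock the model-side goroutine, which is the coordinated-shutdown property a real proxy would need so that neither connection leaks.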

### Gemini Models (4 models)
1. **Live API** (`gemini-2.5-flash-native-audio`) — Real-time voice conversation with native audio streaming
2. **Gemini 2.5 Pro** — YouTube video analysis for personality extraction
3. **Gemini 2.5 Flash Image** — Quick scene previews (1-3s)
4. **Imagen 4** — High-quality final scene images (8-12s)

### Server-Side Tools (5 tools)

P2: Align declared tool count with implemented Live tools

This section says there are 5 server-side tools, but the runtime config and dispatcher expose 6 tools (`generate_scene`, `generate_fast_scene`, `change_atmosphere`, `recall_memory`, `analyze_user`, `end_reunion`) in `internal/session/manager.go` (`onboardingTools`/`reunionTools`) and `internal/live/tools.go` (`Handle` switch). Leaving the DevPost text out of sync with the actual interface can misstate project scope during judging and makes future maintenance docs less reliable.

The Live API session uses Tool Calling to trigger server-side actions:
- `generate_scene` — 2-stage progressive image generation
- `change_atmosphere` — BGM selection and crossfade
- `recall_memory` — Search persona memories in Firestore for grounded conversation
- `analyze_user` — Flash Vision analysis of user input
- `end_reunion` — Compile scenes into a shareable album
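
The tool list above can be pictured as a name-based dispatcher. The sketch below is illustrative only: the tool names come from the list, but `ToolCall`, `Dispatch`, `stringArg`, the argument keys (`prompt`, `mood`, `query`), and the returned strings are assumptions, not the project's actual types or the SDK's API.

```go
package main

import (
	"errors"
	"fmt"
)

// ToolCall is a hypothetical, simplified stand-in for a function call
// emitted by the live session.
type ToolCall struct {
	Name string
	Args map[string]any
}

// ErrUnknownTool is returned when the model asks for a tool we do not expose.
var ErrUnknownTool = errors.New("unknown tool")

// stringArg reads a string argument, returning "" if it is missing or not a string.
func stringArg(args map[string]any, key string) string {
	s, _ := args[key].(string)
	return s
}

// Dispatch routes a tool call to its server-side action. Each branch only
// returns a description here; a real handler would start the actual work.
func Dispatch(call ToolCall) (string, error) {
	switch call.Name {
	case "generate_scene":
		return "scene queued: " + stringArg(call.Args, "prompt"), nil
	case "change_atmosphere":
		return "bgm crossfade to: " + stringArg(call.Args, "mood"), nil
	case "recall_memory":
		return "memory search: " + stringArg(call.Args, "query"), nil
	case "analyze_user":
		return "user analysis started", nil
	case "end_reunion":
		return "album compiled", nil
	default:
		return "", fmt.Errorf("%w: %q", ErrUnknownTool, call.Name)
	}
}

func main() {
	calls := []ToolCall{
		{Name: "generate_scene", Args: map[string]any{"prompt": "the old kitchen"}},
		{Name: "change_atmosphere", Args: map[string]any{"mood": "nostalgic"}},
		{Name: "open_portal"}, // not a registered tool
	}
	for _, c := range calls {
		res, err := Dispatch(c)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(res)
	}
}
```

Returning an explicit error for unknown names, rather than silently ignoring them, is one way to surface a mismatch between the tools declared to the model and the tools the server actually handles.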

### Tech Stack
- **Backend**: Go 1.25+ on Cloud Run

medium

Consider removing the + sign after the version number to avoid confusion, as it might be interpreted as a requirement for a specific minimum version instead of indicating that it is simply version 1.25 or later.

- **Frontend**: Next.js 15 (PWA, static export)
- **AI SDK**: google.golang.org/genai v1.47.0
- **Database**: Cloud Firestore (sessions, personas, memories)
- **Storage**: Cloud Storage (BGM presets, generated assets)
- **Auth**: Google OAuth 2.0 (YouTube access)

## Challenges we ran into

1. **Live API bidirectional streaming in Go** — The Gemini Live API's WebSocket protocol required careful goroutine management. We implemented a dual-channel proxy pattern where the browser WebSocket and Gemini WebSocket run independently with coordinated shutdown.

2. **Progressive image generation** — Users can't wait 8-12 seconds for an image during a live conversation. We solved this with a 2-stage pipeline: a Flash model generates a quick preview (1-3s) while Imagen 4 produces the final HD version in the background.

3. **GoAway signals and session resumption** — Live API sessions can be interrupted by GoAway signals. We implemented automatic reconnection with session resumption tokens to maintain conversation continuity.

4. **Goroutine safety** — With multiple concurrent operations (audio streaming, image generation, memory search), we enforced strict safety patterns: `SafeGo()` wrappers for panic recovery, 6-level lock ordering to prevent deadlocks, and mandatory race detector testing.

5. **Voice matching accuracy** — Mapping analyzed personality traits to the right voice from 30 presets required careful calibration of age, gender, tone, and emotional warmth parameters.
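
To make the `SafeGo()` idea from item 4 concrete, here is one possible shape for such a wrapper. The source only names the helper; the signature, the `onPanic` callback, and the demo below are assumptions for illustration, not the project's code.

```go
package main

import (
	"fmt"
	"sync"
)

// SafeGo runs fn in a new goroutine tracked by wg. If fn panics, the panic is
// recovered and reported through onPanic instead of crashing the process.
// (Hypothetical signature; the source does not show the real one.)
func SafeGo(wg *sync.WaitGroup, onPanic func(error), fn func()) {
	wg.Add(1)
	go func() {
		defer wg.Done()
		defer func() {
			if r := recover(); r != nil {
				onPanic(fmt.Errorf("recovered panic: %v", r))
			}
		}()
		fn()
	}()
}

// runDemo starts one healthy task and one panicking task, then reports how
// many completed and which panics were recovered.
func runDemo() (completed int, recovered []error) {
	var wg sync.WaitGroup
	var mu sync.Mutex // guards completed and recovered

	onPanic := func(err error) {
		mu.Lock()
		recovered = append(recovered, err)
		mu.Unlock()
	}

	SafeGo(&wg, onPanic, func() {
		mu.Lock()
		completed++
		mu.Unlock()
	})
	SafeGo(&wg, onPanic, func() {
		panic("image backend unavailable")
	})

	wg.Wait()
	return completed, recovered
}

func main() {
	completed, recovered := runDemo()
	fmt.Println("completed:", completed)
	for _, err := range recovered {
		fmt.Println(err)
	}
}
```

A wrapper like this keeps one failing background task, such as an image request, from taking down the audio stream that shares the process. Running the tests with `go test -race` is the standard way to apply the race detector mentioned above.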

## Accomplishments that we're proud of

- **Zero-download video analysis** — Gemini 2.5 Pro analyzes YouTube videos directly from URLs without downloading, making the onboarding flow instant
- **Sub-3-second first image** — Progressive rendering ensures users see scene illustrations within seconds, not minutes
- **Production Go backend** — Strict safety patterns (SafeGo, lock ordering, race detector) make the concurrent system reliable under real-world conditions
- **Emotional impact** — Early testers described the experience as "genuinely moving" — our goal was healing through technology, and it works
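
The progressive rendering behind the fast first image can be sketched as two generators started at the same time, with results delivered in arrival order. This is an illustration under assumptions: `Image`, `Generator`, `Progressive`, and `fakeGenerator` are hypothetical, and the fake generators use millisecond delays in place of real model calls.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// Image is a hypothetical result type; Stage is "preview" or "final".
type Image struct {
	Stage  string
	Prompt string
}

// Generator is a hypothetical signature for any image backend.
type Generator func(ctx context.Context, prompt string) (Image, error)

// Progressive starts the fast and slow generators concurrently and emits each
// image as soon as it is ready. The channel closes when both have finished.
func Progressive(ctx context.Context, prompt string, fast, slow Generator) <-chan Image {
	out := make(chan Image, 2) // buffered so neither generator blocks on send
	var wg sync.WaitGroup
	for _, gen := range []Generator{fast, slow} {
		wg.Add(1)
		go func(gen Generator) {
			defer wg.Done()
			if img, err := gen(ctx, prompt); err == nil {
				out <- img
			}
		}(gen)
	}
	go func() { wg.Wait(); close(out) }()
	return out
}

// fakeGenerator simulates a backend that takes `delay` to produce an image.
func fakeGenerator(stage string, delay time.Duration) Generator {
	return func(ctx context.Context, prompt string) (Image, error) {
		select {
		case <-time.After(delay):
			return Image{Stage: stage, Prompt: prompt}, nil
		case <-ctx.Done():
			return Image{}, ctx.Err()
		}
	}
}

// runDemo returns the stages in the order a client would display them.
func runDemo() []string {
	fast := fakeGenerator("preview", 5*time.Millisecond)
	slow := fakeGenerator("final", 60*time.Millisecond)

	var shown []string
	finalShown := false
	for img := range Progressive(context.Background(), "the old kitchen", fast, slow) {
		if img.Stage == "preview" && finalShown {
			continue // never replace the final image with a late preview
		}
		finalShown = finalShown || img.Stage == "final"
		shown = append(shown, img.Stage)
	}
	return shown
}

func main() {
	fmt.Println(runDemo())
}
```

Starting both requests at once means the preview does not delay the final image; the consumer only has to guard against the unlikely case where the preview arrives last.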

## What we learned

- Gemini's Live API is remarkably capable for real-time voice applications, but managing the bidirectional streaming lifecycle (especially GoAway signals) requires robust engineering
- Tool Calling through the Live API enables powerful server-side orchestration — the AI naturally decides when to generate images, change music, or recall memories
- Progressive rendering is essential for real-time AI experiences — perceived speed matters more than actual speed
- The interleaved output capability (text + image from a single model call) creates uniquely cohesive narratives
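
The GoAway handling mentioned in the first point can be sketched as a reconnect loop that carries the latest resumption token forward. Everything here is a simplified assumption: `ErrGoAway`, `RunSession`, and `RunWithResumption` are hypothetical names, no real SDK calls are used, and backoff between attempts is omitted for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrGoAway is a hypothetical sentinel for "the server asked us to reconnect".
var ErrGoAway = errors.New("session interrupted by GoAway")

// RunSession is a hypothetical callback: it opens a session using the given
// resumption handle ("" means a fresh session), blocks until the session ends,
// and returns the newest handle it received along with the reason it stopped.
type RunSession func(handle string) (latestHandle string, err error)

// RunWithResumption keeps restarting the session after GoAway interruptions,
// always passing the most recent handle, up to maxReconnects times.
func RunWithResumption(run RunSession, maxReconnects int) (reconnects int, err error) {
	handle := ""
	for {
		latest, runErr := run(handle)
		if latest != "" {
			handle = latest // remember the newest token for the next attempt
		}
		switch {
		case runErr == nil:
			return reconnects, nil // session ended normally
		case !errors.Is(runErr, ErrGoAway):
			return reconnects, runErr // unrelated failure: do not retry
		case reconnects >= maxReconnects:
			return reconnects, fmt.Errorf("giving up after %d reconnects: %w", reconnects, runErr)
		}
		reconnects++
	}
}

// runDemo simulates two GoAway interruptions followed by a clean finish and
// records which handle each attempt was started with.
func runDemo() (handles []string, reconnects int, err error) {
	attempt := 0
	run := func(handle string) (string, error) {
		handles = append(handles, handle)
		attempt++
		if attempt <= 2 {
			return fmt.Sprintf("token-%d", attempt), ErrGoAway
		}
		return "", nil
	}
	reconnects, err = RunWithResumption(run, 5)
	return handles, reconnects, err
}

func main() {
	handles, reconnects, err := runDemo()
	fmt.Printf("handles=%q reconnects=%d err=%v\n", handles, reconnects, err)
}
```

Separating "interrupted, please resume" from other errors keeps the loop from retrying failures that a reconnect cannot fix, and the retry cap prevents an endless reconnect cycle.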

## What's next for missless

- **Voice cloning** — Replace preset voices with actual voice synthesis from video analysis
- **Multi-person reunion** — Support conversations with multiple personas simultaneously
- **Lyria BGM** — Replace preset BGM files with real-time AI-generated background music when Go SDK support becomes available
- **Extended memory** — Cross-session memory so the AI remembers previous reunions
- **Mobile app** — Full native app with offline album viewing