From 95c752b9369a087506fc39fec6596d34f9a8e1c4 Mon Sep 17 00:00:00 2001
From: ComBba <cent@naver.com>
Date: Wed, 25 Feb 2026 18:36:01 +0900
Subject: [PATCH] docs: add DevPost submission text in English

Complete 7-section DevPost submission covering:
- Inspiration, What it does, How we built it
- Challenges, Accomplishments, What we learned, What's next
- Highlights Gemini Live API, 4 models, 5 server-side tools
- Emphasizes progressive rendering and voice-first UX

Closes #124

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 docs/devpost-submission.md | 86 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 86 insertions(+)
 create mode 100644 docs/devpost-submission.md

diff --git a/docs/devpost-submission.md b/docs/devpost-submission.md
new file mode 100644
index 0000000..eb8de9c
--- /dev/null
+++ b/docs/devpost-submission.md
@@ -0,0 +1,86 @@
+# missless — DevPost Submission Text
+
+> For the Gemini Live Agent Challenge 2026 (Creative Storyteller Track)
+
+---
+
+## Inspiration
+
+We all carry someone in our hearts — a grandparent who passed away before we could say goodbye, a parent we lost touch with, a friend who moved to the other side of the world. We built **missless** because we believe AI can do more than answer questions — it can help people heal.
+
+The idea came from a simple but powerful question: *What if you could hear your loved one's voice again, not from a recording, but in a real conversation?* Not a chatbot pretending to be them, but an AI that truly understood how they spoke, what they cared about, and the warmth they carried.
+
+## What it does
+
+**missless** creates a virtual reunion experience. Users provide a YouTube video of someone they miss — a family vlog, an interview, a memorial video — and our AI analyzes the person's speech patterns, personality traits, and emotional expressions. It then builds a realistic persona and initiates a real-time voice conversation where users can talk to their loved one again.
+
+During the reunion, the AI generates contextual scene images (a kitchen where you used to cook together, the park where you played as a child) and plays background music that matches the emotional tone of the conversation. The entire experience culminates in a shareable album — a keepsake of the reunion.
+
+### Key Features
+- **Voice-first interaction** — 100% voice-based, no text input required
+- **YouTube video analysis** — Gemini 2.5 Pro analyzes personality directly from video URLs (zero-download)
+- **30 HD preset voices** — Automatic voice matching based on analyzed characteristics
+- **Progressive image generation** — Flash preview in 1-3 seconds, Imagen 4 HD final in 8-12 seconds
+- **Real-time BGM** — Background music that adapts to conversation mood
+- **Shareable album** — Reunion scenes compiled into a keepsake
+
+## How we built it
+
+### Architecture
+The browser (Next.js 15 PWA) serves as a pure renderer — all AI orchestration happens server-side in Go on Cloud Run. A persistent WebSocket carries bidirectional PCM audio between the browser and our Go backend, which proxies to Gemini's Live API.
+
+### Gemini Models (4 models)
+1. **Live API** (`gemini-2.5-flash-native-audio`) — Real-time voice conversation with native audio streaming
+2. **Gemini 2.5 Pro** — YouTube video analysis for personality extraction
+3. **Gemini 2.5 Flash Image** — Quick scene previews (1-3s)
+4. **Imagen 4** — High-quality final scene images (8-12s)
+
+### Server-Side Tools (5 tools)
+The Live API session uses Tool Calling to trigger server-side actions:
+- `generate_scene` — 2-stage progressive image generation
+- `change_atmosphere` — BGM selection and crossfade
+- `recall_memory` — Search persona memories in Firestore for grounded conversation
+- `analyze_user` — Flash Vision analysis of user input
+- `end_reunion` — Compile scenes into a shareable album
+
+### Tech Stack
+- **Backend**: Go 1.25+ on Cloud Run
+- **Frontend**: Next.js 15 (PWA, static export)
+- **AI SDK**: google.golang.org/genai v1.47.0
+- **Database**: Cloud Firestore (sessions, personas, memories)
+- **Storage**: Cloud Storage (BGM presets, generated assets)
+- **Auth**: Google OAuth 2.0 (YouTube access)
+
+## Challenges we ran into
+
+1. **Live API bidirectional streaming in Go** — The Gemini Live API's WebSocket protocol required careful goroutine management. We implemented a dual-channel proxy pattern where the browser WebSocket and Gemini WebSocket run independently with coordinated shutdown.
+
+2. **Progressive image generation** — Users can't wait 8-12 seconds for an image during a live conversation. We solved this with a 2-stage pipeline: a Flash model generates a quick preview (1-3s) while Imagen 4 produces the final HD version in the background.
+
+3. **GoAway signals and session resumption** — Live API sessions can be interrupted by GoAway signals. We implemented automatic reconnection with session resumption tokens to maintain conversation continuity.
+
+4. **Goroutine safety** — With multiple concurrent operations (audio streaming, image generation, memory search), we enforced strict safety patterns: `SafeGo()` wrappers for panic recovery, 6-level lock ordering to prevent deadlocks, and mandatory race detector testing.
+
+5. **Voice matching accuracy** — Mapping analyzed personality traits to the right voice from 30 presets required careful calibration of age, gender, tone, and emotional warmth parameters.
+
+## Accomplishments that we're proud of
+
+- **Zero-download video analysis** — Gemini 2.5 Pro analyzes YouTube videos directly from URLs without downloading, making the onboarding flow instant
+- **Sub-3-second first image** — Progressive rendering ensures users see scene illustrations within seconds, not minutes
+- **Production Go backend** — Strict safety patterns (SafeGo, lock ordering, race detector) make the concurrent system reliable under real-world conditions
+- **Emotional impact** — Early testers described the experience as "genuinely moving" — our goal was healing through technology, and it works
+
+## What we learned
+
+- Gemini's Live API is remarkably capable for real-time voice applications, but managing the bidirectional streaming lifecycle (especially GoAway signals) requires robust engineering
+- Tool Calling through the Live API enables powerful server-side orchestration — the AI naturally decides when to generate images, change music, or recall memories
+- Progressive rendering is essential for real-time AI experiences — perceived speed matters more than actual speed
+- The interleaved output capability (text + image from a single model call) creates uniquely cohesive narratives
+
+## What's next for missless
+
+- **Voice cloning** — Replace preset voices with actual voice synthesis from video analysis
+- **Multi-person reunion** — Support conversations with multiple personas simultaneously
+- **Lyria BGM** — Replace preset BGM files with real-time AI-generated background music when Go SDK support becomes available
+- **Extended memory** — Cross-session memory so the AI remembers previous reunions
+- **Mobile app** — Full native app with offline album viewing