A hands-on workshop for learning Deepgram's Voice Agent API. Order pizza by talking to an AI agent powered by real-time speech-to-text, an LLM, and text-to-speech — all over a single WebSocket.
- Node.js 18+ — download
- Deepgram API key — get a free key
- A modern browser (Chrome, Firefox, Edge, Safari)
- A working microphone
# 1. Install dependencies
npm install
# 2. (Optional) Add your API key to .env so you don't have to enter it in the UI
echo "DEEPGRAM_API_KEY=your_key_here" > .env
# 3. Start the server
npm start

Open http://localhost:3000 in your browser.
If you added a .env key, just click Connect. Otherwise, paste your API key into the input field first.
The microphone starts automatically when you connect. Talk to order, click the button to mute/unmute, and click End to disconnect.
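As a rough illustration of what happens between the microphone and the WebSocket, the sketch below converts browser audio samples (floats in the range -1 to 1) to 16-bit PCM and skips frames while muted. It assumes the agent is configured for `linear16` input, which this README does not state; `floatTo16BitPCM` and `nextFrame` are hypothetical names, not functions from `audio.js`.

```javascript
// Sketch only: assumes linear16 input; not taken from public/js/audio.js.

// Convert Float32 samples in [-1, 1] to signed 16-bit PCM.
function floatTo16BitPCM(samples) {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range samples do not wrap around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}

// Return the bytes to send for one mic frame, or null while muted.
function nextFrame(samples, muted) {
  return muted ? null : floatTo16BitPCM(samples).buffer;
}
```

Dropping frames on the client while muted is one possible design; another is to keep sending silence so the connection stays active.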
Browser (mic) ──audio──▶ Express server ──proxy──▶ Deepgram Voice Agent
Browser (speaker) ◀──audio── Express server ◀──proxy── Deepgram Voice Agent
The Express server proxies WebSocket connections to Deepgram so the API key never reaches the browser. All audio and JSON messages pass through unchanged.
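The key-handling part of such a proxy can be sketched as below. This is not the code in `server.js`: `resolveUpstream`, the `key` query parameter name, and the endpoint URL are assumptions made for illustration, and the endpoint should be checked against Deepgram's current documentation.

```javascript
// Sketch only: names are hypothetical, not taken from server.js.
const AGENT_URL = "wss://agent.deepgram.com/v1/agent/converse"; // assumed endpoint

// Decide which key and headers to use for the upstream connection.
// A server-side key (from .env) takes priority over one sent by the browser.
function resolveUpstream(requestUrl, env = process.env) {
  // requestUrl is the path and query string of the browser's upgrade request.
  const { searchParams } = new URL(requestUrl, "http://localhost");
  const apiKey = env.DEEPGRAM_API_KEY || searchParams.get("key"); // "key" is an assumed name
  if (!apiKey) {
    throw new Error("No Deepgram API key in .env or in the request");
  }
  return {
    url: AGENT_URL,
    // The key is attached server-side, so the browser never needs to hold it.
    headers: { Authorization: `Token ${apiKey}` },
  };
}
```

With the `ws` package, the result could be passed as `new WebSocket(url, { headers })`, after which each message from one socket is forwarded to the other unchanged.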
Pipeline:
| Stage | Model | Role |
|---|---|---|
| Listen (STT) | Flux (flux-general-en v2) | Transcribes your speech |
| Think (LLM) | GPT-4o-mini | Decides what to say, calls functions |
| Speak (TTS) | Aura-2 (aura-2-thalia-en) | Speaks the response |
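
The three stages in the table map onto a single Settings message sent when the connection opens. The sketch below shows roughly what that message might look like. The field names follow my understanding of the Voice Agent API and should be verified against the official reference; the audio encodings, sample rates, and the placement of `version: "v2"` for Flux are assumptions, and `buildSettings` is a hypothetical helper rather than a function from `deepgram.js`.

```javascript
// Sketch only: field layout and audio settings are assumptions, not copied
// from public/js/deepgram.js.
function buildSettings(prompt, functions = []) {
  return {
    type: "Settings",
    audio: {
      input: { encoding: "linear16", sample_rate: 16000 },  // assumed
      output: { encoding: "linear16", sample_rate: 24000 }, // assumed
    },
    agent: {
      // Listen (STT): Flux
      listen: {
        provider: { type: "deepgram", version: "v2", model: "flux-general-en" },
      },
      // Think (LLM): GPT-4o-mini, with the system prompt and callable functions
      think: {
        provider: { type: "open_ai", model: "gpt-4o-mini" },
        prompt,
        functions,
      },
      // Speak (TTS): Aura-2
      speak: { provider: { type: "deepgram", model: "aura-2-thalia-en" } },
    },
  };
}

// Example: the message is serialized to JSON before being sent.
const settingsJson = JSON.stringify(buildSettings("You take pizza orders."));
```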
Three functions are registered with the agent:
| Function | Description | Status |
|---|---|---|
| `getMenuItems()` | Returns the full menu with prices | Implemented |
| `addToOrder(item, quantity)` | Adds items to the order, updates the UI | Implemented |
| `removeFromOrder(item)` | Removes items from the order | Skeleton (workshop exercise) |
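
To make the function-calling flow concrete, here is a minimal sketch of two handlers and a dispatcher. It is not the code in `functions.js`: the menu prices are invented, the item names only echo the sample phrases in this README, and the request and response shapes (`id`, `name`, `arguments` as a JSON string, `FunctionCallResponse`) are assumptions about the Voice Agent API. `removeFromOrder` is deliberately left out because it is the workshop exercise.

```javascript
// Sketch only: hypothetical menu data and message shapes.
const MENU = [
  { name: "Pepperoni Pizza", price: 14.0 }, // prices are invented
  { name: "Caesar Salad", price: 8.5 },
  { name: "Lemonade", price: 3.0 },
];
const order = new Map(); // item name -> quantity

function getMenuItems() {
  return MENU;
}

function addToOrder({ item, quantity = 1 }) {
  // Match case-insensitively so "pepperoni pizza" finds "Pepperoni Pizza".
  const entry = MENU.find(
    (m) => m.name.toLowerCase() === String(item).toLowerCase()
  );
  if (!entry) return { ok: false, error: `Unknown item: ${item}` };
  if (!Number.isInteger(quantity) || quantity < 1) {
    return { ok: false, error: "Quantity must be a positive integer" };
  }
  order.set(entry.name, (order.get(entry.name) || 0) + quantity);
  return { ok: true, order: Object.fromEntries(order) };
}

const handlers = { getMenuItems, addToOrder };

// Turn one function call from the agent into a reply message.
function handleFunctionCall({ id, name, arguments: rawArgs }) {
  const result = Object.hasOwn(handlers, name)
    ? handlers[name](JSON.parse(rawArgs || "{}"))
    : { ok: false, error: `Function not implemented: ${name}` };
  return {
    type: "FunctionCallResponse",
    id,
    name,
    content: JSON.stringify(result),
  };
}
```

Returning an explicit error for unknown functions gives the LLM something to relay to the user, for example when it tries to call `removeFromOrder` before the exercise is completed.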
├── server.js # Express + WebSocket proxy to Deepgram
├── package.json
├── .env # Optional: DEEPGRAM_API_KEY (gitignored)
├── public/
│   ├── index.html # Single-page UI
│   ├── css/styles.css # Pizza-themed styling
│   └── js/
│       ├── app.js # Entry point, wires modules together
│       ├── audio.js # Mic capture + agent audio playback
│       ├── deepgram.js # WebSocket connection, Settings, function dispatch
│       ├── functions.js # Menu data, order state, function handlers
│       └── ui.js # DOM rendering
├── EXTENSIONS.md # Workshop expansion ideas
└── RESEARCH.md # Deepgram Voice Agent API research notes
- "What's on the menu?"
- "I'll have two pepperoni pizzas and a Caesar salad"
- "Add a lemonade"
- "What's my total?"
- "Remove the salad" (requires completing the workshop exercise)
See EXTENSIONS.md for 7 expansion ideas ranging from beginner to advanced:
- Implement `removeFromOrder()` — Complete the skeleton function
- Voice picker — Change the agent's voice mid-conversation
- Order summary function — Add a new callable function
- Live transcript panel — Display the conversation as chat bubbles
- Delivery or pickup — Add address collection and fulfillment method
- Order status tracker — Time-based progress from kitchen to delivery
- Dynamic upselling — Use prompt engineering to suggest pairings
| Problem | Fix |
|---|---|
| No audio / static noise | Check that your mic is set to the correct input device in system settings |
| "WebSocket error" on connect | Verify your API key is valid at console.deepgram.com |
| Agent doesn't respond | Button should show green "Listening" — if muted (red), click to unmute |
| Port 3000 in use | Kill the other process, or start on a different port with PORT=3001 npm start |
| Browser blocks mic | Must be on localhost or HTTPS — file:// won't work |
This project is designed for local workshop use only. Do not deploy to a public network without adding:
- Authentication on the WebSocket proxy endpoint
- Rate limiting to prevent API credit abuse
- TLS (HTTPS/WSS) — API keys sent via the browser input travel as WebSocket query parameters in cleartext over ws://
The recommended approach is to set your API key in .env (server-side) rather than entering it in the browser. See .env.example for the expected format.
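
As one example of the hardening listed above, rate limiting on the proxy endpoint could start from a fixed-window counter per client. The sketch below is an illustration under assumptions, not part of this project: `createRateLimiter`, its defaults, and the use of an IP address as the client identifier are all hypothetical.

```javascript
// Sketch only: a fixed-window limiter for new proxy connections.
function createRateLimiter({ limit = 5, windowMs = 60_000 } = {}) {
  const hits = new Map(); // client id -> { count, windowStart }

  // Returns true if this client may open another connection right now.
  return function allow(clientId, now = Date.now()) {
    const entry = hits.get(clientId);
    if (!entry || now - entry.windowStart >= windowMs) {
      // First request, or the previous window has expired: start a new one.
      hits.set(clientId, { count: 1, windowStart: now });
      return true;
    }
    entry.count += 1;
    return entry.count <= limit;
  };
}

// Example: at most 2 connection attempts per second per client.
const allow = createRateLimiter({ limit: 2, windowMs: 1000 });
```

A server would call `allow(clientIp)` during the WebSocket upgrade and refuse the connection when it returns false. A real deployment would also need to evict stale entries and to combine this with authentication and TLS.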