deepgram-devs/pizza-voice-agent-workshop

Pizza Palace — Voice Ordering Workshop

A hands-on workshop for learning Deepgram's Voice Agent API. Order pizza by talking to an AI agent powered by real-time speech-to-text, an LLM, and text-to-speech — all over a single WebSocket.

Prerequisites

  • Node.js 18+ (download)
  • Deepgram API key (get a free key)
  • A modern browser (Chrome, Firefox, Edge, Safari)
  • A working microphone

Demo

Pizza.Palace.Demo.mov

Quick Start

# 1. Install dependencies
npm install

# 2. (Optional) Add your API key to .env so you don't have to enter it in the UI
echo "DEEPGRAM_API_KEY=your_key_here" > .env

# 3. Start the server
npm start

Open http://localhost:3000 in your browser.

If you added a .env key, just click Connect. Otherwise, paste your API key into the input field first.

The microphone starts automatically when you connect. Talk to order, click the button to mute/unmute, and click End to disconnect.

How It Works

Browser (mic) ──audio──▶ Express server ──proxy──▶ Deepgram Voice Agent
Browser (speaker) ◀──audio── Express server ◀──proxy── Deepgram Voice Agent

The Express server proxies WebSocket connections to Deepgram so the API key never reaches the browser. All audio and JSON messages pass through unchanged.
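The pass-through behaviour described above can be sketched without any dependencies. This is a minimal illustration, not the workshop's server.js: `relay` is a hypothetical helper, and the `FakeSocket` class stands in for the browser connection and the upstream Deepgram connection, which in a real server would be WebSocket objects.

```javascript
const { EventEmitter } = require('node:events');

// Stand-in for a WebSocket: emits 'message' and 'close', records what was sent.
// (Hypothetical test double; a real proxy would use actual WebSocket objects.)
class FakeSocket extends EventEmitter {
  constructor() {
    super();
    this.sent = [];
    this.closed = false;
  }
  send(data) {
    this.sent.push(data);
  }
  close() {
    if (this.closed) return; // guard against close loops between the two sides
    this.closed = true;
    this.emit('close');
  }
}

// Forward every message unchanged in both directions, and close the
// other side when either side closes.
function relay(client, upstream) {
  client.on('message', (data) => upstream.send(data));
  upstream.on('message', (data) => client.send(data));
  client.on('close', () => upstream.close());
  upstream.on('close', () => client.close());
}

const browser = new FakeSocket();
const deepgram = new FakeSocket();
relay(browser, deepgram);

browser.emit('message', Buffer.from([1, 2, 3])); // mic audio going up
deepgram.emit('message', '{"type":"Example"}');  // JSON event coming down
browser.close();                                 // closing one side closes the other

console.log(deepgram.sent.length, browser.sent[0], deepgram.closed);
```

Because the relay never inspects or rewrites a payload, binary audio frames and JSON text frames take the same path, which is what lets the API key stay on the server side of the connection.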

Pipeline:

Stage          Model                       Role
Listen (STT)   Flux (flux-general-en v2)   Transcribes your speech
Think (LLM)    GPT-4o-mini                 Decides what to say, calls functions
Speak (TTS)    Aura-2 (aura-2-thalia-en)   Speaks the response
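A pipeline like this is usually declared in a Settings message sent once the WebSocket opens. The sketch below shows one plausible shape for it. The field names (`agent.listen.provider`, `agent.think.provider`, `agent.speak.provider`, `version`) and the `open_ai` provider type are recalled from Deepgram's public Voice Agent documentation, not copied from this repository's deepgram.js, so treat them as assumptions and check the API reference. `buildSettings` and the prompt text are hypothetical.

```javascript
// Hypothetical helper: builds a Settings message for the three-stage pipeline.
// Field names are assumptions; verify them against the Voice Agent API reference.
function buildSettings(functions = []) {
  return {
    type: 'Settings',
    agent: {
      // Listen: speech-to-text
      listen: { provider: { type: 'deepgram', model: 'flux-general-en', version: 'v2' } },
      // Think: the LLM, its prompt, and the functions it is allowed to call
      think: {
        provider: { type: 'open_ai', model: 'gpt-4o-mini' },
        prompt: 'You take pizza orders for Pizza Palace.', // placeholder prompt
        functions,
      },
      // Speak: text-to-speech voice
      speak: { provider: { type: 'deepgram', model: 'aura-2-thalia-en' } },
    },
  };
}

const settings = buildSettings();
console.log(JSON.stringify(settings, null, 2));
```

Keeping the three providers in one object makes it easy to swap a single stage, for example a different voice, without touching the other two.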

Three functions are registered with the agent:

Function                     Description                               Status
getMenuItems()               Returns the full menu with prices         Implemented
addToOrder(item, quantity)   Adds items to the order, updates the UI   Implemented
removeFromOrder(item)        Removes items from the order              Skeleton (workshop exercise)
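The functions in the table above can be pictured as a name-to-handler map plus a small dispatcher. The sketch below is illustrative and is not the contents of functions.js. The menu prices are invented, `dispatch` is a hypothetical helper, and it assumes the agent supplies arguments as a JSON-encoded string. `removeFromOrder` is deliberately left out, since completing it is the workshop exercise.

```javascript
// Hypothetical menu: item names come from the sample phrases in this README,
// the prices are made up for illustration.
const MENU = [
  { name: 'Pepperoni Pizza', price: 14.0 },
  { name: 'Caesar Salad', price: 8.0 },
  { name: 'Lemonade', price: 3.0 },
];

const order = new Map(); // item name -> quantity

const handlers = {
  getMenuItems() {
    return { items: MENU };
  },
  addToOrder({ item, quantity = 1 }) {
    // Case-insensitive match so "pepperoni pizza" finds "Pepperoni Pizza".
    const match = MENU.find((m) => m.name.toLowerCase() === String(item).toLowerCase());
    if (!match) return { error: `Unknown item: ${item}` };
    order.set(match.name, (order.get(match.name) || 0) + quantity);
    return { added: match.name, quantity: order.get(match.name) };
  },
};

// Look up the handler by name, parse the JSON-encoded arguments, and
// return a JSON string the agent can read back.
function dispatch(name, argsJson) {
  if (!Object.hasOwn(handlers, name)) {
    return JSON.stringify({ error: `Unknown function: ${name}` });
  }
  const args = argsJson ? JSON.parse(argsJson) : {};
  return JSON.stringify(handlers[name](args));
}

console.log(dispatch('addToOrder', '{"item":"pepperoni pizza","quantity":2}'));
console.log(dispatch('removeFromOrder', '{"item":"Caesar Salad"}'));
```

Returning an error object for unknown names or items, rather than throwing, gives the LLM something it can explain to the caller.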

Project Structure

├── server.js              # Express + WebSocket proxy to Deepgram
├── package.json
├── .env                   # Optional: DEEPGRAM_API_KEY (gitignored)
├── public/
│   ├── index.html         # Single-page UI
│   ├── css/styles.css     # Pizza-themed styling
│   └── js/
│       ├── app.js         # Entry point, wires modules together
│       ├── audio.js       # Mic capture + agent audio playback
│       ├── deepgram.js    # WebSocket connection, Settings, function dispatch
│       ├── functions.js   # Menu data, order state, function handlers
│       └── ui.js          # DOM rendering
├── EXTENSIONS.md          # Workshop expansion ideas
└── RESEARCH.md            # Deepgram Voice Agent API research notes

Try Saying

  • "What's on the menu?"
  • "I'll have two pepperoni pizzas and a Caesar salad"
  • "Add a lemonade"
  • "What's my total?"
  • "Remove the salad" (requires completing the workshop exercise)

Workshop Exercises

See EXTENSIONS.md for 7 expansion ideas ranging from beginner to advanced:

  1. Implement removeFromOrder() — Complete the skeleton function
  2. Voice picker — Change the agent's voice mid-conversation
  3. Order summary function — Add a new callable function
  4. Live transcript panel — Display the conversation as chat bubbles
  5. Delivery or pickup — Add address collection and fulfillment method
  6. Order status tracker — Time-based progress from kitchen to delivery
  7. Dynamic upselling — Use prompt engineering to suggest pairings

Troubleshooting

Problem                        Fix
No audio / static noise        Check that your mic is set to the correct input device in system settings
"WebSocket error" on connect   Verify that your API key is valid at console.deepgram.com
Agent doesn't respond          The button should show green "Listening"; if it shows muted (red), click to unmute
Port 3000 in use               Kill the other process, or start on another port with PORT=3001 npm start
Browser blocks mic             The page must be served from localhost or HTTPS; file:// won't work

Security

This project is designed for local workshop use only. Do not deploy to a public network without adding:

  • Authentication on the WebSocket proxy endpoint
  • Rate limiting to prevent API credit abuse
  • TLS (HTTPS/WSS) — API keys sent via the browser input travel as WebSocket query parameters in cleartext over ws://

The recommended approach is to set your API key in .env (server-side) rather than entering it in the browser. See .env.example for the expected format.
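The preference for a server-side key can be expressed as a small lookup on the proxy. This is a sketch under assumptions rather than the logic in server.js: `resolveApiKey` is a hypothetical helper, and both the `/ws` path and the `key` query parameter name are placeholders for whatever the real endpoint uses.

```javascript
// Hypothetical helper: choose the API key for an incoming proxy connection.
// The `key` query parameter name is an assumption, not taken from server.js.
function resolveApiKey(requestUrl, env = process.env) {
  // A key in .env stays on the server and always takes precedence.
  if (env.DEEPGRAM_API_KEY) return env.DEEPGRAM_API_KEY;
  // An incoming request URL is path-only, so a base is needed to parse it.
  const { searchParams } = new URL(requestUrl, 'http://localhost');
  return searchParams.get('key') || null;
}

console.log(resolveApiKey('/ws?key=from_browser', { DEEPGRAM_API_KEY: 'from_env' })); // from_env
console.log(resolveApiKey('/ws?key=from_browser', {}));                               // from_browser
console.log(resolveApiKey('/ws', {}));                                                // null
```

A `null` result is the signal to reject the connection instead of opening an unauthenticated upstream socket.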
