A modular toolkit for adding a voice-powered AI copilot to any web application. Provides a complete voice pipeline — speech-to-text, voice activity detection, LLM reasoning with tool use, and text-to-speech — packaged as drop-in React components and Express handlers.
Built for government service portals (eRegistrations), but adaptable to any domain where users need guided, conversational assistance.
| Package | Description |
|---|---|
| `@unctad-ai/voice-agent-core` | Hooks, types, and configuration for the voice pipeline (VAD, audio, state management) |
| `@unctad-ai/voice-agent-ui` | Glass-morphism UI components — floating panel, orb, waveform, onboarding, settings |
| `@unctad-ai/voice-agent-registries` | Dynamic registries for form fields, UI actions, and client-side tool handlers |
| `@unctad-ai/voice-agent-server` | Express route handlers for chat, STT, and TTS with pluggable providers and automatic fallback chains |

All packages are published to npm under the `@unctad-ai` scope and versioned together.
```bash
# Client
npm install @unctad-ai/voice-agent-core @unctad-ai/voice-agent-ui @unctad-ai/voice-agent-registries

# Server
npm install @unctad-ai/voice-agent-core @unctad-ai/voice-agent-server

# Peer dependencies (client)
npm install react react-dom motion lucide-react simplex-noise
```

```ts
// voice-config.ts
import type { SiteConfig } from '@unctad-ai/voice-agent-core';

export const siteConfig: SiteConfig = {
  copilotName: 'Pesa',
  siteTitle: "Kenya's Business Gateway",
  farewellMessage: 'Feel free to come back anytime.',
  systemPromptIntro: 'You help investors navigate government services.',
  avatarUrl: '/avatar.png',
  colors: {
    primary: '#DB2129',
    processing: '#F59E0B',
    speaking: '#14B8A6',
    glow: '#f35f3f',
  },
  services: [/* your service catalog */],
  categories: [/* grouped categories */],
  synonyms: { tax: ['pin', 'vat', 'kra'] },
  categoryMap: { investor: 'Investor services' },
  routeMap: { home: '/', dashboard: '/dashboard' },
  getServiceFormRoute: (id) => `/dashboard/${id}`,
};
```

```tsx
// App.tsx
import { lazy, Suspense, useState } from 'react';
import { VoiceAgentProvider, VoiceOnboarding, VoiceA11yAnnouncer } from '@unctad-ai/voice-agent-ui';
import type { OrbState } from '@unctad-ai/voice-agent-core';
import { siteConfig } from './voice-config';

const GlassCopilotPanel = lazy(() =>
  import('@unctad-ai/voice-agent-ui').then(m => ({ default: m.GlassCopilotPanel }))
);

export default function App() {
  const [isOpen, setIsOpen] = useState(false);
  const [orbState, setOrbState] = useState<OrbState>('idle');

  return (
    <VoiceAgentProvider config={siteConfig}>
      {/* Your app routes here */}
      {!isOpen && <VoiceOnboarding onTryNow={() => setIsOpen(true)} />}
      <Suspense fallback={null}>
        <GlassCopilotPanel
          isOpen={isOpen}
          onOpen={() => setIsOpen(true)}
          onClose={() => setIsOpen(false)}
          onStateChange={setOrbState}
        />
      </Suspense>
      <VoiceA11yAnnouncer isOpen={isOpen} orbState={orbState} />
    </VoiceAgentProvider>
  );
}
```

```ts
// server/index.ts
import express from 'express';
import { createVoiceRoutes } from '@unctad-ai/voice-agent-server';
import { siteConfig } from '../voice-config';

const app = express();
app.use(express.json());

const voice = createVoiceRoutes(siteConfig);
app.post('/api/chat', voice.chat);
app.post('/api/stt', voice.stt);
app.post('/api/tts', voice.tts);

app.listen(3001);
```

```mermaid
flowchart LR
  subgraph BROWSER [" Browser "]
    direction TB
    A["🎤 Mic → VAD → STT"]
    B["GlassCopilotPanel"]
    C["TTS → 🔊 Speaker"]
    A -- transcript --> B
    B -- AI response --> C
    B -.- D["Registries\n(forms · navigation)"]
  end
  subgraph SERVER [" Server · Express "]
    direction TB
    S1["/api/stt"]
    S2["/api/chat"]
    S3["/api/tts"]
  end
  subgraph PROVIDERS [" Providers "]
    direction TB
    P1["Kyutai STT\n(self-hosted)"]
    P2["Groq API\n(cloud)"]
    P3["TTS Engine\n(self-hosted or cloud)"]
  end
  A -- "audio" --> S1
  S1 -- "text" --> A
  B -- "message" --> S2
  S2 -- "stream" --> B
  C -- "request" --> S3
  S3 -- "audio" --> C
  S1 --> P1
  S1 -.->|fallback| P2
  S2 --> P2
  S3 --> P3
```
- VAD — TenVAD runs in-browser via WebAssembly and detects when the user starts and stops speaking
- STT — audio is sent to the server for transcription (see providers below)
- LLM — the transcript is sent to the Groq API with tool calling for search, navigation, and form filling
- TTS — the LLM response is streamed back as audio with barge-in support (see providers below)
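The search step above can take advantage of the `synonyms` map from `SiteConfig` to broaden queries. A minimal sketch of how such an expansion might work (`expandQuery` is illustrative, not the package's actual API):

```typescript
// Sketch: expanding a search term via a synonyms map like the one in
// voice-config.ts. expandQuery is hypothetical — the real search logic
// lives inside the packages.
const synonyms: Record<string, string[]> = { tax: ['pin', 'vat', 'kra'] };

function expandQuery(term: string): string[] {
  const lower = term.toLowerCase();
  // Canonical term: return it plus its synonyms.
  if (synonyms[lower]) return [lower, ...synonyms[lower]];
  // Synonym: map back to the canonical term and its siblings.
  for (const [canonical, alts] of Object.entries(synonyms)) {
    if (alts.includes(lower)) return [canonical, ...alts];
  }
  // Unknown term: pass through unchanged.
  return [lower];
}
```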
Each stage of the pipeline uses a pluggable provider, selected via environment variables.
| Provider | Type | Set via | Notes |
|---|---|---|---|
| Kyutai | Self-hosted | `STT_PROVIDER=kyutai` | Default. Runs as a sidecar container. Falls back to Groq Whisper on failure |
| Groq Whisper | Cloud API | `STT_PROVIDER=groq` | Uses `whisper-large-v3-turbo` via the Groq API. Also serves as the automatic fallback for Kyutai |
| Provider | Type | Set via | Notes |
|---|---|---|---|
| Groq | Cloud API | `GROQ_API_KEY` | Default model: `openai/gpt-oss-120b`. Override with `GROQ_MODEL` |
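Combining the STT and LLM settings above, a server environment file might look like this (a sketch: only the variable names come from the tables, and the key value is a placeholder):

```shell
# .env — sketch; variable names from the provider tables above
STT_PROVIDER=kyutai             # self-hosted Kyutai, falls back to Groq Whisper
GROQ_API_KEY=your_groq_api_key  # required for the LLM (and the Whisper fallback)
GROQ_MODEL=openai/gpt-oss-120b  # optional; the shown value is the documented default
```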
Set with `TTS_PROVIDER`. Each self-hosted GPU provider falls back to Pocket TTS and then to Resemble (cloud); Pocket TTS itself falls back to Resemble.
| Provider | Type | Set via | Notes |
|---|---|---|---|
| Qwen3-TTS | Self-hosted (GPU) | `TTS_PROVIDER=qwen3-tts` | Token-level streaming, ~200ms TTFA. Requires GPU server |
| Chatterbox Turbo | Self-hosted (GPU) | `TTS_PROVIDER=chatterbox-turbo` | Sentence-level pipelining. Requires GPU server |
| CosyVoice | Self-hosted (GPU) | `TTS_PROVIDER=cosyvoice` | Alibaba's voice synthesis. Requires GPU server |
| Pocket TTS | Self-hosted (CPU) | `TTS_PROVIDER=pocket-tts` | Runs on CPU at ~0.5x realtime. Deployed as Docker sidecar. Middle fallback for GPU providers |
| Resemble | Cloud API | `TTS_PROVIDER=resemble` | Resemble AI streaming API. Last-resort fallback for all other providers |
```
qwen3-tts        → pocket-tts → resemble
chatterbox-turbo → pocket-tts → resemble
cosyvoice        → pocket-tts → resemble
pocket-tts       → resemble
resemble         (no fallback)
```
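The chains above amount to a simple resolver: try each provider in order and return the first success. A minimal sketch of that behavior (`tryProvider` is a stand-in for a real synthesis call, not the package's API):

```typescript
// Sketch: resolving a TTS fallback chain like the ones listed above.
// The chain table mirrors the documented fallbacks; tryProvider is a
// hypothetical synthesis call that may throw on failure.
const FALLBACKS: Record<string, string[]> = {
  'qwen3-tts': ['pocket-tts', 'resemble'],
  'chatterbox-turbo': ['pocket-tts', 'resemble'],
  'cosyvoice': ['pocket-tts', 'resemble'],
  'pocket-tts': ['resemble'],
  'resemble': [],
};

function synthesizeWithFallback<T>(
  provider: string,
  tryProvider: (p: string) => T,
): T {
  const chain = [provider, ...(FALLBACKS[provider] ?? [])];
  let lastError: unknown;
  for (const p of chain) {
    try {
      return tryProvider(p); // first provider that succeeds wins
    } catch (err) {
      lastError = err; // fall through to the next provider in the chain
    }
  }
  throw lastError; // every provider in the chain failed
}
```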
Registries let consuming apps dynamically expose their UI to the voice agent.

```tsx
import { useState } from 'react';
import { useRegisterFormField } from '@unctad-ai/voice-agent-registries';

function CompanyNameInput() {
  const [value, setValue] = useState('');

  useRegisterFormField({
    id: 'companyName',
    label: 'Company Name',
    type: 'text',
    required: true,
    setter: setValue,
    group: 'company-details',
  });

  return <input value={value} onChange={e => setValue(e.target.value)} />;
}
```

The voice agent can now fill this field via the `fillFormFields` tool.
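Conceptually, a registry of this kind maps field ids to the setters that components register. A minimal sketch of the idea (the registry shape and `fillFormFields` signature here are illustrative, not the package's actual API):

```typescript
// Sketch: a form-field registry mapping ids to setters, plus a
// fillFormFields-style handler that drives them. Shapes are illustrative.
type FieldEntry = { id: string; label: string; setter: (value: string) => void };

const fieldRegistry = new Map<string, FieldEntry>();

function registerFormField(entry: FieldEntry): void {
  fieldRegistry.set(entry.id, entry);
}

// Fill any registered fields and report which ids were actually filled.
function fillFormFields(values: Record<string, string>): string[] {
  const filled: string[] = [];
  for (const [id, value] of Object.entries(values)) {
    const entry = fieldRegistry.get(id);
    if (entry) {
      entry.setter(value); // updates the controlled input's state
      filled.push(id);
    }
  }
  return filled;
}
```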
```tsx
import { useNavigate } from 'react-router-dom'; // or your router's equivalent
import { useRegisterUIAction } from '@unctad-ai/voice-agent-registries';

function Dashboard() {
  const navigate = useNavigate();

  useRegisterUIAction({
    id: 'openSettings',
    label: 'Open Settings',
    category: 'navigation',
    handler: () => navigate('/settings'),
  });

  return <div>...</div>;
}
```

| Component | Purpose |
|---|---|
| `VoiceAgentProvider` | Context provider — wraps your app with `SiteConfig` |
| `GlassCopilotPanel` | Main floating panel (392px, glass morphism, collapsed/expanded) |
| `VoiceOnboarding` | First-time user prompt to try the voice agent |
| `VoiceA11yAnnouncer` | Screen reader live region for state changes |
| `AgentAvatar` | Copilot portrait with state-based visual effects |
| `VoiceOrb` | Animated speaking/processing indicator |
| `VoiceWaveformCanvas` | Real-time audio waveform visualization |
| `VoiceControls` | Mic, stop, volume controls |
| `VoiceSettingsView` | User preferences (volume, speed, auto-listen, timeouts) |
| `VoiceToolCard` | Displays tool execution results inline |
| `VoiceTranscript` | Conversation transcript display |
| `VoiceErrorBoundary` | Error boundary with recovery UI |
```ts
interface SiteConfig {
  // Identity
  copilotName: string;        // Display name ("Pesa", "Tashi")
  siteTitle: string;          // Site name shown in UI
  farewellMessage: string;    // Said when closing session
  systemPromptIntro: string;  // LLM system prompt prefix

  // Branding
  colors: {
    primary: string;     // Main accent color
    processing: string;  // Shown during STT/thinking
    speaking: string;    // Shown during TTS playback
    glow: string;        // Orb glow effect
    error?: string;      // Error state
  };

  // Domain data
  services: ServiceBase[];              // Searchable service catalog
  categories: CategoryBase[];           // Grouped for browsing
  synonyms: Record<string, string[]>;   // Fuzzy search mappings
  categoryMap: Record<string, string>;  // Category aliases
  routeMap: Record<string, string>;     // Named routes
  getServiceFormRoute: (serviceId: string) => string | null;

  // Optional
  avatarUrl?: string;
  extraServerTools?: Record<string, unknown>;
  thresholdOverrides?: Partial<VoiceThresholds>;
}
```

Fine-tune VAD sensitivity via `thresholdOverrides`:
```ts
{
  positiveSpeechThreshold: 0.8,  // Confidence to start recording
  negativeSpeechThreshold: 0.4,  // Confidence to stop
  minSpeechFrames: 5,            // Min frames before accepting
  redemptionFrames: 15,          // ~600ms grace period
  minAudioRms: 0.005,            // Minimum volume level
}
```

- Node.js 22+
- pnpm 10+
```bash
git clone https://github.com/unctad-ai/voice-agent-kit.git
cd voice-agent-kit
pnpm install
```

```bash
pnpm dev        # Watch mode (all packages)
pnpm build      # Build all packages
pnpm typecheck  # Type-check all packages
```

```
voice-agent-kit/
├── packages/
│   ├── core/          # Hooks, types, config, audio utilities
│   ├── ui/            # React components (tsup build)
│   ├── registries/    # Form/UI action registries
│   └── server/        # Express handlers (chat, STT, TTS)
├── scripts/
│   └── validate-release.sh
├── .changeset/        # Version management
├── .github/workflows/
│   ├── ci.yml         # Typecheck + validate on push/PR
│   └── publish.yml    # Publish to npm on v* tags
└── .husky/
    └── pre-commit     # Typecheck gate
```
```bash
pnpm changeset   # Describe what changed (interactive)
git add . && git commit -m "chore: add changeset"

pnpm release     # Bumps versions + validates (clean build, dist check, dry-run publish)
git add . && git commit -m "chore: release vX.Y.Z"
git tag vX.Y.Z
git push --follow-tags && git push origin vX.Y.Z
# CI publishes to npm automatically
```

All four packages use fixed versioning — they always share the same version number.
ISC