
Voice Agent Kit

A modular toolkit for adding a voice-powered AI copilot to any web application. Provides a complete voice pipeline — speech-to-text, voice activity detection, LLM reasoning with tool use, and text-to-speech — packaged as drop-in React components and Express handlers.

Built for government service portals (eRegistrations), but adaptable to any domain where users need guided, conversational assistance.

Packages

| Package | Description |
| --- | --- |
| @unctad-ai/voice-agent-core | Hooks, types, and configuration for the voice pipeline (VAD, audio, state management) |
| @unctad-ai/voice-agent-ui | Glass-morphism UI components — floating panel, orb, waveform, onboarding, settings |
| @unctad-ai/voice-agent-registries | Dynamic registries for form fields, UI actions, and client-side tool handlers |
| @unctad-ai/voice-agent-server | Express route handlers for chat, STT, and TTS with pluggable providers and automatic fallback chains |

All packages are published to npm under the @unctad-ai scope and versioned together.

Quick Start

Install

# Client
npm install @unctad-ai/voice-agent-core @unctad-ai/voice-agent-ui @unctad-ai/voice-agent-registries

# Server
npm install @unctad-ai/voice-agent-core @unctad-ai/voice-agent-server

# Peer dependencies (client)
npm install react react-dom motion lucide-react simplex-noise

Wire Up the Client

// voice-config.ts
import type { SiteConfig } from '@unctad-ai/voice-agent-core';

export const siteConfig: SiteConfig = {
  copilotName: 'Pesa',
  siteTitle: "Kenya's Business Gateway",
  farewellMessage: 'Feel free to come back anytime.',
  systemPromptIntro: 'You help investors navigate government services.',

  avatarUrl: '/avatar.png',
  colors: {
    primary: '#DB2129',
    processing: '#F59E0B',
    speaking: '#14B8A6',
    glow: '#f35f3f',
  },

  services: [/* your service catalog */],
  categories: [/* grouped categories */],
  synonyms: { tax: ['pin', 'vat', 'kra'] },
  categoryMap: { investor: 'Investor services' },
  routeMap: { home: '/', dashboard: '/dashboard' },
  getServiceFormRoute: (id) => `/dashboard/${id}`,
};
// App.tsx
import { lazy, Suspense, useState, useCallback } from 'react';
import { VoiceAgentProvider, VoiceOnboarding, VoiceA11yAnnouncer } from '@unctad-ai/voice-agent-ui';
import type { OrbState } from '@unctad-ai/voice-agent-core';
import { siteConfig } from './voice-config';

const GlassCopilotPanel = lazy(() =>
  import('@unctad-ai/voice-agent-ui').then(m => ({ default: m.GlassCopilotPanel }))
);

export default function App() {
  const [isOpen, setIsOpen] = useState(false);
  const [orbState, setOrbState] = useState<OrbState>('idle');

  return (
    <VoiceAgentProvider config={siteConfig}>
      {/* Your app routes here */}

      {!isOpen && <VoiceOnboarding onTryNow={() => setIsOpen(true)} />}
      <Suspense fallback={null}>
        <GlassCopilotPanel
          isOpen={isOpen}
          onOpen={() => setIsOpen(true)}
          onClose={() => setIsOpen(false)}
          onStateChange={setOrbState}
        />
      </Suspense>
      <VoiceA11yAnnouncer isOpen={isOpen} orbState={orbState} />
    </VoiceAgentProvider>
  );
}

Wire Up the Server

// server/index.ts
import express from 'express';
import { createVoiceRoutes } from '@unctad-ai/voice-agent-server';
import { siteConfig } from '../voice-config';

const app = express();
app.use(express.json());

const voice = createVoiceRoutes(siteConfig);
app.post('/api/chat', voice.chat);
app.post('/api/stt', voice.stt);
app.post('/api/tts', voice.tts);

app.listen(3001);

Architecture

flowchart LR
    subgraph BROWSER [" Browser "]
        direction TB
        A["🎤 Mic → VAD → STT"]
        B["GlassCopilotPanel"]
        C["TTS → 🔊 Speaker"]
        A -- transcript --> B
        B -- AI response --> C
        B -.- D["Registries\n(forms · navigation)"]
    end

    subgraph SERVER [" Server · Express "]
        direction TB
        S1["/api/stt"]
        S2["/api/chat"]
        S3["/api/tts"]
    end

    subgraph PROVIDERS [" Providers "]
        direction TB
        P1["Kyutai STT\n(self-hosted)"]
        P2["Groq API\n(cloud)"]
        P3["TTS Engine\n(self-hosted or cloud)"]
    end

    A -- "audio" --> S1
    S1 -- "text" --> A
    B -- "message" --> S2
    S2 -- "stream" --> B
    C -- "request" --> S3
    S3 -- "audio" --> C

    S1 --> P1
    S1 -.->|fallback| P2
    S2 --> P2
    S3 --> P3

Voice Pipeline

  1. VAD — TenVAD runs in-browser via WebAssembly and detects when the user starts and stops speaking
  2. STT — captured audio is sent to the server for transcription (see providers below)
  3. LLM — the transcript is sent to the Groq API with tool calling for search, navigation, and form filling
  4. TTS — the LLM response is streamed back as audio, with barge-in support (see providers below)
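The four stages compose into a single request/response turn. The sketch below is illustrative only — the `TurnIO` shape and state names are assumptions for the example, not the kit's actual API:

```typescript
// Illustrative model of one conversational turn through the pipeline.
// OrbState values mirror the states the UI components react to.
type OrbState = 'idle' | 'listening' | 'processing' | 'speaking';

interface TurnIO {
  transcribe: (audio: ArrayBuffer) => Promise<string>; // POST /api/stt
  chat: (text: string) => Promise<string>;             // POST /api/chat
  speak: (text: string) => Promise<void>;              // POST /api/tts + playback
}

// Runs one turn after VAD has captured an utterance, reporting each
// state transition via onState (e.g. a setOrbState callback).
async function runTurn(
  audio: ArrayBuffer,
  io: TurnIO,
  onState: (s: OrbState) => void
): Promise<string> {
  onState('processing');
  const transcript = await io.transcribe(audio); // step 2: STT
  const reply = await io.chat(transcript);       // step 3: LLM
  onState('speaking');
  await io.speak(reply);                         // step 4: TTS playback
  onState('idle');
  return reply;
}
```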

Providers

Each stage of the pipeline uses a configurable provider, selected via environment variables.

STT (Speech-to-Text)

| Provider | Type | Set via | Notes |
| --- | --- | --- | --- |
| Kyutai | Self-hosted | STT_PROVIDER=kyutai | Default. Runs as a sidecar container. Falls back to Groq Whisper on failure |
| Groq Whisper | Cloud API | STT_PROVIDER=groq | Uses whisper-large-v3-turbo via the Groq API. Also serves as the automatic fallback for Kyutai |

LLM (Chat)

| Provider | Type | Set via | Notes |
| --- | --- | --- | --- |
| Groq | Cloud API | GROQ_API_KEY | Default model: openai/gpt-oss-120b. Override with GROQ_MODEL |

TTS (Text-to-Speech)

Set with TTS_PROVIDER. Each GPU provider falls back to Pocket TTS and then to Resemble (cloud); Pocket TTS falls back directly to Resemble.

| Provider | Type | Set via | Notes |
| --- | --- | --- | --- |
| Qwen3-TTS | Self-hosted (GPU) | TTS_PROVIDER=qwen3-tts | Token-level streaming, ~200ms TTFA. Requires a GPU server |
| Chatterbox Turbo | Self-hosted (GPU) | TTS_PROVIDER=chatterbox-turbo | Sentence-level pipelining. Requires a GPU server |
| CosyVoice | Self-hosted (GPU) | TTS_PROVIDER=cosyvoice | Alibaba's voice synthesis. Requires a GPU server |
| Pocket TTS | Self-hosted (CPU) | TTS_PROVIDER=pocket-tts | Runs on CPU at ~0.5x realtime. Deployed as a Docker sidecar. Middle fallback for the GPU providers |
| Resemble | Cloud API | TTS_PROVIDER=resemble | Resemble AI streaming API. Last-resort fallback for all other providers |

Fallback Chains

qwen3-tts → pocket-tts → resemble
chatterbox-turbo → pocket-tts → resemble
cosyvoice → pocket-tts → resemble
pocket-tts → resemble
resemble (no fallback)
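These chains amount to ordered retry lists: try each provider in turn and return the first success. A minimal sketch of that behaviour — the provider names come from the table above, but the function shapes are assumptions for illustration:

```typescript
// A synthesis provider turns text into audio bytes.
type Synthesize = (text: string) => Promise<ArrayBuffer>;

// The fallback chains listed above, keyed by the configured primary provider.
const chains: Record<string, string[]> = {
  'qwen3-tts': ['qwen3-tts', 'pocket-tts', 'resemble'],
  'chatterbox-turbo': ['chatterbox-turbo', 'pocket-tts', 'resemble'],
  cosyvoice: ['cosyvoice', 'pocket-tts', 'resemble'],
  'pocket-tts': ['pocket-tts', 'resemble'],
  resemble: ['resemble'],
};

// Try each provider in order; the first success wins, and only if every
// provider in the chain fails does the last error surface to the caller.
async function synthesizeWithFallback(
  providers: Record<string, Synthesize>,
  primary: string,
  text: string
): Promise<ArrayBuffer> {
  let lastError: unknown;
  for (const name of chains[primary] ?? [primary]) {
    try {
      return await providers[name](text);
    } catch (err) {
      lastError = err; // fall through to the next provider in the chain
    }
  }
  throw lastError;
}
```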

Registries

Registries let consuming apps dynamically expose their UI to the voice agent.

Form Fields

import { useRegisterFormField } from '@unctad-ai/voice-agent-registries';

function CompanyNameInput() {
  const [value, setValue] = useState('');

  useRegisterFormField({
    id: 'companyName',
    label: 'Company Name',
    type: 'text',
    required: true,
    setter: setValue,
    group: 'company-details',
  });

  return <input value={value} onChange={e => setValue(e.target.value)} />;
}

The voice agent can now fill this field via the fillFormFields tool.
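Conceptually, a fillFormFields tool call has to map LLM-proposed field ids onto the registered setters. A simplified sketch of that dispatch — the registry's real internals may differ, and the names here are illustrative:

```typescript
// Minimal model of a form-field registry: each registered field exposes
// a setter the voice agent can drive.
interface RegisteredField {
  id: string;
  label: string;
  setter: (value: string) => void;
}

const fieldRegistry = new Map<string, RegisteredField>();

function registerField(field: RegisteredField): void {
  fieldRegistry.set(field.id, field);
}

// Applies the values proposed by the LLM; unknown ids are returned so the
// agent can tell the user which fields it could not fill.
function fillFormFields(values: Record<string, string>): string[] {
  const unknown: string[] = [];
  for (const [id, value] of Object.entries(values)) {
    const field = fieldRegistry.get(id);
    if (field) field.setter(value);
    else unknown.push(id);
  }
  return unknown;
}
```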

UI Actions

import { useRegisterUIAction } from '@unctad-ai/voice-agent-registries';

function Dashboard() {
  useRegisterUIAction({
    id: 'openSettings',
    label: 'Open Settings',
    category: 'navigation',
    handler: () => navigate('/settings'),
  });

  return <div>...</div>;
}

UI Components

| Component | Purpose |
| --- | --- |
| VoiceAgentProvider | Context provider — wraps your app with SiteConfig |
| GlassCopilotPanel | Main floating panel (392px, glass morphism, collapsed/expanded) |
| VoiceOnboarding | First-time user prompt to try the voice agent |
| VoiceA11yAnnouncer | Screen-reader live region for state changes |
| AgentAvatar | Copilot portrait with state-based visual effects |
| VoiceOrb | Animated speaking/processing indicator |
| VoiceWaveformCanvas | Real-time audio waveform visualization |
| VoiceControls | Mic, stop, and volume controls |
| VoiceSettingsView | User preferences (volume, speed, auto-listen, timeouts) |
| VoiceToolCard | Displays tool execution results inline |
| VoiceTranscript | Conversation transcript display |
| VoiceErrorBoundary | Error boundary with recovery UI |

Configuration

SiteConfig

interface SiteConfig {
  // Identity
  copilotName: string;          // Display name ("Pesa", "Tashi")
  siteTitle: string;            // Site name shown in UI
  farewellMessage: string;      // Said when closing session
  systemPromptIntro: string;    // LLM system prompt prefix

  // Branding
  colors: {
    primary: string;            // Main accent color
    processing: string;         // Shown during STT/thinking
    speaking: string;           // Shown during TTS playback
    glow: string;               // Orb glow effect
    error?: string;             // Error state
  };

  // Domain data
  services: ServiceBase[];      // Searchable service catalog
  categories: CategoryBase[];   // Grouped for browsing
  synonyms: Record<string, string[]>;  // Fuzzy search mappings
  categoryMap: Record<string, string>; // Category aliases
  routeMap: Record<string, string>;    // Named routes
  getServiceFormRoute: (serviceId: string) => string | null;

  // Optional
  avatarUrl?: string;
  extraServerTools?: Record<string, unknown>;
  thresholdOverrides?: Partial<VoiceThresholds>;
}
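The synonyms map drives fuzzy search by widening user queries before matching against the service catalog. A minimal sketch of that expansion — the kit's actual matching logic is not documented here, so treat the semantics as an assumption:

```typescript
// Expands a raw query with canonical terms whenever the user used one of
// their aliases, e.g. "kra" or "pin" both pull in "tax".
function expandQuery(
  query: string,
  synonyms: Record<string, string[]>
): string[] {
  const terms = new Set(query.toLowerCase().split(/\s+/));
  for (const [canonical, aliases] of Object.entries(synonyms)) {
    // If the user said an alias, also search for the canonical term.
    if (aliases.some(a => terms.has(a))) terms.add(canonical);
  }
  return [...terms];
}
```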

Voice Thresholds

Fine-tune VAD sensitivity via thresholdOverrides:

{
  positiveSpeechThreshold: 0.8,   // Confidence to start recording
  negativeSpeechThreshold: 0.4,   // Confidence to stop
  minSpeechFrames: 5,             // Min frames before accepting
  redemptionFrames: 15,           // ~600ms grace period
  minAudioRms: 0.005,             // Minimum volume level
}
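For example, to make the agent less trigger-happy in a noisy environment, you might override just two keys in your voice-config.ts. This assumes the override object is shallow-merged over the defaults above (suggested by the Partial&lt;VoiceThresholds&gt; type); the values are illustrative:

```typescript
// voice-config.ts — excerpt; identity, branding, and domain data as in Quick Start
import type { SiteConfig } from '@unctad-ai/voice-agent-core';

export const siteConfig: SiteConfig = {
  // ...rest of your configuration...
  thresholdOverrides: {
    positiveSpeechThreshold: 0.85, // demand higher confidence before recording
    redemptionFrames: 20,          // tolerate longer mid-sentence pauses (~800ms)
  },
} as SiteConfig;
```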

Development

Prerequisites

  • Node.js 22+
  • pnpm 10+

Setup

git clone https://github.com/unctad-ai/voice-agent-kit.git
cd voice-agent-kit
pnpm install

Commands

pnpm dev          # Watch mode (all packages)
pnpm build        # Build all packages
pnpm typecheck    # Type-check all packages

Project Structure

voice-agent-kit/
├── packages/
│   ├── core/           # Hooks, types, config, audio utilities
│   ├── ui/             # React components (tsup build)
│   ├── registries/     # Form/UI action registries
│   └── server/         # Express handlers (chat, STT, TTS)
├── scripts/
│   └── validate-release.sh
├── .changeset/         # Version management
├── .github/workflows/
│   ├── ci.yml          # Typecheck + validate on push/PR
│   └── publish.yml     # Publish to npm on v* tags
└── .husky/
    └── pre-commit      # Typecheck gate

Release

pnpm changeset        # Describe what changed (interactive)
git add . && git commit -m "chore: add changeset"

pnpm release          # Bumps versions + validates (clean build, dist check, dry-run publish)
git add . && git commit -m "chore: release vX.Y.Z"
git tag vX.Y.Z
git push --follow-tags && git push origin vX.Y.Z
# CI publishes to npm automatically

All four packages use fixed versioning — they always share the same version number.

License

ISC

About

Pluggable voice agent toolkit for government service portals — config-driven, site-agnostic
