Vaani is a configuration-driven voice-AI framework built on top of the LiveKit Agents SDK.
Instead of hard-coding every interaction, you describe how your agent should behave in a single JSON file. Vaani loads that file, wires together the pieces (LLM, TTS, STT, VAD, RAG, tools, memory, etc.), and spins up a fully working, production-ready voice agent.
- Zero-code configuration – change behaviour by editing JSON, no redeploys needed.
- Pluggable providers – OpenAI, Deepgram, ElevenLabs, Sarvam, Pinecone & more.
- Built-in tool system – extend the agent with Python async functions, exposed to the LLM.
- Realtime turn detection – Silero VAD + optional LiveKit EOU model.
- RAG ready – drop-in Pinecone + LlamaIndex retrieval pipeline.
- Conversation memory – opt-in long-term memory backed by Mem0.
- Streamlit config builder – point-and-click wizard to generate valid configs.
- Works everywhere – macOS, Linux, Windows, Cloud, Docker.
- Python ≥ 3.9
- A LiveKit Cloud project or self-hosted LiveKit server
- LiveKit API key and secret
- API keys for the providers you plan to use (OpenAI, Deepgram, Pinecone, …)
- (Optional) `ffmpeg` installed locally for audio transcoding
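The keys live in `.env` (copied from `.env.example` during installation). As a rough sketch, the file typically looks like this; the LiveKit variable names are the standard ones used by the LiveKit SDKs, while the provider names are assumptions, so check `.env.example` for the actual template:

```bash
# LiveKit (standard variable names used by the LiveKit SDKs)
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=...
LIVEKIT_API_SECRET=...

# Provider keys (names assumed; check .env.example)
OPENAI_API_KEY=...
DEEPGRAM_API_KEY=...
PINECONE_API_KEY=...
```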
```bash
# 1. clone & cd
$ git clone https://github.com/your-org/vaani.git
$ cd vaani

# 2. create & activate venv
$ python -m venv .venv
$ source .venv/bin/activate   # Windows: .venv\Scripts\activate

# 3. install in editable mode (+ extras)
$ pip install -e .[dev]

# 4. copy env template & add your keys
$ cp .env.example .env
```

Download the files:

```bash
$ python main.py download-files
```

Run the default assistant in dev mode:

```bash
$ python main.py dev
```

The agent registers with LiveKit under the ID defined in `main.py` and is ready to accept calls.
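For reference, the worker registration in `main.py` roughly follows the standard LiveKit Agents pattern sketched below. The entrypoint body is elided, and the `"base_agent"` name is an assumption taken from the dispatch example later in this README:

```python
from livekit.agents import JobContext, WorkerOptions, cli

async def entrypoint(ctx: JobContext):
    # Connect to the room; Vaani's config-driven session setup happens here.
    await ctx.connect()

if __name__ == "__main__":
    # agent_name is the ID the worker registers under for explicit dispatch;
    # "base_agent" is assumed to match the value in main.py.
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint, agent_name="base_agent"))
```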
Each agent is described by a single JSON file conforming to `universalagent.core.AgentConfig`.
Below is a minimal but complete example:
```json
{
  "agent_id": "demo_assistant",
  "name": "Demo Assistant",
  "description": "General purpose helper",
  "llm_config": {
    "provider": "openai",
    "model": "gpt-4o-mini",
    "temperature": 0.7
  },
  "tts_config": { "provider": "deepgram" },
  "stt_config": { "provider": "deepgram", "language": "en" },
  "max_conversation_duration": 900,
  "silence_timeout": 15,
  "tools": ["end_call"],
  "rag_config": { "enabled": false },
  "memory_config": { "enabled": false }
}
```

Key sections:
| Section | Purpose |
|---|---|
| `agent_id` | Unique dispatch key used by LiveKit |
| `llm_config` / `tts_config` / `stt_config` | Chooses providers & models |
| `rag_config` | Toggles knowledge-retrieval pipeline |
| `memory_config` | Enables per-user memory & sets limits |
| `tools` | List of tool names exposed to the LLM |
| `max_conversation_duration` | Auto-hangup after N seconds |
| `silence_timeout` | Prompt the user after N seconds of silence |
Environment variables (see `.env.example`) hold the API keys, so you can share config JSON without secrets.
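For illustration, loading and validating a config in Python might look like the sketch below. This assumes `AgentConfig` is a Pydantic v2 model, which may not match Vaani's actual loader API:

```python
import json
from universalagent.core import AgentConfig  # import path taken from the docs above

# Parse the JSON file and validate it against the AgentConfig schema.
# Assumes AgentConfig is a Pydantic v2 model; adjust if the real API differs.
with open("example/configs/default.json") as f:
    config = AgentConfig.model_validate(json.load(f))

print(config.agent_id)  # "demo_assistant"
```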
The framework ships with a small but useful set of always-available tools:
| Tool | Description |
|---|---|
| `end_call` | Gracefully terminates the current LiveKit room & cleans up |
You can add your own by wrapping an `async def` with `ToolHolder`.
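For example, a custom tool might look like the following sketch. `ToolHolder`'s exact signature isn't documented here, so the import path and the `name`/`description` keyword arguments are assumptions:

```python
from universalagent.tools import ToolHolder  # assumed import path

async def get_weather(city: str) -> str:
    """Look up the current weather for a city (stubbed for illustration)."""
    # Call your real weather API here.
    return f"It is 22°C and sunny in {city}."

# Wrap the coroutine so it can be listed under "tools" in the agent config.
weather_tool = ToolHolder(
    get_weather,
    name="get_weather",
    description="Look up the current weather for a city",
)
```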
When `rag_config.enabled == true`, Vaani automatically registers `search_knowledge_base`, powered by `LlamaIndexPineconeRagTool` – just drop in your Pinecone namespace.
Usage (inside the LLM):

```
#tool search_knowledge_base("what is the refund policy?")
```
The tool returns relevant chunks, which are then woven into the assistant's response.
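An enabled `rag_config` might look like the fragment below. This is illustrative only: apart from `enabled`, field names such as `index_name`, `namespace`, and `top_k` are assumptions rather than Vaani's documented schema:

```json
"rag_config": {
  "enabled": true,
  "index_name": "support-docs",
  "namespace": "acme-prod",
  "top_k": 3
}
```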
Enabling `memory_config.enabled` adds two tools:
| Tool | Mode | Purpose |
|---|---|---|
| `get_memory` | sync | Retrieve previous memories relevant to a query |
| `store_important_info` | fire-and-forget | Persist information the user reveals during the call |
When memory is enabled, the call transcript is used at the end of the call to create both agent and user memories.
Memories are stored in Mem0 and automatically summarised when they exceed `summarize_threshold`.
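An enabled `memory_config` could then look like this. `enabled` and `summarize_threshold` come from the text above; the threshold value of 20 is just an illustration:

```json
"memory_config": {
  "enabled": true,
  "summarize_threshold": 20
}
```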
- `example/configs/default.json` – simple assistant
- `example/configs/clinic_receptionist.json` – appointment-scheduling agent
- `example/configs/survey_agent.json` – multi-question NPS survey with a `record_answer` tool
Copy one of these, tweak it, then launch. First, start and register the worker with the LiveKit server:

```bash
$ python main.py dev
```

Then dispatch the agent:
```bash
$ lk dispatch create \
    --agent-name base_agent \
    --metadata '{
      "agent_id": "clinic_receptionist",
      "call_id": "xyz-call-id",
      "customer_name": "",
      "customer_id": "xyz-cust-id",
      "phone_number": ""
    }'
```

`--agent-name` must match the agent name in the worker options; `customer_id` is used to create memories about the caller.
A Streamlit app lets non-developers design agents via a 4-step wizard, complete with real-time validation and provider auto-detection.
```bash
$ cd configbuilder/streamlit
$ pip install -r requirements.txt
$ streamlit run agent_config_builder.py
```

The generated JSON can be saved directly into the `example/configs` folder.
Pull requests are welcome! Please:
- Fork & create a branch (`feature/my-feature`)
- Submit the PR and fill in the template
This project is released under the MIT License – see LICENSE for details.
- Issues – bug reports & feature requests on GitHub
- Discussions – ask "how do I…?" in the GitHub Discussions tab
- LiveKit Slack – #voice-ai-agents channel for real-time help
- LiveKit – realtime audio/video infra
- LiveKit Agents – official agents SDK
- Mem0 – vector-based memory store
- LlamaIndex – data framework for RAG
- Pinecone – vector database powering the RAG example
The roadmap for Vaani includes several significant capabilities that are already under active development:
- **Custom tools via OpenAPI / JSON Schema** – Allow tool authors to describe functions with a full API spec. The spec is surfaced to the LLM so it can reason about parameters, response shapes & error codes automatically, with no manual prompt engineering required.
- **Remote configuration server** – Allow the worker to fetch the agent config from a remote server using `agent_id` instead of searching in local files.
- **MCP integration** – Add support for the Model Context Protocol (MCP), letting users connect any tools that follow the protocol.
- **Context enrichment from memory** – When `memory_config.enabled` is `true`, the start-up pipeline will query Mem0 for the caller's recent memories and automatically prepend them to the agent's initial context so the LLM starts the conversation fully informed.
Feel free to open discussions or issues if you'd like to shape any of these features.