Phone Agent (macOS Voice Agent)

On-device macOS voice agent that captures FaceTime/Continuity call audio, transcribes with Whisper, generates replies with Phi-3 Mini via llama.cpp, and responds with a cloned user voice.

Features

  • Privacy-first: All AI inference runs on-device - no cloud APIs
  • Zero-config models: AI models download automatically on first launch
  • Auto-managed LLM server: The app starts/stops llama.cpp automatically
  • Guided setup: In-app wizard walks you through all setup steps
  • Voice cloning: Train the agent to speak in your voice

Quick Start

1. Build Dependencies (One-Time)

cd /path/to/phone-agent
bash Scripts/build-frameworks.sh

2. Build & Run

Open PhoneAgentApp/PhoneAgentApp.xcodeproj in Xcode, build, and run.

3. Follow the Setup Wizard

The app will guide you through the following steps (a minimal permission-request sketch follows the list):

  1. Microphone access - For voice recording
  2. Screen recording - For capturing FaceTime audio
  3. BlackHole setup - For routing agent voice to calls
  4. Voice training (optional) - To clone your voice
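
Steps 1 and 2 correspond to standard macOS permission prompts. A minimal sketch of how an app can request them in Swift; this is illustrative only, PhoneAgent's actual flow lives in its setup wizard:

import AVFoundation
import CoreGraphics

// Sketch only: request microphone access, then check and request screen-recording access.
func requestCallPermissions() {
    // 1. Microphone - needed for voice recording and voice training
    AVCaptureDevice.requestAccess(for: .audio) { granted in
        print("Microphone access granted:", granted)
    }

    // 2. Screen recording - needed to capture FaceTime's call audio
    if !CGPreflightScreenCaptureAccess() {
        // Triggers the system prompt and adds the app to the Screen Recording list
        _ = CGRequestScreenCaptureAccess()
    }
}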

The setup wizard includes:

  • Auto-detection of BlackHole installation (see the detection sketch below)
  • One-click copy of install command
  • Direct links to System Settings and FaceTime
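
Under the hood, auto-detection comes down to scanning CoreAudio's device list for a device whose name contains "BlackHole". A rough sketch of that check; the function name is illustrative, not the app's API:

import CoreAudio
import Foundation

// Sketch: report BlackHole as installed if any CoreAudio device name contains "BlackHole".
func isBlackHoleInstalled() -> Bool {
    var address = AudioObjectPropertyAddress(
        mSelector: kAudioHardwarePropertyDevices,
        mScope: kAudioObjectPropertyScopeGlobal,
        mElement: kAudioObjectPropertyElementMain)

    // Ask the system how many audio devices it reports.
    var dataSize: UInt32 = 0
    guard AudioObjectGetPropertyDataSize(AudioObjectID(kAudioObjectSystemObject),
                                         &address, 0, nil, &dataSize) == noErr else { return false }

    var deviceIDs = [AudioDeviceID](repeating: 0, count: Int(dataSize) / MemoryLayout<AudioDeviceID>.size)
    guard AudioObjectGetPropertyData(AudioObjectID(kAudioObjectSystemObject),
                                     &address, 0, nil, &dataSize, &deviceIDs) == noErr else { return false }

    // Check each device's human-readable name.
    for id in deviceIDs {
        var nameAddress = AudioObjectPropertyAddress(
            mSelector: kAudioObjectPropertyName,
            mScope: kAudioObjectPropertyScopeGlobal,
            mElement: kAudioObjectPropertyElementMain)
        var name: CFString? = nil
        var nameSize = UInt32(MemoryLayout<CFString?>.size)
        if AudioObjectGetPropertyData(id, &nameAddress, 0, nil, &nameSize, &name) == noErr,
           let name, (name as String).contains("BlackHole") {
            return true
        }
    }
    return false
}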

How It Works

FaceTime Call
      |
      v
[Screen Capture] --> [ASR/Whisper] --> [LLM/Phi-3] --> [TTS] --> [BlackHole] --> FaceTime Mic
      |                     |                |                        |
   Caller audio       Transcription    AI Response              Agent speaks
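
The hop from transcription to reply goes through the local llama.cpp server. A minimal sketch of that request, assuming the server listens on localhost port 8080 and exposes the OpenAI-compatible chat endpoint; the port and prompt are assumptions, not the app's configuration:

import Foundation

// Sketch: send a transcribed utterance to a local llama.cpp server and return the generated reply.
func generateReply(to transcript: String) async throws -> String {
    var request = URLRequest(url: URL(string: "http://127.0.0.1:8080/v1/chat/completions")!)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONSerialization.data(withJSONObject: [
        "messages": [
            ["role": "system", "content": "You are answering a phone call on the user's behalf. Be brief."],
            ["role": "user", "content": transcript]
        ]
    ])

    let (data, _) = try await URLSession.shared.data(for: request)
    // Pull choices[0].message.content out of the JSON response.
    let json = try JSONSerialization.jsonObject(with: data) as? [String: Any]
    let choices = json?["choices"] as? [[String: Any]]
    let message = (choices?.first?["message"] as? [String: Any])?["content"] as? String
    return message ?? ""
}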

Automatic Features

  • Model Download - The Phi-3 model auto-downloads to ~/Library/Application Support/PhoneAgent/Models/
  • LLM Server - The llama.cpp server auto-starts, is health-monitored, and its status is shown in the UI (see the sketch after this list)
  • BlackHole Detection - The app detects if BlackHole is installed and guides you through setup
  • Call Detection - Automatically activates when FaceTime is running
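
The LLM Server item amounts to spawning the bundled llama.cpp server as a child process and polling it until it answers. A rough sketch, assuming known binary/model paths and the server's /health endpoint; the paths, port, and flags here are illustrative, not PhoneAgent's actual configuration:

import Foundation

// Sketch: launch a llama.cpp server as a child process, then wait until it reports healthy.
func startLLMServer(serverBinary: URL, modelPath: URL) throws -> Process {
    let process = Process()
    process.executableURL = serverBinary
    process.arguments = ["-m", modelPath.path, "--port", "8080"]
    try process.run()
    return process
}

func waitUntilHealthy(port: Int = 8080, attempts: Int = 30) async -> Bool {
    let url = URL(string: "http://127.0.0.1:\(port)/health")!
    for _ in 0..<attempts {
        if let (_, response) = try? await URLSession.shared.data(from: url),
           (response as? HTTPURLResponse)?.statusCode == 200 {
            return true   // server is up and the model is loaded
        }
        try? await Task.sleep(nanoseconds: 1_000_000_000)  // retry once per second
    }
    return false
}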

Why BlackHole?

macOS doesn't provide an API to inject audio into another app's microphone. BlackHole creates a virtual audio device that acts as a bridge:

Your App (TTS Output) --> BlackHole --> FaceTime (Microphone Input)
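
In code, the bridge is a matter of pointing the audio output at the BlackHole device instead of the speakers. A sketch assuming an AVAudioEngine-based playback path and an already-resolved BlackHole AudioDeviceID; both are assumptions about the implementation:

import AVFoundation
import AudioToolbox

// Sketch: make an AVAudioEngine render to a specific output device (e.g. BlackHole 2ch),
// so whatever the engine plays shows up as FaceTime's "microphone" input.
func route(_ engine: AVAudioEngine, toOutputDevice deviceID: AudioDeviceID) -> Bool {
    guard let outputUnit = engine.outputNode.audioUnit else { return false }
    var device = deviceID
    let status = AudioUnitSetProperty(outputUnit,
                                      kAudioOutputUnitProperty_CurrentDevice,
                                      kAudioUnitScope_Global,
                                      0,                       // output element
                                      &device,
                                      UInt32(MemoryLayout<AudioDeviceID>.size))
    return status == noErr
}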

The setup wizard makes this easy:

  1. Detects if BlackHole is already installed
  2. Provides the install command with one-click copy
  3. Links directly to FaceTime settings for configuration

Project Structure

PhoneAgent/           # Swift package (main library)
  App/                # App state, services, permissions
  Audio/              # Audio capture, injection, VAD, BlackHole detection
  AI/                 # LLM, ASR, TTS, orchestration
  UI/                 # SwiftUI views
  Models/             # Data models
PhoneAgentApp/        # Xcode project wrapper
Scripts/              # Build helpers
Frameworks/           # Built dependencies (llama.cpp, sherpa-onnx)

Troubleshooting

  • BlackHole not detected - Run brew install blackhole-2ch, then click "Check Installation"
  • Models not downloading - Check internet; retry from the download screen
  • LLM server won't start - Run Scripts/build-frameworks.sh again
  • No audio to caller - In FaceTime > Settings > Audio, set the microphone to "BlackHole 2ch"

Requirements

  • macOS 13.0+
  • Xcode 15+
  • ~3 GB disk space for models
  • Homebrew (for BlackHole installation)
