Feature: Add SenseVoice for lower-latency real-time transcription #1

@LauraGPT

Hi! Impressive AI meeting copilot with echo cancellation.

For real-time meeting transcription, SenseVoice could significantly reduce ASR latency:

Key advantages for a meeting copilot

  1. Non-autoregressive: a single forward pass produces the full transcription (no sequential token generation)
  2. ~50ms latency on GPU: ideal for real-time copilot scenarios
  3. 234M params / ~1GB VRAM: leaves room for your LLM on the same GPU
  4. Built-in features: VAD, speaker diarization (cam++), emotion detection
  5. OpenAI-compatible API: if you already use the Whisper API, it's a drop-in replacement

Quick start

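from funasr import AutoModel

# Load SenseVoiceSmall together with a VAD model and a speaker model.
model = AutoModel(
    model="iic/SenseVoiceSmall",
    vad_model="fsmn-vad",
    spk_model="cam++",
)
# audio_chunk is a placeholder for your input, e.g. a path to an audio file.
result = model.generate(input=audio_chunk)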
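
To use the emotion and language information in a copilot, the transcript usually has to be separated from its inline markers. The sketch below assumes that `result[0]["text"]` is a string prefixed with tags of the form `<|en|><|NEUTRAL|><|Speech|>`, which is how SenseVoice output commonly looks; the helper name `split_sensevoice_tags` and the sample string are hypothetical. FunASR also ships `rich_transcription_postprocess` in `funasr.utils.postprocess_utils`, which is likely the better choice if you only need clean text.

```python
import re

# Matches inline markers of the assumed form <|NAME|>.
TAG_PATTERN = re.compile(r"<\|([^|]*)\|>")


def split_sensevoice_tags(raw_text):
    """Return (tags, plain_text) for a tagged transcript string."""
    tags = TAG_PATTERN.findall(raw_text)
    plain_text = TAG_PATTERN.sub("", raw_text).strip()
    return tags, plain_text


# Hypothetical sample shaped like result[0]["text"]; not real model output.
sample = "<|en|><|NEUTRAL|><|Speech|><|woitn|>let's review the action items"
tags, text = split_sensevoice_tags(sample)
print(tags, text)
```

Keeping the tags as a list rather than discarding them lets the copilot decide later which ones (language, emotion, event type) are worth surfacing.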

Or start a server:

pip install funasr
funasr-server --device cuda  # localhost:8000, OpenAI-compatible
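
If the server really is OpenAI-compatible, a client could look roughly like the sketch below, which uses only the Python standard library. The route `/v1/audio/transcriptions` is OpenAI's transcription path and is assumed, not confirmed, to be mirrored by this server; the `model` field value and the helper names `build_multipart` and `transcribe` are hypothetical.

```python
import json
import os
import urllib.request
import uuid

# Assumption: the server exposes OpenAI's transcription route on localhost:8000.
SERVER_URL = "http://localhost:8000/v1/audio/transcriptions"


def build_multipart(fields, file_name, file_bytes, boundary=None):
    """Encode form fields plus one file part as multipart/form-data."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode()
        )
        parts.append(f"{value}\r\n".encode())
    parts.append(f"--{boundary}\r\n".encode())
    parts.append(
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'.encode()
    )
    parts.append(b"Content-Type: application/octet-stream\r\n\r\n")
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, model_name="SenseVoiceSmall", url=SERVER_URL):
    """POST an audio file and return the "text" field of the JSON reply."""
    with open(path, "rb") as audio_file:
        audio = audio_file.read()
    body, content_type = build_multipart(
        {"model": model_name}, os.path.basename(path), audio
    )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["text"]
```

Calling `transcribe("meeting_chunk.wav")` would then return the transcript string, provided the server is running and responds with an OpenAI-style JSON body.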

Links
