Feature: Add SenseVoice for lower-latency real-time transcription

Hi! Impressive work on the AI meeting copilot, especially the echo cancellation.

For real-time meeting transcription, **SenseVoice** could significantly reduce ASR latency:

## Key advantages for a meeting copilot

1. **Non-autoregressive** — single forward pass gives full transcription (no sequential token generation)
2. **~50ms latency on GPU** — ideal for real-time copilot scenarios
3. **234M params / ~1GB VRAM** — leaves room for your LLM on the same GPU
4. **Built-in features**: emotion detection in the model itself, plus VAD (`fsmn-vad`) and speaker diarization (`cam++`) through the FunASR pipeline
5. **OpenAI-compatible API** — if you already use Whisper API, it's a drop-in
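To make the emotion-detection point concrete, here is a minimal sketch of how a copilot might separate SenseVoice's metadata tags from the transcript. It assumes the raw output starts with tags of the form `<|en|><|NEUTRAL|><|Speech|><|withitn|>`; `split_tags` and `TaggedTranscript` are hypothetical helper names for illustration, not part of FunASR.

```python
import re
from typing import NamedTuple

# Assumed raw output shape: zero or more leading <|...|> tags, then the text.
_TAG = re.compile(r"<\|([^|]*)\|>")


class TaggedTranscript(NamedTuple):
    tags: list[str]
    text: str


def split_tags(raw: str) -> TaggedTranscript:
    """Split leading <|tag|> markers from the transcript text."""
    tags = []
    pos = 0
    while True:
        m = _TAG.match(raw, pos)  # only match a tag starting exactly at pos
        if m is None:
            break
        tags.append(m.group(1))
        pos = m.end()
    return TaggedTranscript(tags, raw[pos:].strip())


sample = "<|en|><|NEUTRAL|><|Speech|><|withitn|>Let's move to the next agenda item."
parsed = split_tags(sample)
print(parsed.tags)  # ['en', 'NEUTRAL', 'Speech', 'withitn']
print(parsed.text)  # Let's move to the next agenda item.
```

This only handles tags at the start of the string. For real output, FunASR's own `rich_transcription_postprocess` (in `funasr.utils.postprocess_utils`) is probably the better choice.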

## Quick start

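```python
from funasr import AutoModel

# Load SenseVoiceSmall together with a VAD front end and a speaker model.
model = AutoModel(
    model="iic/SenseVoiceSmall",
    vad_model="fsmn-vad",  # splits long audio into speech segments
    spk_model="cam++",     # speaker embeddings for diarization
    device="cuda:0",       # assumes a CUDA GPU; use "cpu" otherwise
)

# Placeholder input: a path to an audio file or a 16 kHz mono waveform.
audio_chunk = "meeting_chunk.wav"  # hypothetical file name
result = model.generate(input=audio_chunk)
print(result[0]["text"])
```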

Or start a server:
```bash
pip install funasr
funasr-server --device cuda  # localhost:8000, OpenAI-compatible
```
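If the server does mirror OpenAI's transcription route, a client call could look roughly like the sketch below. It uses only the standard library. The `/v1/audio/transcriptions` path, the `model` field value, and the `text` key in the response are assumptions based on the OpenAI API shape, not something I have verified against this server. `build_multipart` and `transcribe` are hypothetical helper names.

```python
import json
import urllib.request
import uuid

# Assumption: the server exposes OpenAI's transcription route on port 8000.
ENDPOINT = "http://localhost:8000/v1/audio/transcriptions"


def build_multipart(fields: dict[str, str], filename: str, audio: bytes) -> tuple[bytes, str]:
    """Encode form fields plus one file as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n".encode()
        + audio
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, model: str = "iic/SenseVoiceSmall") -> str:
    """POST an audio file and return the transcript (assumed response shape)."""
    with open(path, "rb") as f:
        body, content_type = build_multipart({"model": model}, path, f.read())
    req = urllib.request.Request(
        ENDPOINT, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["text"]
```

With the server running, `transcribe("meeting_chunk.wav")` would return the transcript string. Code that already goes through an OpenAI SDK would presumably only need its base URL pointed at the local server.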

## Links
- SenseVoice: https://github.com/FunAudioLLM/SenseVoice (8.3K stars)
- FunASR: https://github.com/modelscope/FunASR (16.7K stars)
