3 changes: 2 additions & 1 deletion docs.json
@@ -193,7 +193,8 @@
"pages": [
"server/services/s2s/aws",
"server/services/s2s/gemini",
"server/services/s2s/openai"
"server/services/s2s/openai",
"server/services/s2s/pinch"
]
},
{
138 changes: 138 additions & 0 deletions server/services/s2s/pinch.mdx
@@ -0,0 +1,138 @@
---
title: "Pinch"
description: "Real-time translation service implementation using Pinch's speech-to-speech API"
---

## Overview

`PinchAudioService` provides real-time speech translation with synchronized audio output and transcription capabilities. The service translates spoken audio from one language to another while maintaining natural conversation flow through streaming audio processing.

The service provides:
- **Real-time Translation**: Stream audio input and receive translated audio output with minimal latency
- **Dual Transcription**: Both source language transcription and translated text output
- **Voice Synthesis**: Natural-sounding translated speech with customizable voice parameters
- **Streaming Architecture**: Optimized for low-latency conversational applications

## Installation

To use `PinchAudioService`, install the required dependencies:

```bash
pip install "pipecat-ai[pinch]"
```

You'll also need to set up your Pinch API token as an environment variable: `PINCH_API_TOKEN`.

<Tip>
Get your API token by creating an account at [Pinch](https://www.startpinch.com/).
</Tip>
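
Once the token is set, you can read it at application startup. A minimal sketch (the only assumption is the `PINCH_API_TOKEN` variable named above):

```python
import os

# Fail fast if the Pinch API token is missing from the environment
pinch_api_token = os.environ["PINCH_API_TOKEN"]
```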

## Frames

### Input

<ParamField path="InputAudioRawFrame" type="Frame">
Raw PCM audio data for speech input (16-bit, 16kHz, mono)
</ParamField>

### Output

<ParamField path="TranscriptionFrame" type="Frame">
Final transcription of the source language speech
</ParamField>

<ParamField path="InterimTranscriptionFrame" type="Frame">
Real-time partial transcription updates during speech
</ParamField>

<ParamField path="LLMTextFrame" type="Frame">
Translated text output in the target language
</ParamField>

<ParamField path="TTSTextFrame" type="Frame">
Text being synthesized to speech in the target language
</ParamField>

<ParamField path="SpeechOutputAudioRawFrame" type="Frame">
Translated audio stream chunks (16-bit PCM)
</ParamField>
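
As a rough illustration of how these frames can be consumed downstream, here is a minimal logging processor. Treat it as a sketch rather than part of the Pinch integration: it assumes the standard Pipecat `FrameProcessor` API and that `TranscriptionFrame` and `LLMTextFrame` are importable from `pipecat.frames.frames`.

```python
from pipecat.frames.frames import Frame, LLMTextFrame, TranscriptionFrame
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor


class TranslationLogger(FrameProcessor):
    """Logs source transcriptions and translated text as they stream through."""

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)

        if isinstance(frame, TranscriptionFrame):
            print(f"[source] {frame.text}")
        elif isinstance(frame, LLMTextFrame):
            print(f"[translated] {frame.text}")

        # Always pass frames along so audio and other frames keep flowing
        await self.push_frame(frame, direction)
```

Placed after the Pinch service in a pipeline, a processor like this would print both the source transcript and the translated text as they arrive.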

## Configuration

### Constructor Parameters

<ParamField path="api_token" type="str" required>
Pinch API authentication token
</ParamField>

<ParamField path="session" type="aiohttp.ClientSession" required>
HTTP client session for WebSocket connections and API requests
</ParamField>

<ParamField path="session_request" type="PinchSessionRequest">
Session configuration object. Defaults to English → Spanish translation with a female voice
</ParamField>

### Session Configuration

The `PinchSessionRequest` object configures the translation session:

<ParamField path="source_language" type="str">
Input language code (e.g., "en" for English). See supported languages below
</ParamField>

<ParamField path="target_language" type="str">
Output language code (e.g., "es" for Spanish). See supported languages below
</ParamField>

<ParamField path="voice_type" type="str">
Voice characteristic for synthesized speech. Options: "female", "male"
</ParamField>

<ParamField path="enable_audio_output" type="bool">
Whether to generate translated audio output. Default: `True`
</ParamField>

<ParamField path="enable_transcription" type="bool">
Whether to output transcription frames. Default: `True`
</ParamField>

<ParamField path="sample_rate" type="int">
Audio sample rate in Hz. Default: `16000`
</ParamField>
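
Putting these together, a fully specified session request might look like the following. The field names follow the parameters listed above and are assumed to be accepted as keyword arguments; the values shown are the documented defaults made explicit.

```python
from pipecat.transports.pinch.api import PinchSessionRequest

# English speech in, Spanish speech out, with transcription frames
# enabled and 16 kHz audio.
session_request = PinchSessionRequest(
    source_language="en",
    target_language="es",
    voice_type="female",
    enable_audio_output=True,
    enable_transcription=True,
    sample_rate=16000,
)
```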

## Language Support

Pinch supports real-time translation across a growing set of language pairs, with new languages and quality improvements added regularly.
For a complete list of supported languages and available translation pairs, see the [Pinch documentation](https://www.startpinch.com/).

## Usage Example

```python
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.pinch import PinchAudioService
from pipecat.transports.pinch.api import PinchSessionRequest

pinch_api_token = os.getenv("PINCH_API_TOKEN")

# Configure the translation session
session_request = PinchSessionRequest(
    source_language="en",
    target_language="es",
    voice_type="female",
    enable_audio_output=True,
)

# Create the Pinch audio streaming service
# (`session` is an aiohttp.ClientSession created by your application)
pinch_service = PinchAudioService(
    api_token=pinch_api_token,
    session=session,
    session_request=session_request,
)

# Create the pipeline
pipeline = Pipeline([
    transport.input(),   # Audio input from the transport
    pinch_service,       # Translation service
    transport.output(),  # Translated audio output
])
```
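
The `session` and `transport` objects above come from your application setup. The sketch below shows one way to manage the `aiohttp.ClientSession` lifetime around the pipeline; the `run_translation` function and the runner comment are illustrative, not part of the Pinch API.

```python
import os

import aiohttp

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.pinch import PinchAudioService
from pipecat.transports.pinch.api import PinchSessionRequest


async def run_translation(transport):
    # The service needs a live aiohttp session for its connection to Pinch;
    # keep it open for the lifetime of the pipeline.
    async with aiohttp.ClientSession() as session:
        pinch_service = PinchAudioService(
            api_token=os.getenv("PINCH_API_TOKEN"),
            session=session,
            session_request=PinchSessionRequest(
                source_language="en",
                target_language="es",
            ),
        )

        pipeline = Pipeline([transport.input(), pinch_service, transport.output()])
        # ... create a task for the pipeline and hand it to your pipeline runner
```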
1 change: 1 addition & 0 deletions server/services/supported-services.mdx
@@ -112,6 +112,7 @@ Speech-to-Speech services are multi-modal LLM services that take in audio, video
| [AWS Nova Sonic](/server/services/s2s/aws) | `pip install "pipecat-ai[aws-nova-sonic]"` |
| [Gemini Multimodal Live](/server/services/s2s/gemini) | `pip install "pipecat-ai[google]"` |
| [OpenAI Realtime](/server/services/s2s/openai) | `pip install "pipecat-ai[openai]"` |
| [Pinch](/server/services/s2s/pinch) | `pip install "pipecat-ai[pinch]"` |

## Image Generation
