
Overview

ElevenLabs provides high-quality text-to-speech synthesis with two service implementations:
  • ElevenLabsTTSService (WebSocket) — Real-time streaming with word-level timestamps, audio context management, and interruption handling. Recommended for interactive applications.
  • ElevenLabsHttpTTSService (HTTP) — Simpler batch-style synthesis. Suitable for non-interactive use cases or when WebSocket connections are not possible.

  • ElevenLabs TTS API Reference: Complete API reference for all parameters and methods
  • Example Implementation: Complete example with WebSocket streaming
  • ElevenLabs Documentation: Official ElevenLabs TTS API documentation
  • Voice Library: Browse and clone voices from the community

Installation

```shell
pip install "pipecat-ai[elevenlabs]"
```

Prerequisites

  1. ElevenLabs Account: Sign up at ElevenLabs
  2. API Key: Generate an API key from your account dashboard
  3. Voice Selection: Choose voice IDs from the voice library

Set the following environment variable:

```shell
export ELEVENLABS_API_KEY=your_api_key
```

Configuration

ElevenLabsTTSService

  • api_key (str, required): ElevenLabs API key.
  • voice_id (str, required, deprecated): Voice ID from the voice library. Deprecated in v0.0.105. Use settings=ElevenLabsTTSService.Settings(voice=...) instead.
  • model (str, default "eleven_turbo_v2_5", deprecated): ElevenLabs model ID. Use a multilingual model variant (e.g. eleven_multilingual_v2) if you need non-English language support. Deprecated in v0.0.105. Use settings=ElevenLabsTTSService.Settings(model=...) instead.
  • url (str, default "wss://api.elevenlabs.io"): WebSocket endpoint URL. Override for custom or proxied deployments.
  • sample_rate (int, default None): Output audio sample rate in Hz. When None, uses the pipeline's configured sample rate.
  • text_aggregation_mode (TextAggregationMode, default TextAggregationMode.SENTENCE): Controls how incoming text is aggregated before synthesis. SENTENCE (default) buffers text until sentence boundaries, producing more natural speech. TOKEN streams tokens directly for lower latency. Import from pipecat.services.tts_service.
  • aggregate_sentences (bool, default None, deprecated): Deprecated in v0.0.104. Use text_aggregation_mode instead.
  • params (InputParams, default None, deprecated): Deprecated in v0.0.105. Use settings=ElevenLabsTTSService.Settings(...) instead.
  • settings (ElevenLabsTTSService.Settings, default None): Runtime-configurable settings. See Settings below.
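To illustrate the difference between the two aggregation modes, here is a minimal, library-independent sketch of sentence buffering (the real TextAggregationMode implementation in pipecat.services.tts_service differs in detail):

```python
import re


def aggregate_sentences(tokens):
    """Buffer streamed text tokens and emit complete sentences."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Flush every complete sentence currently in the buffer.
        while True:
            match = re.search(r"[.!?](\s+|$)", buffer)
            if not match:
                break
            end = match.end()
            sentence = buffer[:end].strip()
            buffer = buffer[end:]
            if sentence:
                yield sentence
    # Emit any trailing text that never reached a sentence boundary.
    if buffer.strip():
        yield buffer.strip()
```

With TOKEN mode this buffering step is skipped and each token is forwarded to the synthesizer as it arrives, trading some prosody for latency.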

ElevenLabsHttpTTSService

The HTTP service accepts the same parameters as the WebSocket service, with these differences:
  • aiohttp_session (aiohttp.ClientSession, required): An aiohttp session for HTTP requests. You must create and manage this yourself.
  • base_url (str, default "https://api.elevenlabs.io"): HTTP API base URL (used instead of the WebSocket service's url).

The HTTP service uses ElevenLabsHttpTTSSettings, which also includes:
  • optimize_streaming_latency (int, default None): Latency optimization level (0–4). Higher values reduce latency at the cost of quality.

Settings

Runtime-configurable settings passed via the settings constructor argument using ElevenLabsTTSService.Settings(...). These can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | None | ElevenLabs model identifier. (Inherited from base settings.) |
| voice | str | None | Voice identifier. (Inherited from base settings.) |
| language | Language or str | None | Language code. Only effective with multilingual models. (Inherited from base settings.) |
| stability | float | NOT_GIVEN | Voice consistency (0.0–1.0). Lower values are more expressive, higher values more consistent. |
| similarity_boost | float | NOT_GIVEN | Voice clarity and similarity to the original (0.0–1.0). |
| style | float | NOT_GIVEN | Style exaggeration (0.0–1.0). Higher values amplify the voice's style. |
| use_speaker_boost | bool | NOT_GIVEN | Enhance clarity and target speaker similarity. |
| speed | float | NOT_GIVEN | Speech rate. WebSocket: 0.7–1.2. HTTP: 0.25–4.0. |
| apply_text_normalization | Literal | NOT_GIVEN | Text normalization: "auto", "on", or "off". |
NOT_GIVEN values use the ElevenLabs API defaults. See ElevenLabs voice settings for details on how these parameters interact.
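NOT_GIVEN is a sentinel rather than None so that an explicit None can still be distinguished from "not provided". A library-independent sketch of the pattern (the names here are illustrative, not pipecat's internals):

```python
class _NotGiven:
    """Sentinel distinguishing 'not provided' from an explicit None."""

    def __repr__(self):
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()


def build_voice_settings(stability=NOT_GIVEN, similarity_boost=NOT_GIVEN, speed=NOT_GIVEN):
    """Build a request payload containing only explicitly-set fields.

    Omitted fields are left out of the payload entirely, so the
    remote API applies its own defaults for them.
    """
    fields = {
        "stability": stability,
        "similarity_boost": similarity_boost,
        "speed": speed,
    }
    return {name: value for name, value in fields.items() if value is not NOT_GIVEN}
```

This is why leaving a setting at NOT_GIVEN defers to the ElevenLabs defaults instead of sending a null value.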

Usage

Basic Setup

```python
import os

from pipecat.services.elevenlabs import ElevenLabsTTSService

tts = ElevenLabsTTSService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    settings=ElevenLabsTTSService.Settings(
        voice="21m00Tcm4TlvDq8ikWAM",  # Rachel
    ),
)
```

With Voice Customization

```python
import os

from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.transcriptions.language import Language

tts = ElevenLabsTTSService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    settings=ElevenLabsTTSService.Settings(
        voice="21m00Tcm4TlvDq8ikWAM",
        model="eleven_multilingual_v2",
        language=Language.ES,
        stability=0.7,
        similarity_boost=0.8,
        speed=1.1,
    ),
)
```

Updating Settings at Runtime

Voice settings can be changed mid-conversation using TTSUpdateSettingsFrame:

```python
from pipecat.frames.frames import TTSUpdateSettingsFrame
from pipecat.services.elevenlabs.tts import ElevenLabsTTSSettings

await task.queue_frame(
    TTSUpdateSettingsFrame(
        delta=ElevenLabsTTSSettings(
            stability=0.3,
            speed=1.1,
        )
    )
)
```

HTTP Service

```python
import os

import aiohttp
from pipecat.services.elevenlabs import ElevenLabsHttpTTSService

async with aiohttp.ClientSession() as session:
    tts = ElevenLabsHttpTTSService(
        api_key=os.getenv("ELEVENLABS_API_KEY"),
        settings=ElevenLabsHttpTTSService.Settings(
            voice="21m00Tcm4TlvDq8ikWAM",
        ),
        aiohttp_session=session,
    )
```

The InputParams / params= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.
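A minimal before/after sketch of that migration (the InputParams field shown in the commented-out code is illustrative; check your existing code for the fields you actually set):

```python
import os

from pipecat.services.elevenlabs import ElevenLabsTTSService

# Before (deprecated as of v0.0.105):
# tts = ElevenLabsTTSService(
#     api_key=os.getenv("ELEVENLABS_API_KEY"),
#     voice_id="21m00Tcm4TlvDq8ikWAM",
#     params=ElevenLabsTTSService.InputParams(stability=0.5),
# )

# After: voice and per-request parameters move into Settings.
tts = ElevenLabsTTSService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    settings=ElevenLabsTTSService.Settings(
        voice="21m00Tcm4TlvDq8ikWAM",
        stability=0.5,
    ),
)
```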

Notes

  • Multilingual models required for language: Setting language with a non-multilingual model (e.g. eleven_turbo_v2_5) has no effect. Use eleven_multilingual_v2 or similar.
  • WebSocket vs HTTP: The WebSocket service supports word-level timestamps and interruption handling, making it significantly better for interactive conversations. The HTTP service is simpler but lacks these features.
  • Text aggregation: Sentence aggregation is enabled by default (text_aggregation_mode=TextAggregationMode.SENTENCE). Buffering until sentence boundaries produces more natural speech. Set text_aggregation_mode=TextAggregationMode.TOKEN to stream tokens directly for lower latency, but you must also set auto_mode=False in settings when using TOKEN mode.
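The low-latency configuration described in the last note can be sketched as follows (that auto_mode is a field on Settings is inferred from the note above; verify against the API reference):

```python
import os

from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.tts_service import TextAggregationMode

tts = ElevenLabsTTSService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    text_aggregation_mode=TextAggregationMode.TOKEN,  # stream tokens directly
    settings=ElevenLabsTTSService.Settings(
        voice="21m00Tcm4TlvDq8ikWAM",
        auto_mode=False,  # required when streaming tokens (see note above)
    ),
)
```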

Event Handlers

ElevenLabs TTS supports the standard service connection events:
| Event | Description |
| --- | --- |
| on_connected | Connected to ElevenLabs WebSocket |
| on_disconnected | Disconnected from ElevenLabs WebSocket |
| on_connection_error | WebSocket connection error occurred |
```python
@tts.event_handler("on_connected")
async def on_connected(service):
    print("Connected to ElevenLabs")
```