
Overview

ElevenLabs provides high-quality text-to-speech synthesis with two service implementations:
  • ElevenLabsTTSService (WebSocket) — Real-time streaming with word-level timestamps, audio context management, and interruption handling. Recommended for interactive applications.
  • ElevenLabsHttpTTSService (HTTP) — Simpler batch-style synthesis. Suitable for non-interactive use cases or when WebSocket connections are not possible.

  • ElevenLabs TTS API Reference: Complete API reference for all parameters and methods
  • Example Implementation: Complete example with WebSocket streaming
  • ElevenLabs Documentation: Official ElevenLabs TTS API documentation
  • Voice Library: Browse and clone voices from the community

Installation

```shell
pip install "pipecat-ai[elevenlabs]"
```

Prerequisites

  1. ElevenLabs Account: Sign up at ElevenLabs
  2. API Key: Generate an API key from your account dashboard
  3. Voice Selection: Choose voice IDs from the voice library

Set the following environment variable:

```shell
export ELEVENLABS_API_KEY=your_api_key
```

Configuration

ElevenLabsTTSService

  • api_key (str, required): ElevenLabs API key.
  • voice_id (str, required, deprecated): Voice ID from the voice library. Deprecated in v0.0.105. Use settings=ElevenLabsTTSService.Settings(voice=...) instead.
  • model (str, default "eleven_turbo_v2_5", deprecated): ElevenLabs model ID. Use a multilingual model variant (e.g. eleven_multilingual_v2) if you need non-English language support. Deprecated in v0.0.105. Use settings=ElevenLabsTTSService.Settings(model=...) instead.
  • url (str, default "wss://api.elevenlabs.io"): WebSocket endpoint URL. Override for custom or proxied deployments.
  • sample_rate (int, default None): Output audio sample rate in Hz. When None, uses the pipeline's configured sample rate.
  • text_aggregation_mode (TextAggregationMode, default TextAggregationMode.SENTENCE): Controls how incoming text is aggregated before synthesis. SENTENCE (default) buffers text until sentence boundaries, producing more natural speech. TOKEN streams tokens directly for lower latency. Import from pipecat.services.tts_service.
  • aggregate_sentences (bool, default None, deprecated): Deprecated in v0.0.104. Use text_aggregation_mode instead.
  • params (InputParams, default None, deprecated): Deprecated in v0.0.105. Use settings=ElevenLabsTTSService.Settings(...) instead.
  • settings (ElevenLabsTTSService.Settings, default None): Runtime-configurable settings. See Settings below.
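To illustrate the difference between the two aggregation modes, here is a minimal, library-independent sketch of sentence buffering (the real TextAggregationMode implementation in pipecat.services.tts_service differs in detail):

```python
import re


def aggregate_sentences(tokens):
    """Buffer streamed text tokens and emit complete sentences."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Flush every complete sentence currently in the buffer.
        while True:
            match = re.search(r"[.!?](\s+|$)", buffer)
            if not match:
                break
            end = match.end()
            sentence = buffer[:end].strip()
            buffer = buffer[end:]
            if sentence:
                yield sentence
    # Emit any trailing text that never reached a sentence boundary.
    if buffer.strip():
        yield buffer.strip()
```

With TOKEN mode this buffering step is skipped and each token is forwarded to the synthesizer as it arrives, trading some prosody for latency.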

ElevenLabsHttpTTSService

The HTTP service accepts the same parameters as the WebSocket service, with these differences:
  • aiohttp_session (aiohttp.ClientSession, required): An aiohttp session for HTTP requests. You must create and manage this yourself.
  • base_url (str, default "https://api.elevenlabs.io"): HTTP API base URL (used instead of the WebSocket service's url).

The HTTP service uses ElevenLabsHttpTTSSettings, which also includes:
  • optimize_streaming_latency (int, default None): Latency optimization level (0–4). Higher values reduce latency at the cost of quality.

Settings

Runtime-configurable settings passed via the settings constructor argument using ElevenLabsTTSService.Settings(...). These can be updated mid-conversation with TTSUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | None | ElevenLabs model identifier. (Inherited from base settings.) |
| voice | str | None | Voice identifier. (Inherited from base settings.) |
| language | Language or str | None | Language code. Only effective with multilingual models. (Inherited from base settings.) |
| stability | float | NOT_GIVEN | Voice consistency (0.0–1.0). Lower values are more expressive, higher values more consistent. |
| similarity_boost | float | NOT_GIVEN | Voice clarity and similarity to the original (0.0–1.0). |
| style | float | NOT_GIVEN | Style exaggeration (0.0–1.0). Higher values amplify the voice's style. |
| use_speaker_boost | bool | NOT_GIVEN | Enhance clarity and target speaker similarity. |
| speed | float | NOT_GIVEN | Speech rate. WebSocket: 0.7–1.2. HTTP: 0.25–4.0. |
| apply_text_normalization | Literal | NOT_GIVEN | Text normalization: "auto", "on", or "off". |
NOT_GIVEN values use the ElevenLabs API defaults. See ElevenLabs voice settings for details on how these parameters interact.
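NOT_GIVEN is a sentinel rather than None so that an explicit None can still be distinguished from "not provided". A library-independent sketch of the pattern (the names here are illustrative, not pipecat's internals):

```python
class _NotGiven:
    """Sentinel distinguishing 'not provided' from an explicit None."""

    def __repr__(self):
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()


def build_voice_settings(stability=NOT_GIVEN, similarity_boost=NOT_GIVEN, speed=NOT_GIVEN):
    """Build a request payload containing only explicitly-set fields.

    Omitted fields are left out of the payload entirely, so the
    remote API applies its own defaults for them.
    """
    fields = {
        "stability": stability,
        "similarity_boost": similarity_boost,
        "speed": speed,
    }
    return {name: value for name, value in fields.items() if value is not NOT_GIVEN}
```

This is why leaving a setting at NOT_GIVEN defers to the ElevenLabs defaults instead of sending a null value.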

Usage

Basic Setup

```python
import os

from pipecat.services.elevenlabs import ElevenLabsTTSService

tts = ElevenLabsTTSService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    settings=ElevenLabsTTSService.Settings(
        voice="21m00Tcm4TlvDq8ikWAM",  # Rachel
    ),
)
```

With Voice Customization

```python
import os

from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.transcriptions.language import Language

tts = ElevenLabsTTSService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    settings=ElevenLabsTTSService.Settings(
        voice="21m00Tcm4TlvDq8ikWAM",
        model="eleven_multilingual_v2",
        language=Language.ES,
        stability=0.7,
        similarity_boost=0.8,
        speed=1.1,
    ),
)
```

Updating Settings at Runtime

Voice settings can be changed mid-conversation using TTSUpdateSettingsFrame:

```python
from pipecat.frames.frames import TTSUpdateSettingsFrame
from pipecat.services.elevenlabs.tts import ElevenLabsTTSSettings

await task.queue_frame(
    TTSUpdateSettingsFrame(
        delta=ElevenLabsTTSSettings(
            stability=0.3,
            speed=1.1,
        )
    )
)
```

HTTP Service

```python
import os

import aiohttp
from pipecat.services.elevenlabs import ElevenLabsHttpTTSService

async with aiohttp.ClientSession() as session:
    tts = ElevenLabsHttpTTSService(
        api_key=os.getenv("ELEVENLABS_API_KEY"),
        settings=ElevenLabsHttpTTSService.Settings(
            voice="21m00Tcm4TlvDq8ikWAM",
        ),
        aiohttp_session=session,
    )
```

The InputParams / params= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.
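A minimal before/after sketch of that migration (the InputParams field shown in the commented-out code is illustrative; check your existing code for the fields you actually set):

```python
import os

from pipecat.services.elevenlabs import ElevenLabsTTSService

# Before (deprecated as of v0.0.105):
# tts = ElevenLabsTTSService(
#     api_key=os.getenv("ELEVENLABS_API_KEY"),
#     voice_id="21m00Tcm4TlvDq8ikWAM",
#     params=ElevenLabsTTSService.InputParams(stability=0.5),
# )

# After: voice and per-request parameters move into Settings.
tts = ElevenLabsTTSService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    settings=ElevenLabsTTSService.Settings(
        voice="21m00Tcm4TlvDq8ikWAM",
        stability=0.5,
    ),
)
```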

Notes

  • Multilingual models required for language: Setting language with a non-multilingual model (e.g. eleven_turbo_v2_5) has no effect. Use eleven_multilingual_v2 or similar.
  • WebSocket vs HTTP: The WebSocket service supports word-level timestamps and interruption handling, making it significantly better for interactive conversations. The HTTP service is simpler but lacks these features.
  • Text aggregation: Sentence aggregation is enabled by default (text_aggregation_mode=TextAggregationMode.SENTENCE). Buffering until sentence boundaries produces more natural speech. Set text_aggregation_mode=TextAggregationMode.TOKEN to stream tokens directly for lower latency, but you must also set auto_mode=False in settings when using TOKEN mode.
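The low-latency configuration described in the last note can be sketched as follows (that auto_mode is a field on Settings is inferred from the note above; verify against the API reference):

```python
import os

from pipecat.services.elevenlabs import ElevenLabsTTSService
from pipecat.services.tts_service import TextAggregationMode

tts = ElevenLabsTTSService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    text_aggregation_mode=TextAggregationMode.TOKEN,  # stream tokens directly
    settings=ElevenLabsTTSService.Settings(
        voice="21m00Tcm4TlvDq8ikWAM",
        auto_mode=False,  # required when streaming tokens (see note above)
    ),
)
```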

Event Handlers

ElevenLabs TTS supports the standard service connection events:
| Event | Description |
| --- | --- |
| on_connected | Connected to ElevenLabs WebSocket |
| on_disconnected | Disconnected from ElevenLabs WebSocket |
| on_connection_error | WebSocket connection error occurred |
```python
@tts.event_handler("on_connected")
async def on_connected(service):
    print("Connected to ElevenLabs")
```