
Overview

Pipecat provides two ElevenLabs STT service implementations:
  • ElevenLabsSTTService (HTTP): Segment-based transcription through ElevenLabs' Speech-to-Text API. Each complete audio segment is uploaded in a single request, and the transcript is returned in the response.
  • ElevenLabsRealtimeSTTService (WebSocket): Low-latency streaming transcription over a persistent WebSocket connection. It delivers both partial (interim) and committed (final) transcripts. Segments are committed either manually, driven by Pipecat's VAD, or by ElevenLabs' built-in VAD.

  • ElevenLabs STT API Reference: Pipecat's API methods for ElevenLabs STT integration
  • Example Implementation: Complete example with ElevenLabs STT and TTS
  • ElevenLabs Documentation: Official ElevenLabs STT API documentation
  • ElevenLabs Platform: Access API keys and speech-to-text models

Installation

To use ElevenLabs STT services, install the required dependencies:
pip install "pipecat-ai[elevenlabs]"

Prerequisites

ElevenLabs Account Setup

Before using ElevenLabs STT services, you need:
  1. ElevenLabs Account: Sign up at ElevenLabs Platform
  2. API Key: Generate an API key from your account dashboard
  3. Model Access: Ensure your account can use the Scribe v2 transcription models (defaults: scribe_v2 for the HTTP service, scribe_v2_realtime for the Realtime service)

Required Environment Variables

  • ELEVENLABS_API_KEY: Your ElevenLabs API key for authentication
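The usage examples on this page read the key with os.getenv, which returns None when the variable is missing. In that case the failure only appears later, at the first request. The following minimal sketch fails fast at startup instead. require_elevenlabs_api_key is a hypothetical helper for illustration and is not part of Pipecat.

```python
import os


def require_elevenlabs_api_key() -> str:
    """Return ELEVENLABS_API_KEY, raising a clear error if it is unset or blank.

    Hypothetical helper, not a Pipecat API.
    """
    key = os.getenv("ELEVENLABS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ELEVENLABS_API_KEY is not set; generate a key in the ElevenLabs "
            "dashboard and export it before starting the pipeline."
        )
    return key
```

You could then pass api_key=require_elevenlabs_api_key() to either service constructor in place of os.getenv("ELEVENLABS_API_KEY").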

ElevenLabsSTTService

| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | required | ElevenLabs API key for authentication. |
| aiohttp_session | aiohttp.ClientSession | required | An aiohttp session for HTTP requests. You must create and manage this yourself. |
| base_url | str | "https://api.elevenlabs.io" | Base URL for the ElevenLabs API. |
| model | str | "scribe_v2" | Deprecated in v0.0.105. Model ID for transcription. Use settings=ElevenLabsSTTService.Settings(...) instead. |
| sample_rate | int | None | Audio sample rate in Hz. When None, uses the pipeline's configured sample rate. |
| settings | ElevenLabsSTTService.Settings | None | Runtime-configurable settings for the STT service. See Settings below. |
| params | ElevenLabsSTTService.InputParams | None | Deprecated in v0.0.105. Configuration parameters for the STT service. Use settings=ElevenLabsSTTService.Settings(...) instead. |
| ttfs_p99_latency | float | ELEVENLABS_TTFS_P99 | P99 latency from end of speech to final transcript, in seconds. Override with a value for your deployment. |
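ttfs_p99_latency asks for a P99 figure that reflects your own deployment. Suppose you have logged end-of-speech-to-final-transcript latencies in seconds; how you collect them is up to you and is not shown here. The sketch below turns those samples into a P99 estimate with the standard library. p99_seconds is a hypothetical helper, and the "inclusive" interpolation method is one reasonable choice among several.

```python
import statistics
from typing import List


def p99_seconds(latencies: List[float]) -> float:
    """Estimate the 99th percentile of measured latencies (in seconds).

    Uses statistics.quantiles with method="inclusive", which treats the
    samples as the whole population and interpolates between neighbors.
    """
    if len(latencies) < 2:
        raise ValueError("need at least two latency samples")
    # quantiles(n=100) returns the 99 cut points P1..P99; index 98 is P99.
    return statistics.quantiles(latencies, n=100, method="inclusive")[98]
```

The result can be passed as ttfs_p99_latency=p99_seconds(samples) to either service constructor.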

Settings

Runtime-configurable settings passed via the settings constructor argument using ElevenLabsSTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | None | Model ID for transcription. (Inherited from base STT settings.) |
| language | Language or str | None | Target language for transcription. (Inherited from base STT settings.) |
| tag_audio_events | bool | True | Include audio events such as (laughter) and (coughing) in the transcription. |

Usage

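import os

import aiohttp
from pipecat.services.elevenlabs.stt import ElevenLabsSTTService

async def main():
    # The session must stay open while the service is in use,
    # so build and run the pipeline inside this block.
    async with aiohttp.ClientSession() as session:
        stt = ElevenLabsSTTService(
            api_key=os.getenv("ELEVENLABS_API_KEY"),
            aiohttp_session=session,
        )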

With Language and Audio Events

import os

import aiohttp
from pipecat.services.elevenlabs.stt import ElevenLabsSTTService
from pipecat.transcriptions.language import Language

async def main():
    async with aiohttp.ClientSession() as session:
        stt = ElevenLabsSTTService(
            api_key=os.getenv("ELEVENLABS_API_KEY"),
            aiohttp_session=session,
            settings=ElevenLabsSTTService.Settings(
                language=Language.ES,  # transcribe Spanish
                tag_audio_events=False,  # omit tags such as (laughter)
            ),
        )

Notes

  • The HTTP service uploads complete audio segments and is best for VAD-segmented transcription.
  • Does not emit connection events (on_connected, on_disconnected), because each transcription is a separate HTTP request.
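To make "uploads complete audio segments" concrete: a segment is the buffered PCM between a VAD start and stop, wrapped in an audio container before upload. The sketch below wraps 16-bit mono PCM in an in-memory WAV file using the standard wave module. It is illustrative only. The service encodes segments itself, and the use of WAV here is an assumption, as is the hypothetical helper name pcm16_to_wav.

```python
import io
import wave


def pcm16_to_wav(pcm: bytes, sample_rate: int, channels: int = 1) -> bytes:
    """Wrap raw little-endian 16-bit PCM in a WAV container, in memory.

    Hypothetical helper; the container format is an assumption.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    # wave does not close a file object it did not open, so buf is still readable.
    return buf.getvalue()
```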

ElevenLabsRealtimeSTTService

| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | required | ElevenLabs API key for authentication. |
| base_url | str | "api.elevenlabs.io" | Base URL for the ElevenLabs WebSocket API. |
| model | str | "scribe_v2_realtime" | Deprecated in v0.0.105. Model ID for real-time transcription. Use settings=ElevenLabsRealtimeSTTService.Settings(...) instead. |
| sample_rate | int | None | Audio sample rate in Hz. When None, uses the pipeline's configured sample rate. |
| settings | ElevenLabsRealtimeSTTService.Settings | None | Runtime-configurable settings for the Realtime STT service. See Settings below. |
| commit_strategy | CommitStrategy | CommitStrategy.MANUAL | How to segment speech. CommitStrategy.MANUAL uses Pipecat's VAD to control when transcript segments are committed; CommitStrategy.VAD uses ElevenLabs' built-in VAD for segment boundaries. |
| include_timestamps | bool | False | Whether to include word-level timestamps in transcripts. |
| enable_logging | bool | False | Whether to enable logging on ElevenLabs' side. |
| include_language_detection | bool | False | Whether to include language detection in transcripts. |
| params | ElevenLabsRealtimeSTTService.InputParams | None | Deprecated in v0.0.105. Configuration parameters for the STT service. Use settings=ElevenLabsRealtimeSTTService.Settings(...) instead. |
| ttfs_p99_latency | float | ELEVENLABS_REALTIME_TTFS_P99 | P99 latency from end of speech to final transcript, in seconds. Override with a value for your deployment. |

Settings

Runtime-configurable settings passed via the settings constructor argument using ElevenLabsRealtimeSTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | None | Model ID for transcription. (Inherited from base STT settings.) |
| language | Language or str | None | Language for speech recognition. (Inherited from base STT settings.) |
| vad_silence_threshold_secs | float | None | Seconds of silence before VAD commits a segment (0.3 to 3.0). Only used with the VAD commit strategy. |
| vad_threshold | float | None | VAD sensitivity (0.1 to 0.9; lower is more sensitive). Only used with the VAD commit strategy. |
| min_speech_duration_ms | int | None | Minimum speech duration for VAD, in ms (50 to 2000). Only used with the VAD commit strategy. |
| min_silence_duration_ms | int | None | Minimum silence duration for VAD, in ms (50 to 2000). Only used with the VAD commit strategy. |
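This page does not say whether the service validates the VAD ranges listed above itself. The sketch below checks values against those documented ranges before they reach Settings, so that a typo or an out-of-range value fails locally. check_vad_settings and VAD_RANGES are hypothetical names, and treating the bounds as inclusive is an assumption.

```python
from typing import Dict, Optional

# Documented ranges for the VAD-related Settings fields (assumed inclusive).
VAD_RANGES = {
    "vad_silence_threshold_secs": (0.3, 3.0),
    "vad_threshold": (0.1, 0.9),
    "min_speech_duration_ms": (50, 2000),
    "min_silence_duration_ms": (50, 2000),
}


def check_vad_settings(**values: Optional[float]) -> Dict[str, float]:
    """Return the non-None values, raising ValueError on unknown or out-of-range ones."""
    checked: Dict[str, float] = {}
    for name, value in values.items():
        if name not in VAD_RANGES:
            raise ValueError(f"unknown VAD setting: {name}")
        if value is None:
            continue  # None leaves the field unset, matching the table default
        low, high = VAD_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}..{high}")
        checked[name] = value
    return checked
```

The checked values can then be unpacked into the settings object, for example ElevenLabsRealtimeSTTService.Settings(**check_vad_settings(vad_silence_threshold_secs=1.0)). Keep in mind that these fields only take effect with commit_strategy=CommitStrategy.VAD.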

Usage

import os

from pipecat.services.elevenlabs.stt import ElevenLabsRealtimeSTTService

stt = ElevenLabsRealtimeSTTService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
)

With Timestamps and Custom Commit Strategy

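import os

from pipecat.services.elevenlabs.stt import CommitStrategy, ElevenLabsRealtimeSTTService
from pipecat.transcriptions.language import Language

stt = ElevenLabsRealtimeSTTService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    commit_strategy=CommitStrategy.VAD,  # let ElevenLabs' VAD set segment boundaries
    include_timestamps=True,  # word-level timestamps
    settings=ElevenLabsRealtimeSTTService.Settings(
        language=Language.EN,  # language is a Settings field, not a constructor argument
        vad_silence_threshold_secs=1.0,  # commit after 1 s of silence
    ),
)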

Notes

  • Commit strategies: The default, CommitStrategy.MANUAL, lets Pipecat's VAD decide when transcription segments are committed. Set commit_strategy=CommitStrategy.VAD to let ElevenLabs handle segment boundaries instead. With the MANUAL strategy, transcription frames are marked as finalized (TranscriptionFrame.finalized=True).
  • Keepalive: Sends silent audio chunks to prevent idle disconnections (keepalive interval: 5 s, timeout: 10 s).
  • Auto-reconnect: If the WebSocket connection has closed, the service reconnects automatically when new audio arrives.
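The keepalive and auto-reconnect behaviors described above can be sketched together as a small sender. It opens a new connection lazily whenever audio must go out and the previous connection has closed, and it sends a chunk of silence after a period with no outgoing audio. This is a hypothetical illustration, not Pipecat's implementation. ResilientAudioSender, the Connection protocol, and the 16 kHz, 16-bit mono silence chunk are all assumptions, and the documented 10 s timeout is not modeled.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional, Protocol

# Assumption: 16 kHz mono 16-bit PCM, so 100 ms of silence is 1600 zero samples.
SILENCE_100MS = b"\x00\x00" * 1600


class Connection(Protocol):
    """Hypothetical minimal connection interface."""

    closed: bool

    async def send(self, data: bytes) -> None: ...


class ResilientAudioSender:
    """Hypothetical sender that reconnects lazily and sends silence when idle."""

    def __init__(self, connect: Callable[[], Awaitable[Connection]], keepalive_interval: float = 5.0):
        self._connect = connect
        self._interval = keepalive_interval
        self._conn: Optional[Connection] = None
        self._last_sent = time.monotonic()

    async def send_audio(self, chunk: bytes) -> None:
        # Auto-reconnect: only open a new connection when audio needs to go out.
        if self._conn is None or self._conn.closed:
            self._conn = await self._connect()
        await self._conn.send(chunk)
        self._last_sent = time.monotonic()  # any outgoing audio resets the idle timer

    async def keepalive_loop(self, stop: asyncio.Event) -> None:
        while not stop.is_set():
            idle = time.monotonic() - self._last_sent
            if idle >= self._interval:
                await self.send_audio(SILENCE_100MS)
                continue
            try:
                # Sleep until the idle deadline, waking early if stop is set.
                await asyncio.wait_for(stop.wait(), timeout=self._interval - idle)
            except asyncio.TimeoutError:
                pass
```

Sending real audio through the same send_audio path means the keepalive only fires during genuine silence, and reconnection needs no separate background task.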

Event Handlers

Supports the standard service connection events:
| Event | Description |
|---|---|
| on_connected | Connected to ElevenLabs Realtime STT WebSocket |
| on_disconnected | Disconnected from ElevenLabs Realtime STT WebSocket |

@stt.event_handler("on_connected")
async def on_connected(service):
    print("Connected to ElevenLabs Realtime STT")
The InputParams / params= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.