Overview

NVIDIA Riva provides two STT service implementations:
  • NvidiaSTTService — Real-time streaming transcription using Parakeet models with interim results and continuous audio processing.
  • NvidiaSegmentedSTTService — Segmented transcription using Canary models with advanced language support, word boosting, and enterprise-grade accuracy.

Related resources:
  • NVIDIA Riva STT API Reference: Pipecat's API methods for NVIDIA Riva STT integration
  • Example Implementation: Complete example with NVIDIA services integration
  • NVIDIA Riva Documentation: Official NVIDIA Riva ASR documentation
  • NVIDIA Developer Portal: Access API keys and Riva services

Installation

To use NVIDIA Riva services, install the required dependency:
pip install "pipecat-ai[nvidia]"

Prerequisites

NVIDIA Riva Setup

Before using NVIDIA Riva STT services, you need:
  1. NVIDIA Developer Account: Sign up at NVIDIA Developer Portal
  2. API Key: Generate an NVIDIA API key for Riva services
  3. Model Selection: Choose between Parakeet (streaming) and Canary (segmented) models

Required Environment Variables

  • NVIDIA_API_KEY: Your NVIDIA API key for authentication
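The key can be exported in your shell before starting the pipeline so that `os.getenv("NVIDIA_API_KEY")` picks it up; a minimal sketch (the key value is a placeholder):

```shell
# Export the NVIDIA API key so the service can read it via os.getenv.
# "nvapi-..." is a placeholder; substitute the key from your NVIDIA account.
export NVIDIA_API_KEY="nvapi-your-key-here"
```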

NvidiaSTTService

Real-time streaming transcription using NVIDIA Riva’s Parakeet models.
api_key (str, required)
    NVIDIA API key for authentication.

server (str, default: "grpc.nvcf.nvidia.com:443")
    NVIDIA Riva server address.

model_function_map (Mapping[str, str])
    Mapping containing function_id and model_name for the ASR model.

sample_rate (int, default: None)
    Audio sample rate in Hz. When None, uses the pipeline's configured sample rate.

params (NvidiaSTTService.InputParams, default: None, deprecated)
    Additional configuration parameters. Deprecated in v0.0.105. Use settings=NvidiaSTTService.Settings(...) instead.

settings (NvidiaSTTService.Settings, default: None)
    Runtime-configurable settings. See Settings below.

use_ssl (bool, default: True)
    Whether to use SSL for the gRPC connection.

ttfs_p99_latency (float, default: 1.0)
    P99 latency from speech end to final transcript, in seconds. Override for your deployment. See stt-benchmark.
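The model_function_map parameter pairs a hosted function ID with the model it serves. A minimal sketch with placeholder values (the real function_id and model name come from the NVIDIA API catalog entry for your chosen model; both values below are illustrative):

```python
# Hypothetical mapping; replace both values with those listed for your
# model in the NVIDIA API catalog. The keys are the ones the service expects.
model_function_map = {
    "function_id": "00000000-0000-0000-0000-000000000000",  # placeholder UUID
    "model_name": "parakeet-ctc-1.1b-asr",                  # illustrative name
}
```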

Settings

Runtime-configurable settings passed via the settings constructor argument using NvidiaSTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| model | str | None | STT model identifier. (Inherited from base STT settings.) |
| language | Language or str | Language.EN_US | Target language for transcription. (Inherited from base STT settings.) |

Usage

import os

from pipecat.services.nvidia.stt import NvidiaSTTService

stt = NvidiaSTTService(
    api_key=os.getenv("NVIDIA_API_KEY"),
)

Notes

  • Model cannot be changed after initialization: Use the model_function_map parameter in the constructor to specify the model and function ID.
  • Streaming: Provides real-time interim and final results through continuous audio streaming.

NvidiaSegmentedSTTService

Batch/segmented transcription using NVIDIA Riva’s Canary models. Processes complete audio segments after VAD detects speech boundaries.
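The segmented flow can be pictured as: buffer audio while VAD reports speech, then hand the complete segment to the recognizer once speech ends. A toy sketch of that buffering logic (plain Python, no Riva calls; the event tuples are simplified stand-ins for VAD and audio frames, not Pipecat internals):

```python
# Toy illustration of VAD-gated segment buffering.
def collect_segments(events):
    """events: sequence of ("speech_start",), ("audio", chunk), ("speech_end",).
    Returns one joined audio segment per detected utterance."""
    segments, buffer, in_speech = [], [], False
    for event in events:
        kind = event[0]
        if kind == "speech_start":
            in_speech, buffer = True, []
        elif kind == "audio" and in_speech:
            buffer.append(event[1])
        elif kind == "speech_end" and in_speech:
            segments.append(b"".join(buffer))  # full segment goes to the recognizer
            in_speech = False
    return segments
```

Each returned segment would correspond to one transcription request, which is why segmented services can trade latency for accuracy.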
api_key (str, required)
    NVIDIA API key for authentication.

server (str, default: "grpc.nvcf.nvidia.com:443")
    NVIDIA Riva server address.

model_function_map (Mapping[str, str])
    Mapping containing function_id and model_name for the ASR model.

sample_rate (int, default: None)
    Audio sample rate in Hz. When None, uses the pipeline's configured sample rate.

params (NvidiaSegmentedSTTService.InputParams, default: None, deprecated)
    Additional configuration parameters. Deprecated in v0.0.105. Use settings=NvidiaSegmentedSTTService.Settings(...) instead.

settings (NvidiaSegmentedSTTService.Settings, default: None)
    Runtime-configurable settings. See Settings below.

use_ssl (bool, default: True)
    Whether to use SSL for the gRPC connection.

ttfs_p99_latency (float, default: 1.0)
    P99 latency from speech end to final transcript, in seconds. Override for your deployment. See stt-benchmark.

Settings

Runtime-configurable settings passed via the settings constructor argument using NvidiaSegmentedSTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| model | str | None | STT model identifier. (Inherited from base STT settings.) |
| language | Language or str | Language.EN_US | Target language for transcription. (Inherited from base STT settings.) |
| profanity_filter | bool | False | Whether to filter profanity from results. |
| automatic_punctuation | bool | True | Whether to add automatic punctuation. |
| verbatim_transcripts | bool | False | Whether to return verbatim transcripts. |
| boosted_lm_words | list[str] | None | List of words to boost in the language model. |
| boosted_lm_score | float | 4.0 | Score boost for specified words. |
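Conceptually, word boosting adds a score bonus to hypotheses containing the boosted terms, nudging the recognizer toward domain vocabulary. A toy rescoring sketch of that idea (Riva's actual boosting happens inside the decoder, not as a post-hoc rescorer; this is only an illustration of how a boost score biases ranking):

```python
# Toy illustration: re-rank candidate transcripts, adding `boost` once
# for each boosted word that appears in a hypothesis.
def rescore(hypotheses, boosted_words, boost=4.0):
    """hypotheses: list of (text, score) pairs. Returns them re-ranked
    with the boost applied, best hypothesis first."""
    def boosted_score(item):
        text, score = item
        words = set(text.lower().split())
        bonus = boost * sum(1 for w in boosted_words if w.lower() in words)
        return score + bonus
    return sorted(hypotheses, key=boosted_score, reverse=True)
```

A higher boosted_lm_score makes boosted terms win more often, at the risk of false positives on acoustically similar words.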

Usage

import os

from pipecat.services.nvidia.stt import NvidiaSegmentedSTTService
from pipecat.transcriptions.language import Language

stt = NvidiaSegmentedSTTService(
    api_key=os.getenv("NVIDIA_API_KEY"),
    settings=NvidiaSegmentedSTTService.Settings(
        language=Language.ES,
        automatic_punctuation=True,
        boosted_lm_words=["Pipecat", "NVIDIA"],
        boosted_lm_score=6.0,
    ),
)

Notes

  • Model cannot be changed after initialization: Use the model_function_map parameter in the constructor to specify the model and function ID.
  • Segmented processing: Processes complete audio segments for higher accuracy compared to streaming.
  • Language support: Supports Arabic, English (US/GB), French, German, Hindi, Italian, Japanese, Korean, Portuguese (BR), Russian, and Spanish (ES/US).
  • Word boosting: Use boosted_lm_words and boosted_lm_score to improve recognition of domain-specific terms.
Note: The InputParams / params= pattern is deprecated as of v0.0.105. Use Settings / settings= instead. See the Service Settings guide for migration details.