In long-running voice AI conversations, context grows with every exchange. This increases token usage, raises costs, and can eventually hit context window limits. Pipecat includes built-in context summarization that automatically compresses older conversation history while preserving recent messages and important context.
Context summarization automatically triggers when either condition is met:
Token limit reached: Context size exceeds max_context_tokens (estimated using ~4 characters per token)
Message count reached: Number of new messages exceeds max_unsummarized_messages
You can disable either threshold by setting it to None, as long as at least one remains active (see the configuration sketch below).
When triggered, the system:
Sends an LLMContextSummaryRequestFrame to the LLM service
The LLM generates a concise summary of older messages
Context is reconstructed as: [system_message] + [summary] + [recent_messages]
Incomplete function call sequences and recent messages are preserved
Context summarization is asynchronous and happens in the background without blocking the pipeline. The system uses request IDs to match summary requests with results and handles interruptions gracefully.
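As a rough sketch, the two thresholds and the min_messages_after_summary setting (described below) might be combined like this. The exact field placement across LLMAutoContextSummarizationConfig and LLMContextSummaryConfig, and how the config is attached to your LLM service, are assumptions here; check the class signatures in your Pipecat version.
config = LLMAutoContextSummarizationConfig(
    # Summarize once the estimated context size exceeds ~8,000 tokens.
    # Set to None to disable, as long as the other threshold stays active.
    max_context_tokens=8000,
    # ...or once 20 new messages have accumulated since the last summary.
    max_unsummarized_messages=20,
    summary_config=LLMContextSummaryConfig(
        # Keep the last 4 messages uncompressed (assumed field placement).
        min_messages_after_summary=4,
    ),
)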
System messages: If a system message exists in the context, the first one is always kept. When using system_instruction in LLM Settings instead, the system prompt is not part of the context messages and is automatically prepended by the service on each request, so there is nothing to preserve in the context.
Recent messages: The last N messages stay uncompressed (configured by min_messages_after_summary)
Function call sequences: Incomplete function call/result pairs are not split during summarization
You can override the default summarization prompt to control how the LLM generates summaries:
custom_prompt = """Summarize this conversation concisely.
Focus on: key decisions, user preferences, and action items.
Keep the summary under {target_tokens} tokens."""

config = LLMAutoContextSummarizationConfig(
    summary_config=LLMContextSummaryConfig(
        summarization_prompt=custom_prompt,
    ),
)
When no custom prompt is provided, Pipecat uses a built-in prompt that instructs the LLM to create a concise summary preserving key information, user preferences, and conversation flow.
By default, summarization uses the same LLM service that handles the conversation. You can route summarization to a separate, cheaper model by setting the llm field:
import os

from pipecat.services.google import GoogleLLMService

# Use a fast/cheap model for summarization
summarization_llm = GoogleLLMService(
    api_key=os.getenv("GOOGLE_API_KEY"),
    model="gemini-2.5-flash",
)

config = LLMAutoContextSummarizationConfig(
    summary_config=LLMContextSummaryConfig(
        llm=summarization_llm,
    ),
)
When a dedicated LLM is configured, summarization requests bypass the pipeline entirely and call the dedicated service directly, so the primary conversation LLM is never interrupted.
In addition to automatic summarization, you can trigger context summarization on demand by pushing an LLMSummarizeContextFrame into the pipeline. This is useful when you want to give users explicit control over when summarization happens — for example, via a function call tool.
from pipecat.frames.frames import LLMSummarizeContextFrame
from pipecat.services.llm_service import FunctionCallParams


async def summarize_conversation(params: FunctionCallParams):
    """Trigger manual context summarization via a pipeline frame."""
    await params.result_callback({"status": "summarization_requested"})
    await params.llm.queue_frame(LLMSummarizeContextFrame())
Register this as a function call tool so the LLM can invoke it when the user asks to summarize:
from pipecat.adapters.schemas.function_schema import FunctionSchema
from pipecat.adapters.schemas.tools_schema import ToolsSchema

llm.register_function("summarize_conversation", summarize_conversation)

summarize_function = FunctionSchema(
    name="summarize_conversation",
    description=(
        "Summarize and compress the conversation history. "
        "Call this when the user asks you to summarize the conversation "
        "or when you want to free up context space."
    ),
    properties={},
    required=[],
)

tools = ToolsSchema(standard_tools=[summarize_function])
context = LLMContext(messages, tools=tools)
On-demand summarization works even when enable_auto_context_summarization is False — the summarizer is always created internally to handle manually pushed frames. You can also pass a per-request LLMContextSummaryConfig to override the default settings:
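For example, inside the summarize_conversation handler above, a per-request override might look like the following sketch. It assumes LLMSummarizeContextFrame accepts a summary_config argument, which is not confirmed here; verify the frame's constructor in your Pipecat version.
from pipecat.frames.frames import LLMSummarizeContextFrame

# Hypothetical per-request override: the summary_config parameter name is an
# assumption; check LLMSummarizeContextFrame's definition for the actual field.
on_demand_config = LLMContextSummaryConfig(
    summarization_prompt="Summarize only the decisions and action items so far.",
)
await params.llm.queue_frame(LLMSummarizeContextFrame(summary_config=on_demand_config))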