Automatic Call Summarization with Claude: Practical Guide
By Learnia AI Research Team
📅 Last updated: March 14, 2026 — Based on Claude Sonnet 4 and the Anthropic API.
📚 Related articles: Extended Thinking Guide | Prompt Chaining Pipelines | Prompt Engineering Process | Structured Outputs
Every day, millions of calls are made in call centers, sales teams, and corporate meetings. Manual note-taking is time-consuming, incomplete, and inconsistent. This guide shows you how to build an automatic call summarization pipeline with Claude, from raw transcript to CRM-ready structured summary.
Why Automatic Call Summarization?
The use cases are massive:
- Call centers: each agent handles 40-60 calls/day. Manual post-call processing represents 20-30% of work time.
- Sales teams: incomplete sales notes lose context between follow-ups.
- Corporate meetings: verbal decisions are often forgotten without structured written records.
Transcript Pre-processing
Before sending a transcript to Claude, cleanup is essential. Raw transcripts from ASR (Automatic Speech Recognition) contain noise: hesitations, repetitions, diarization errors.
Cleanup and normalization
```python
import re
from dataclasses import dataclass, replace

@dataclass
class TranscriptSegment:
    speaker: str
    text: str
    start_time: float
    end_time: float

def clean_transcript(segments: list[TranscriptSegment]) -> list[TranscriptSegment]:
    """Clean a raw ASR transcript."""
    cleaned = []
    for seg in segments:
        text = seg.text.strip()
        # Remove common hesitations
        text = re.sub(r'\b(um|uh|ah|oh|like|you know)\b', '', text, flags=re.IGNORECASE)
        # Collapse consecutive word repetitions ("the the the" -> "the")
        text = re.sub(r'\b(\w+)( \1\b)+', r'\1', text)
        # Normalize whitespace
        text = re.sub(r'\s+', ' ', text).strip()
        if text:
            cleaned.append(TranscriptSegment(
                speaker=seg.speaker,
                text=text,
                start_time=seg.start_time,
                end_time=seg.end_time
            ))
    return cleaned

def merge_consecutive_speaker(segments: list[TranscriptSegment]) -> list[TranscriptSegment]:
    """Merge consecutive segments from the same speaker."""
    if not segments:
        return []
    # Copy the first segment so the caller's objects are never mutated in place
    merged = [replace(segments[0])]
    for seg in segments[1:]:
        if seg.speaker == merged[-1].speaker:
            merged[-1].text += " " + seg.text
            merged[-1].end_time = seg.end_time
        else:
            merged.append(seg)
    return merged
```
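To sanity-check the cleanup rules in isolation, here is a standalone run of the same three regexes; the sample sentence is invented for illustration:

```python
import re

text = "um so uh the the the plan is you know ready"
# Remove common hesitations (same pattern as clean_transcript)
text = re.sub(r'\b(um|uh|ah|oh|like|you know)\b', '', text, flags=re.IGNORECASE)
# Collapse consecutive word repetitions
text = re.sub(r'\b(\w+)( \1\b)+', r'\1', text)
# Normalize whitespace
text = re.sub(r'\s+', ' ', text).strip()
print(text)  # -> "so the plan is ready"
```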
Multi-speaker diarization handling
When diarization is uncertain (misidentified speakers), Claude can help correct it:
```python
import anthropic

client = anthropic.Anthropic()

def fix_diarization(transcript_text: str) -> str:
    """Use Claude to correct diarization errors."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": f"""Analyze this transcript and correct diarization errors
(wrong speaker attribution).

Clues to identify errors:
- A speaker responding to themselves
- Inconsistent tone/topic change for the same speaker
- Politeness formulas attributed to the wrong person

Transcript:
{transcript_text}

Return the corrected transcript in the same format."""
        }]
    )
    return response.content[0].text
```
Prompt Design for Structured Summarization
The core of the system is the prompt that transforms a transcript into a structured summary. The key: an explicit output template adapted to the call type.
Generic template (meeting / call)
```python
SUMMARIZATION_PROMPT = """You are an assistant specialized in summarizing
professional calls. Produce a structured summary faithful to the transcript.

RULES:
- Never invent information absent from the transcript
- Attribute each decision/action to the correct speaker
- Distinguish firm decisions from exploratory discussions
- Use the output format EXACTLY as specified

OUTPUT FORMAT:

## Executive Summary
[2-3 sentences summarizing the objective and main outcome of the call]

## Participants
- [Name/Role]: [Main contribution]

## Key Points Discussed
1. [Topic] — [Conclusion or status]
2. ...

## Decisions Made
- [Decision] (by [Speaker], at [timestamp if available])

## Action Items
| Responsible | Action | Deadline | Priority |
|-------------|--------|----------|----------|
| [Name] | [Description] | [Date/Timeframe] | High/Medium/Low |

## Next Steps
- [Step] — [Responsible] — [Target date]

## Overall Tone and Sentiment
[1 sentence about the general atmosphere of the call]

---
TRANSCRIPT:
{transcript}"""
```
BANT template for sales calls
```python
SALES_CALL_PROMPT = """You are an AI sales analyst. Summarize this sales call
in BANT format for the CRM.

OUTPUT FORMAT:

## Sales Call Summary

### Client Information
- Company: [name]
- Contact: [name, role]
- Industry: [sector]

### BANT Qualification
- **Budget**: [Amount mentioned or "Not discussed"]
- **Authority**: [Decision-maker identified? Who?]
- **Need**: [Primary need expressed]
- **Timeline**: [Deadline mentioned or "Not defined"]

### Qualification Score
[1-10 with justification]

### Objections Raised
1. [Objection] → [Response provided]

### Sales Action Items
| Action | Responsible | Deadline |
|--------|-------------|----------|
| [Description] | [Seller/Client] | [Date] |

### Recommended Next Step
[Recommendation based on BANT analysis]

---
TRANSCRIPT:
{transcript}"""
```
API Call and Structured Extraction
Here is the complete implementation to send a transcript and retrieve a structured summary:
```python
import anthropic
import json

client = anthropic.Anthropic()

def summarize_call(transcript: str, call_type: str = "generic") -> dict:
    """Summarize a call and return a structured summary."""
    prompts = {
        "generic": SUMMARIZATION_PROMPT,
        "sales": SALES_CALL_PROMPT,
    }
    prompt_template = prompts.get(call_type, SUMMARIZATION_PROMPT)
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": prompt_template.format(transcript=transcript)
        }]
    )
    summary_text = response.content[0].text
    # Second pass: extract action items as structured JSON
    action_response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": f"""Extract the action items from this summary as strict JSON.

Summary:
{summary_text}

Return ONLY valid JSON in this format:
{{
  "action_items": [
    {{
      "responsible": "Name",
      "action": "Description",
      "deadline": "Date or null",
      "priority": "high|medium|low"
    }}
  ],
  "decisions": [
    {{
      "decision": "Description",
      "made_by": "Name",
      "timestamp": "Moment in the call or null"
    }}
  ],
  "follow_ups": [
    {{
      "description": "Description",
      "responsible": "Name",
      "target_date": "Date or null"
    }}
  ]
}}"""
        }]
    )
    structured_data = json.loads(action_response.content[0].text)
    return {
        "summary": summary_text,
        "structured": structured_data,
        "model": "claude-sonnet-4-20250514",
        "call_type": call_type
    }
```
For more on reliable JSON extraction, see our Structured Outputs guide.
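One failure mode worth guarding against: models sometimes wrap their JSON in markdown fences or surrounding prose, which makes a bare `json.loads` raise. A small defensive parser can absorb both cases; the helper name `extract_json` is our own, not part of the Anthropic SDK:

```python
import json
import re

def extract_json(raw: str) -> dict:
    """Parse JSON from a model response, tolerating markdown fences and prose."""
    text = raw.strip()
    # Strip a leading ```json / trailing ``` fence if present
    text = re.sub(r'^```(?:json)?\s*|\s*```$', '', text)
    # Fall back to the outermost brace pair
    start, end = text.find('{'), text.rfind('}')
    if start == -1 or end == -1:
        raise ValueError("No JSON object found in response")
    return json.loads(text[start:end + 1])
```

Swapping this in for the bare `json.loads` calls in the extraction step makes the pipeline noticeably more robust in practice.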
Handling Long Calls: Chunking + Map-Reduce
Transcripts of calls longer than 30 minutes can run to tens of thousands of tokens; even when they fit in the context window, single-pass summaries tend to miss details. The solution: a map-reduce pattern inspired by prompt chaining pipelines.
Chunking implementation with overlap
```python
def chunk_transcript(
    segments: list[TranscriptSegment],
    chunk_duration_minutes: float = 15.0,
    overlap_minutes: float = 3.0
) -> list[list[TranscriptSegment]]:
    """Split a transcript into chunks with overlap."""
    chunk_duration = chunk_duration_minutes * 60
    overlap = overlap_minutes * 60
    if not segments:
        return []
    chunks = []
    start_time = segments[0].start_time
    while start_time < segments[-1].end_time:
        end_time = start_time + chunk_duration
        chunk = [s for s in segments
                 if start_time <= s.start_time < end_time]
        if chunk:
            chunks.append(chunk)
        # Advance by the chunk duration minus the overlap
        start_time += chunk_duration - overlap
    return chunks

def format_chunk(segments: list[TranscriptSegment]) -> str:
    """Format a chunk for sending to Claude."""
    lines = []
    for seg in segments:
        minutes = int(seg.start_time // 60)
        seconds = int(seg.start_time % 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {seg.speaker}: {seg.text}")
    return "\n".join(lines)
```
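The stride between chunk starts is `chunk_duration - overlap`, so a 40-minute call with the defaults (15-minute chunks, 3-minute overlap) yields four windows. A self-contained sketch of just the window arithmetic; the helper name `chunk_windows` is illustrative:

```python
def chunk_windows(total_min: float, chunk_min: float = 15.0,
                  overlap_min: float = 3.0) -> list[tuple[float, float]]:
    """Return (start, end) windows in minutes for an overlapping chunker."""
    stride = chunk_min - overlap_min
    windows = []
    t = 0.0
    while t < total_min:
        # Each window runs chunk_min, clipped to the end of the call
        windows.append((t, min(t + chunk_min, total_min)))
        t += stride
    return windows

print(chunk_windows(40.0))
# -> [(0.0, 15.0), (12.0, 27.0), (24.0, 39.0), (36.0, 40.0)]
```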
Map-Reduce: parallel summarization then merge
```python
import asyncio

async def summarize_chunk(client, chunk_text: str, chunk_index: int,
                          total_chunks: int) -> str:
    """Summarize an individual chunk (MAP phase)."""
    response = await client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": f"""Summarize this call segment (part {chunk_index + 1}/{total_chunks}).

IMPORTANT: This is only a PART of the call. Do not conclude
prematurely. Note ongoing topics at the end of the segment.

Extract:
- Key points discussed
- Decisions made (with speaker)
- Action items identified
- Ongoing / unresolved topics

Segment:
{chunk_text}"""
        }]
    )
    return response.content[0].text

async def merge_summaries(client, partial_summaries: list[str],
                          call_type: str = "generic") -> str:
    """Merge partial summaries into the final summary (REDUCE phase)."""
    summaries_text = "\n\n---\n\n".join(
        f"### Part {i + 1}\n{s}" for i, s in enumerate(partial_summaries)
    )
    response = await client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": f"""Merge these partial summaries into ONE coherent final summary.

MERGE RULES:
- Deduplicate information present in overlap zones
- Maintain chronological order
- Consolidate action items (one item = one entry, even if it appears
  in multiple parts)
- Resolve "ongoing" topics with their conclusion in subsequent parts

Partial summaries:
{summaries_text}

Produce the final summary in the standard structured format."""
        }]
    )
    return response.content[0].text
```
```python
async def summarize_long_call(transcript_segments: list[TranscriptSegment],
                              call_type: str = "generic") -> dict:
    """Complete pipeline for long calls."""
    client = anthropic.AsyncAnthropic()
    # Chunking
    chunks = chunk_transcript(transcript_segments)
    chunk_texts = [format_chunk(c) for c in chunks]
    # MAP phase: summarize chunks in parallel
    tasks = [
        summarize_chunk(client, text, i, len(chunk_texts))
        for i, text in enumerate(chunk_texts)
    ]
    partial_summaries = await asyncio.gather(*tasks)
    # REDUCE phase: merge partial summaries
    final_summary = await merge_summaries(client, partial_summaries, call_type)
    return {
        "summary": final_summary,
        "chunks_processed": len(chunks),
        "method": "map-reduce"
    }
```
Summary Quality Evaluation
A good summary must be faithful, complete, and concise. Here is an automatic evaluation system.
Automated evaluation with Claude-as-a-Judge
```python
def evaluate_summary(transcript: str, summary: str) -> dict:
    """Evaluate a summary on 3 dimensions: faithfulness, completeness, conciseness."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": f"""Evaluate this call summary on 3 dimensions.
For each dimension, provide a score from 0.0 to 1.0 and a justification.

## Dimensions
1. **Faithfulness**: Is each claim in the summary verifiable in the
   transcript? (No hallucination)
2. **Completeness**: Are all key points, decisions, and actions from
   the transcript present in the summary?
3. **Conciseness**: Is the summary sufficiently condensed without
   redundancy or superfluous details?

## Transcript (source of truth)
{transcript[:3000]}

## Summary to evaluate
{summary}

Return ONLY valid JSON:
{{
  "faithfulness": {{"score": 0.0, "issues": []}},
  "completeness": {{"score": 0.0, "missing": []}},
  "conciseness": {{"score": 0.0, "redundancies": []}},
  "overall_score": 0.0,
  "pass": true
}}"""
        }]
    )
    return json.loads(response.content[0].text)
```
Re-generation loop if quality is insufficient
```python
def summarize_with_quality_check(
    transcript: str,
    call_type: str = "generic",
    max_retries: int = 2,
    quality_threshold: float = 0.8
) -> dict:
    """Summarize with quality check and re-generation if needed."""
    result, evaluation = None, None
    for attempt in range(max_retries + 1):
        result = summarize_call(transcript, call_type)
        evaluation = evaluate_summary(transcript, result["summary"])
        if evaluation["overall_score"] >= quality_threshold:
            return {
                **result,
                "quality": evaluation,
                "attempts": attempt + 1
            }
        # Feed missed points back into the next attempt
        if evaluation["completeness"]["missing"]:
            transcript = f"""[ADDITIONAL INSTRUCTION]
Points missed in the previous attempt:
{', '.join(evaluation['completeness']['missing'])}
Make sure to include these elements.

{transcript}"""
    # Return the last result even if below threshold
    return {**result, "quality": evaluation, "attempts": max_retries + 1}
```
Case Study: Sales Call → CRM Summary
Complete Production Pipeline
Here is the final pipeline assembly, integrating all steps:
```python
import asyncio

async def production_pipeline(
    raw_segments: list[TranscriptSegment],
    call_type: str = "generic",
    call_metadata: dict | None = None
) -> dict:
    """Production pipeline for call summarization."""
    # 1. Pre-processing
    cleaned = clean_transcript(raw_segments)
    merged = merge_consecutive_speaker(cleaned)
    # 2. Determine strategy (direct or map-reduce)
    total_duration = merged[-1].end_time - merged[0].start_time
    if total_duration > 30 * 60:  # > 30 minutes
        result = await summarize_long_call(merged, call_type)
    else:
        transcript_text = format_chunk(merged)
        result = summarize_call(transcript_text, call_type)
    # 3. Quality evaluation on the first segments
    transcript_for_eval = format_chunk(merged[:50])
    evaluation = evaluate_summary(transcript_for_eval, result["summary"])
    return {
        "call_metadata": call_metadata,
        "summary": result["summary"],
        "structured_data": result.get("structured", {}),
        "quality": evaluation,
        "processing": {
            "method": result.get("method", "direct"),
            "segments_processed": len(merged),
            "duration_minutes": round(total_duration / 60, 1)
        }
    }

# Entry point: asyncio.run(production_pipeline(segments, call_type="sales"))
```
For more advanced pipeline architectures, see our prompt chaining guide. For using Extended Thinking on complex transcripts, see the Extended Thinking guide.
FAQ
How does Claude handle very long transcripts (over one hour)?
For transcripts exceeding the context window, we use a map-reduce pattern: the transcript is split into 10-15 minute chunks with overlap, each chunk is summarized independently, then partial summaries are merged into a coherent final summary with action extraction.
How accurate is Claude's action item extraction?
With a well-structured prompt and strict output format, Claude achieves 90-95% recall on explicit action items and 75-85% on implicit commitments. Accuracy improves significantly with few-shot examples in the prompt.
How do you handle transcripts with diarization errors (wrong speaker attribution)?
A pre-processing step corrects diarization inconsistencies by analyzing conversational context. Claude can identify when a speaker is misattributed based on semantic content and thematic transitions.
Can the summary format be customized per call type (sales, support, meeting)?
Yes, we use specialized output templates: BANT for sales calls, SOAP-like for technical support, and a decision/action format for meetings. The system prompt automatically selects the template via an initial classifier.