Claude Extended Thinking: Deep Reasoning in Practice
By Learnia Team
📅 Last updated: March 10, 2026 — Covers classic Extended Thinking and Claude 4.6 adaptive thinking.
🔗 Pillar article: Claude API: Complete Guide
What is Extended Thinking?
Extended Thinking is a feature that allows Claude to think deeply before responding. Instead of immediately generating a response, Claude:
- Explores different approaches to the problem
- Verifies its reasoning step by step
- Corrects its errors along the way
- Formulates a more reliable final response
It's the equivalent of asking an expert to take time to think rather than answering instantly.
Without vs With Extended Thinking
| Aspect | Without Extended Thinking | With Extended Thinking |
|---|---|---|
| Response | Immediate, direct | Thoughtful, structured |
| Mathematics | Frequent errors on complex problems | Methodical solving |
| Code | Possible subtle bugs | Step-by-step verification |
| Analysis | Superficial | Multi-perspective |
| Latency | Low | Higher (proportional to budget) |
| Cost | Standard | + thinking tokens |
Enabling Extended Thinking via the API
Basic Implementation
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # Maximum budget for thinking
    },
    messages=[{
        "role": "user",
        "content": "Solve this dynamic programming problem: given an array of integers, find the longest increasing subsequence."
    }]
)

# Iterate through response blocks
for block in response.content:
    if block.type == "thinking":
        print("🧠 Thinking:")
        print(block.thinking)
        print("---")
    elif block.type == "text":
        print("💬 Response:")
        print(block.text)
```
Adaptive Thinking (Claude 4.6)
Claude 4.6's adaptive thinking automatically adjusts reasoning effort based on complexity:
```python
response = client.messages.create(
    model="claude-opus-4-20250918",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # MAXIMUM budget, not fixed
    },
    messages=[{
        "role": "user",
        "content": "What is the capital of France?"
    }]
)
# → Minimal thinking (~50 tokens) because the question is simple

response2 = client.messages.create(
    model="claude-opus-4-20250918",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    messages=[{
        "role": "user",
        "content": "Prove that the square root of 2 is irrational."
    }]
)
# → Extended thinking (~5000 tokens) because the problem is complex
```
Budget Tokens: Optimizing Cost
The `budget_tokens` parameter defines the maximum number of tokens Claude can use for thinking.
| Budget | Best for | Added latency |
|---|---|---|
| 1,000 - 3,000 | Easy questions, clarifications | < 2s |
| 3,000 - 8,000 | Moderate reasoning, code debugging | 2-5s |
| 8,000 - 15,000 | Complex mathematics, architecture | 5-15s |
| 15,000 - 32,000 | Very complex problems, proofs | 15-30s |
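Since thinking tokens are billed at the output-token rate, you can estimate the marginal cost of a budget before committing to it. A minimal sketch; the per-million-token price below is illustrative, so check current pricing for your model:

```python
def thinking_cost_usd(thinking_tokens: int, output_price_per_mtok: float = 15.0) -> float:
    """Estimate the added cost of thinking tokens, which are billed at the
    output rate. output_price_per_mtok is an illustrative USD price per
    million output tokens, not an official figure."""
    return thinking_tokens * output_price_per_mtok / 1_000_000

# A fully consumed 10,000-token budget at $15/MTok adds at most $0.15 per request
print(f"${thinking_cost_usd(10_000):.2f}")
```

Because `budget_tokens` is a ceiling rather than a fixed spend, this gives a worst-case bound per request.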
Allocation Strategy
```python
def get_thinking_budget(task_type):
    """Returns a thinking budget suited to the task type."""
    budgets = {
        "simple_qa": 1000,
        "code_review": 5000,
        "bug_fix": 8000,
        "algorithm_design": 12000,
        "math_proof": 20000,
        "architecture": 15000,
    }
    return budgets.get(task_type, 5000)

# Usage
budget = get_thinking_budget("bug_fix")
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": budget},
    messages=[{"role": "user", "content": "..."}]
)
```
Important Constraints
- Minimum: `budget_tokens` must be ≥ 1024
- Maximum: `budget_tokens` must be less than `max_tokens`, since thinking tokens count toward the `max_tokens` limit
- Billing: Thinking tokens are billed at the output token rate
- No cache: Thinking tokens are not eligible for prompt caching
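The minimum-budget rule, plus the fact that thinking tokens count toward `max_tokens`, can be checked before sending a request. A minimal sketch; the helper name and error messages are our own:

```python
def validate_thinking_config(budget_tokens: int, max_tokens: int) -> None:
    """Raise ValueError if a thinking configuration violates the API constraints."""
    if budget_tokens < 1024:
        raise ValueError("budget_tokens must be at least 1024")
    if budget_tokens >= max_tokens:
        raise ValueError(
            "budget_tokens must be strictly less than max_tokens, "
            "since thinking tokens count toward the max_tokens limit"
        )

# Valid: leaves 6,000 tokens of headroom for the final response
validate_thinking_config(budget_tokens=10_000, max_tokens=16_000)
```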
When to Use Extended Thinking?
✅ Use It
| Use case | Example | Expected gain |
|---|---|---|
| Mathematics | Solving equations, proving theorems | +40% accuracy |
| Programming | Designing algorithms, debugging complex code | +35% accuracy |
| Logical analysis | Detecting flaws in an argument | +30% accuracy |
| Planning | Creating a technical migration plan | Better coverage |
| Structured writing | Writing an RFC or specification | More coherent |
❌ Avoid It
- Simple factual questions: "What is the capital of France?" → Not needed
- Free creative tasks: "Write a poem" → Thinking doesn't improve creativity
- Low-latency chatbots: Thinking adds latency
- Tight budget: Thinking tokens are billed and add up quickly
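One way to apply these guidelines is to gate thinking per request type, so simple or latency-sensitive calls skip it entirely. A sketch; the task names and budgets below are illustrative, not an official taxonomy:

```python
# Task types that benefit from thinking, with illustrative budgets
THINKING_BUDGETS = {
    "math": 20_000,
    "algorithm_design": 12_000,
    "debugging": 8_000,
    "planning": 8_000,
}

def thinking_param(task_type: str, latency_sensitive: bool = False):
    """Return a `thinking` config dict, or None to leave thinking disabled."""
    if latency_sensitive or task_type not in THINKING_BUDGETS:
        return None
    return {"type": "enabled", "budget_tokens": THINKING_BUDGETS[task_type]}

# Build request kwargs, omitting `thinking` when it is not warranted
kwargs = {"model": "claude-sonnet-4-20250514", "max_tokens": 16000}
config = thinking_param("math")
if config is not None:
    kwargs["thinking"] = config
```

Omitting the `thinking` parameter entirely keeps the request on the standard fast path.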
Streaming with Extended Thinking
```python
with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[{"role": "user", "content": "Design a database for a social network."}]
) as stream:
    current_type = None
    for event in stream:
        if hasattr(event, 'type'):
            if event.type == "content_block_start":
                block_type = event.content_block.type
                if block_type == "thinking":
                    print("\n🧠 Thinking in progress...")
                    current_type = "thinking"
                elif block_type == "text":
                    print("\n💬 Response:")
                    current_type = "text"
            elif event.type == "content_block_delta":
                if current_type == "thinking" and hasattr(event.delta, 'thinking'):
                    print(event.delta.thinking, end="")
                elif current_type == "text" and hasattr(event.delta, 'text'):
                    print(event.delta.text, end="")
```
Block Order in Streaming
1. message_start
2. content_block_start (type: "thinking")
3. content_block_delta (thinking content...) ← thinking blocks
4. content_block_stop
5. content_block_start (type: "text")
6. content_block_delta (text content...) ← final response
7. content_block_stop
8. message_stop
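Following this event order, you can also accumulate thinking and final text into separate buffers instead of printing them, for example to hide reasoning from the user while logging it. A sketch; the function name is ours, and it assumes the same event attributes used above:

```python
def split_stream_events(events):
    """Accumulate thinking and text deltas from a stream's event sequence
    into two separate strings, following the block order above."""
    thinking_parts, text_parts = [], []
    current = None
    for event in events:
        etype = getattr(event, "type", None)
        if etype == "content_block_start":
            current = event.content_block.type
        elif etype == "content_block_delta":
            if current == "thinking" and hasattr(event.delta, "thinking"):
                thinking_parts.append(event.delta.thinking)
            elif current == "text" and hasattr(event.delta, "text"):
                text_parts.append(event.delta.text)
    return "".join(thinking_parts), "".join(text_parts)
```

You would feed it the events from `client.messages.stream(...)`, then show only the text buffer to the end user.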
Performance Benchmarks
Extended Thinking significantly improves performance on reasoning benchmarks:
| Benchmark | Without Thinking | With Thinking | Improvement |
|---|---|---|---|
| MATH (level 5) | 71.2% | 93.4% | +22.2% |
| GPQA (Diamond) | 65.0% | 81.3% | +16.3% |
| SWE-bench | 38.2% | 52.1% | +13.9% |
| HumanEval | 88.5% | 95.7% | +7.2% |
| ARC-Challenge | 89.1% | 96.8% | +7.7% |
Advanced Patterns
Multi-Turn Conversation with Thinking
```python
messages = []

def chat_with_thinking(user_msg, budget=5000):
    messages.append({"role": "user", "content": user_msg})
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=16000,
        thinking={"type": "enabled", "budget_tokens": budget},
        messages=messages
    )
    # Add the complete response (without thinking blocks)
    assistant_content = [
        block for block in response.content
        if block.type == "text"
    ]
    messages.append({"role": "assistant", "content": assistant_content})
    return response

# Usage
chat_with_thinking("Design a recommendation algorithm.", budget=10000)
chat_with_thinking("Optimize it for scalability.", budget=8000)
```
Dynamic Budget Based on Complexity
```python
def estimate_complexity(message):
    """Estimates question complexity to adjust the budget."""
    complexity_indicators = {
        "prove": 3, "demonstrate": 3, "optimize": 2,
        "algorithm": 2, "architecture": 2, "compare": 1,
        "design": 2, "debug": 2, "why": 1,
        "analyze": 1, "explain": 0.5
    }
    score = sum(
        weight for keyword, weight in complexity_indicators.items()
        if keyword in message.lower()
    )
    if score <= 1:
        return 2000
    elif score <= 3:
        return 6000
    elif score <= 5:
        return 12000
    else:
        return 20000

budget = estimate_complexity("Prove and demonstrate that this algorithm is optimal.")
# → 20000 tokens (high complexity)
```
Common Errors
| Error | Cause | Solution |
|---|---|---|
| Budget too small | budget_tokens < 1024 | Minimum 1024 tokens |
| Truncated response | max_tokens too low after thinking | Increase max_tokens |
| Unexpected high cost | Thinking on every request | Enable thinking only when necessary |
| Empty thinking | Question too simple | Adaptive thinking may skip reasoning |
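A truncated response can be detected via the response's `stop_reason` and retried with more headroom. A sketch; the wrapper takes the create function as an argument (in practice, `client.messages.create`) so the logic can be exercised without network access:

```python
def create_with_headroom(create_fn, *, max_tokens, retry_factor=2, **kwargs):
    """Call the API; if the response was cut off at the max_tokens limit,
    retry once with a larger limit so the final answer fits after thinking."""
    response = create_fn(max_tokens=max_tokens, **kwargs)
    if getattr(response, "stop_reason", None) == "max_tokens":
        response = create_fn(max_tokens=max_tokens * retry_factor, **kwargs)
    return response
```

Doubling once is a simple heuristic; you could also raise `max_tokens` by the unused portion of the thinking budget.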
FAQ
What is Claude's Extended Thinking?
Extended Thinking allows Claude to 'think' before responding, exploring different approaches and verifying its reasoning. Thinking blocks are visible in the API response and are billed at the output token rate, like the rest of the output.
When should I use Extended Thinking?
Use it for complex problems: advanced mathematics, multi-step reasoning, code analysis, planning, bug resolution, and any task requiring deep thought.
What is the cost of Extended Thinking?
Thinking tokens are billed at the same rate as output tokens. However, Claude 4.6's adaptive thinking automatically optimizes the budget by using only the tokens needed.
Does Extended Thinking work with streaming?
Yes, Extended Thinking is compatible with streaming. You first receive the thinking blocks, then the final response content. You can choose to show or hide thinking blocks on the client side.
What is the difference between Extended Thinking and Claude 4.6 adaptive thinking?
Classic Extended Thinking uses a fixed budget. Claude 4.6's adaptive thinking automatically adjusts the reasoning effort based on question complexity, optimizing cost and latency.
How do I disable extended thinking in Claude?
In the API, don't set the 'thinking' parameter or set it to 'disabled'. In the claude.ai web interface, extended thinking activates automatically based on question complexity — you don't need to manually disable it. In Claude Code, use the --no-thinking flag to force quick responses.
Which Claude model is best for reasoning?
Opus 4.6 with extended thinking enabled is the best Claude model for complex reasoning (math, logic, system architecture). For standard reasoning, Sonnet 4.6 with its adaptive thinking offers excellent value — it automatically adjusts its reasoning effort based on difficulty.