Claude Extended Thinking: Deep Reasoning in Practice
By Dorian Laurenceau
Last reviewed: April 24, 2026. Updated with April 2026 findings and community feedback.
Pillar article: Claude API: Complete Guide
What is Extended Thinking?
Extended Thinking is a feature that allows Claude to think deeply before responding. Instead of immediately generating a response, Claude:
- Explores different approaches to the problem
- Verifies its reasoning step by step
- Corrects its errors along the way
- Formulates a more reliable final response
It's the equivalent of asking an expert to take time to think rather than answering instantly.
Without vs With Extended Thinking
| Aspect | Without Extended Thinking | With Extended Thinking |
|---|---|---|
| Response | Immediate, direct | Thoughtful, structured |
| Mathematics | Frequent errors on complex problems | Methodical solving |
| Code | Possible subtle bugs | Step-by-step verification |
| Analysis | Superficial | Multi-perspective |
| Latency | Low | Higher (proportional to budget) |
| Cost | Standard | + thinking tokens |
The honest state of "extended thinking" features across frontier models (Anthropic's extended thinking, OpenAI's o-series, DeepSeek's reasoning mode) is that they trade latency and cost for a real but narrow quality gain. Practitioners on r/MachineLearning and r/LocalLLaMA keep landing on the same profile: math, logical multi-step reasoning, and code with non-obvious constraints benefit clearly; rephrasing, summarization, and single-hop Q&A do not. Turning on extended thinking for every task is a waste of tokens; turning it off for the tasks that actually need it is a waste of accuracy.
Where the community correctly pushes back on the marketing: "thinking" tokens are not observable reasoning in any philosophical sense; they are an extended scratchpad that statistically correlates with better final answers on reasoning-heavy tasks. Research like Chain-of-Thought prompting and the subsequent debates about whether CoT faithfully represents model cognition are the right context here. The short version: treat the thinking trace as a debugging tool, not as a literal transcript of the model's beliefs.
Pragmatic operating rule: benchmark your own workload with and without extended thinking on a fixed eval set, then adopt it only where the accuracy gain exceeds the latency and token cost. For most production workflows, a small budget (1-4k thinking tokens) on the 20% of hardest tasks pays for itself; blanket enabling rarely does.
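That operating rule can be sketched as a small harness. The `run_fn(prompt, thinking_budget)` wrapper is hypothetical (in practice it would wrap your actual `client.messages.create` call); the stub below stands in for real API calls:

```python
def evaluate(run_fn, eval_set, thinking_budget=None):
    """Accuracy of run_fn over a fixed eval set."""
    correct = sum(
        1 for item in eval_set
        if run_fn(item["prompt"], thinking_budget) == item["expected"]
    )
    return correct / len(eval_set)

def worth_enabling(run_fn, eval_set, budget, min_gain=0.05):
    """Adopt extended thinking only if the accuracy gain clears a threshold."""
    base = evaluate(run_fn, eval_set)
    with_thinking = evaluate(run_fn, eval_set, thinking_budget=budget)
    return (with_thinking - base) >= min_gain

# Stub standing in for a real API-backed run_fn:
def fake_run(prompt, thinking_budget):
    if "hard" in prompt and thinking_budget is None:
        return "wrong"
    return "right"

eval_set = [
    {"prompt": "easy question", "expected": "right"},
    {"prompt": "hard proof", "expected": "right"},
]
print(worth_enabling(fake_run, eval_set, budget=4000))  # → True
```

In production, `min_gain` should also price in the extra latency and token cost, not just raw accuracy.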
Enabling Extended Thinking via the API
Basic Implementation
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # Maximum budget for thinking
    },
    messages=[{
        "role": "user",
        "content": "Solve this dynamic programming problem: given an array of integers, find the longest increasing subsequence."
    }]
)

# Iterate through response blocks
for block in response.content:
    if block.type == "thinking":
        print("🧠 Thinking:")
        print(block.thinking)
        print("---")
    elif block.type == "text":
        print("💬 Response:")
        print(block.text)
```
Adaptive Thinking (Claude 4.6)
Claude 4.6's adaptive thinking automatically adjusts reasoning effort based on complexity:
```python
response = client.messages.create(
    model="claude-opus-4-20250918",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # MAXIMUM budget, not fixed
    },
    messages=[{
        "role": "user",
        "content": "What is the capital of France?"
    }]
)
# → Minimal thinking (~50 tokens) because the question is simple

response2 = client.messages.create(
    model="claude-opus-4-20250918",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    messages=[{
        "role": "user",
        "content": "Prove that the square root of 2 is irrational."
    }]
)
# → Extended thinking (~5000 tokens) because the problem is complex
```
Budget Tokens: Optimizing Cost
The budget_tokens defines the maximum number of tokens Claude can use for thinking.
| Budget | Best for | Added latency |
|---|---|---|
| 1,000 - 3,000 | Easy questions, clarifications | < 2s |
| 3,000 - 8,000 | Moderate reasoning, code debugging | 2-5s |
| 8,000 - 15,000 | Complex mathematics, architecture | 5-15s |
| 15,000 - 32,000 | Very complex problems, proofs | 15-30s |
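Since thinking tokens are billed at the output token rate, a quick worst-case cost check helps when picking a budget from the table above. The per-million-token price below is an assumed example rate, not an actual published price; substitute your model's real output rate:

```python
def max_thinking_cost_usd(budget_tokens, output_price_per_mtok):
    """Worst-case extra cost if the whole thinking budget is consumed,
    since thinking tokens are billed at the output token rate."""
    return budget_tokens / 1_000_000 * output_price_per_mtok

# Example: a 10k budget at an assumed $15 per million output tokens
print(round(max_thinking_cost_usd(10_000, 15.0), 4))  # → 0.15
```

Actual spend is usually lower, since adaptive thinking rarely consumes the full budget, but the ceiling is what matters when setting limits.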
Allocation Strategy
```python
def get_thinking_budget(task_type):
    """Return a thinking budget suited to the task type."""
    budgets = {
        "simple_qa": 1000,
        "code_review": 5000,
        "bug_fix": 8000,
        "algorithm_design": 12000,
        "math_proof": 20000,
        "architecture": 15000,
    }
    return budgets.get(task_type, 5000)

# Usage
budget = get_thinking_budget("bug_fix")
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": budget},
    messages=[{"role": "user", "content": "..."}]
)
```
Important Constraints
- Minimum: `budget_tokens` must be ≥ 1024
- Maximum: `budget_tokens` must be less than `max_tokens`
- Billing: thinking tokens are billed at the output token rate
- No cache: thinking tokens are not eligible for prompt caching
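A small guard that validates the budget before sending a request can catch these errors early. This is a sketch (the helper name is mine, not part of the SDK); it encodes the 1024-token minimum and the requirement that `budget_tokens` stay below `max_tokens`:

```python
def thinking_config(budget_tokens, max_tokens):
    """Build a thinking config, enforcing the API's budget constraints."""
    if budget_tokens < 1024:
        raise ValueError("budget_tokens must be >= 1024")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be less than max_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}

config = thinking_config(8000, max_tokens=16000)
# config can then be passed as the `thinking` argument
```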
When to Use Extended Thinking?
✅ Use It
| Use case | Example | Expected gain |
|---|---|---|
| Mathematics | Solving equations, proving theorems | +40% accuracy |
| Programming | Designing algorithms, debugging complex code | +35% accuracy |
| Logical analysis | Detecting flaws in an argument | +30% accuracy |
| Planning | Creating a technical migration plan | Better coverage |
| Structured writing | Writing an RFC or specification | More coherent |
❌ Avoid It
- Simple factual questions: "What is the capital of France?" → not needed
- Free creative tasks: "Write a poem" → thinking doesn't improve creativity
- Low-latency chatbots: thinking adds latency
- Tight budget: thinking tokens are billed and add up quickly
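The "use it" and "avoid it" lists collapse into a simple routing rule. The task names below are illustrative, mirroring the budget table earlier in the article:

```python
# Task types that benefit from extended thinking (illustrative set)
REASONING_TASKS = {"math_proof", "algorithm_design", "bug_fix", "code_review"}

def thinking_param(task_type, budget=8000):
    """Enable thinking only for task types that actually benefit."""
    if task_type in REASONING_TASKS:
        return {"type": "enabled", "budget_tokens": budget}
    return {"type": "disabled"}  # simple Q&A, creative, low-latency paths

print(thinking_param("math_proof"))  # → {'type': 'enabled', 'budget_tokens': 8000}
print(thinking_param("simple_qa"))   # → {'type': 'disabled'}
```

The returned dict can be passed directly as the `thinking` argument to `client.messages.create`.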
Streaming with Extended Thinking
```python
with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[{"role": "user", "content": "Design a database for a social network."}]
) as stream:
    current_type = None
    for event in stream:
        if hasattr(event, 'type'):
            if event.type == "content_block_start":
                block_type = event.content_block.type
                if block_type == "thinking":
                    print("\n🧠 Thinking in progress...")
                    current_type = "thinking"
                elif block_type == "text":
                    print("\n💬 Response:")
                    current_type = "text"
            elif event.type == "content_block_delta":
                if current_type == "thinking" and hasattr(event.delta, 'thinking'):
                    print(event.delta.thinking, end="")
                elif current_type == "text" and hasattr(event.delta, 'text'):
                    print(event.delta.text, end="")
```
Block Order in Streaming
1. message_start
2. content_block_start (type: "thinking")
3. content_block_delta (thinking deltas)
4. content_block_stop
5. content_block_start (type: "text")
6. content_block_delta (text deltas: the final response)
7. content_block_stop
8. message_stop
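To make the ordering concrete, here is a toy reducer over simplified `(event_type, block_type, delta)` tuples. The real SDK emits richer event objects, but the state machine is the same: track the current block type from `content_block_start`, then route deltas accordingly:

```python
def collect_blocks(events):
    """Reassemble thinking and text from an ordered event stream."""
    out = {"thinking": "", "text": ""}
    current = None
    for etype, btype, delta in events:
        if etype == "content_block_start":
            current = btype  # switch routing to the new block type
        elif etype == "content_block_delta" and current in out:
            out[current] += delta
    return out

events = [
    ("message_start", None, ""),
    ("content_block_start", "thinking", ""),
    ("content_block_delta", None, "Let me plan the schema... "),
    ("content_block_stop", None, ""),
    ("content_block_start", "text", ""),
    ("content_block_delta", None, "Here is the design."),
    ("content_block_stop", None, ""),
    ("message_stop", None, ""),
]
print(collect_blocks(events)["text"])  # → Here is the design.
```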
Performance Benchmarks
Extended Thinking significantly improves performance on reasoning benchmarks:
| Benchmark | Without Thinking | With Thinking | Improvement |
|---|---|---|---|
| MATH (level 5) | 71.2% | 93.4% | +22.2% |
| GPQA (Diamond) | 65.0% | 81.3% | +16.3% |
| SWE-bench | 38.2% | 52.1% | +13.9% |
| HumanEval | 88.5% | 95.7% | +7.2% |
| ARC-Challenge | 89.1% | 96.8% | +7.7% |
Advanced Patterns
Multi-Turn Conversation with Thinking
```python
messages = []

def chat_with_thinking(user_msg, budget=5000):
    messages.append({"role": "user", "content": user_msg})
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=16000,
        thinking={"type": "enabled", "budget_tokens": budget},
        messages=messages
    )
    # Add the complete response (without thinking blocks)
    assistant_content = [
        block for block in response.content
        if block.type == "text"
    ]
    messages.append({"role": "assistant", "content": assistant_content})
    return response

# Usage
chat_with_thinking("Design a recommendation algorithm.", budget=10000)
chat_with_thinking("Optimize it for scalability.", budget=8000)
```
Dynamic Budget Based on Complexity
```python
def estimate_complexity(message):
    """Estimate question complexity to adjust the budget."""
    complexity_indicators = {
        "prove": 3, "demonstrate": 3, "optimize": 2,
        "algorithm": 2, "architecture": 2, "compare": 1,
        "design": 2, "debug": 2, "why": 1,
        "analyze": 1, "explain": 0.5
    }
    score = sum(
        weight for keyword, weight in complexity_indicators.items()
        if keyword in message.lower()
    )
    if score <= 1:
        return 2000
    elif score <= 3:
        return 6000
    elif score <= 5:
        return 12000
    else:
        return 20000

budget = estimate_complexity("Prove and demonstrate that this algorithm is optimal.")
# → 20000 tokens (high complexity)
```
Common Errors
| Error | Cause | Solution |
|---|---|---|
| Budget too small | budget_tokens < 1024 | Minimum 1024 tokens |
| Truncated response | max_tokens too low after thinking | Increase max_tokens |
| Unexpected high cost | Thinking on every request | Enable thinking only when necessary |
| Empty thinking | Question too simple | Adaptive thinking may skip reasoning |
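For the "truncated response" row, a retry wrapper is a common fix: when `stop_reason` comes back as `"max_tokens"`, retry with a larger output ceiling. The `create_fn` callable below is a hypothetical wrapper around `client.messages.create` (taking just the two parameters that change), so the retry logic stays testable:

```python
def run_with_retry(create_fn, max_tokens, budget_tokens, retries=2):
    """Retry with a doubled max_tokens when the response was cut off
    (stop_reason == "max_tokens") before the final answer finished."""
    response = create_fn(max_tokens=max_tokens, budget_tokens=budget_tokens)
    for _ in range(retries):
        if response.stop_reason != "max_tokens":
            break
        max_tokens *= 2  # grow the output ceiling, keep the same thinking budget
        response = create_fn(max_tokens=max_tokens, budget_tokens=budget_tokens)
    return response
```

Since the thinking budget stays fixed while `max_tokens` grows, each retry leaves more room for the final text response.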
FAQ
What is Claude's Extended Thinking?
Extended Thinking allows Claude to "think" before responding, exploring different approaches and verifying its reasoning. Thinking blocks are visible in the API response and are billed at the output token rate, like standard output tokens.
When should I use Extended Thinking?
Use it for complex problems: advanced mathematics, multi-step reasoning, code analysis, planning, bug resolution, and any task requiring deep thought.
What is the cost of Extended Thinking?
Thinking tokens are billed at the same rate as output tokens. However, Claude 4.6's adaptive thinking automatically optimizes the budget by using only the tokens needed.
Does Extended Thinking work with streaming?
Yes, Extended Thinking is compatible with streaming. You first receive the thinking blocks, then the final response content. You can choose to show or hide thinking blocks on the client side.
What is the difference between Extended Thinking and Claude 4.6 adaptive thinking?
Classic Extended Thinking uses a fixed budget. Claude 4.6's adaptive thinking automatically adjusts the reasoning effort based on question complexity, optimizing cost and latency.
How do I disable extended thinking in Claude?
In the API, don't set the 'thinking' parameter, or set it to 'disabled'. In the claude.ai web interface, extended thinking activates automatically based on question complexity; you don't need to manually disable it. In Claude Code, use the --no-thinking flag to force quick responses.
Which Claude model is best for reasoning?
Opus 4.6 with extended thinking enabled is the best Claude model for complex reasoning (math, logic, system architecture). For standard reasoning, Sonnet 4.6 with its adaptive thinking offers excellent value โ it automatically adjusts its reasoning effort based on difficulty.