
Claude Extended Thinking: Deep Reasoning in Practice

By Dorian Laurenceau

📅 Last reviewed: April 24, 2026. Updated with April 2026 findings and community feedback.

🔗 Pillar article: Claude API: Complete Guide


What is Extended Thinking?

Extended Thinking is a feature that allows Claude to think deeply before responding. Instead of immediately generating a response, Claude:

  1. Explores different approaches to the problem
  2. Verifies its reasoning step by step
  3. Corrects its errors along the way
  4. Formulates a more reliable final response

It's the equivalent of asking an expert to take time to think rather than answering instantly.

Without vs With Extended Thinking

| Aspect      | Without Extended Thinking            | With Extended Thinking           |
|-------------|--------------------------------------|----------------------------------|
| Response    | Immediate, direct                    | Thoughtful, structured           |
| Mathematics | Frequent errors on complex problems  | Methodical solving               |
| Code        | Possible subtle bugs                 | Step-by-step verification        |
| Analysis    | Superficial                          | Multi-perspective                |
| Latency     | Low                                  | Higher (proportional to budget)  |
| Cost        | Standard                             | + thinking tokens                |

The honest state of "extended thinking" features across frontier models (Anthropic's extended thinking, OpenAI's o-series, DeepSeek's reasoning mode) is that they trade latency and cost for a real but narrow quality gain. Practitioners on r/MachineLearning and r/LocalLLaMA keep landing on the same profile: math, logical multi-step reasoning, and code with non-obvious constraints benefit clearly; rephrasing, summarization, and single-hop Q&A do not. Turning on extended thinking for every task is a waste of tokens; turning it off for the tasks that actually need it is a waste of accuracy.

Where the community correctly pushes back on the marketing: "thinking" tokens are not observable reasoning in any philosophical sense; they are an extended scratchpad that statistically correlates with better final answers on reasoning-heavy tasks. Research like Chain-of-Thought prompting and the subsequent debates about whether CoT faithfully represents model cognition are the right context here. The short version: treat the thinking trace as a debugging tool, not as a literal transcript of the model's beliefs.

Pragmatic operating rule: benchmark your own workload with and without extended thinking on a fixed eval set, then adopt it only where the accuracy gain exceeds the latency and token cost. For most production workflows, a small budget (1-4k thinking tokens) on the 20% of hardest tasks pays for itself; blanket enabling rarely does.
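The operating rule above can be sketched as a small comparison harness. Everything here is illustrative: `call_model` stands in for whatever wrapper you use to send requests (with `thinking_budget=None` meaning thinking disabled), and `grade` is your own correctness check.

```python
# Minimal with/without comparison on a fixed eval set. `call_model` and
# `grade` are supplied by you; nothing here is Anthropic-specific.
def compare_thinking(eval_set, call_model, grade, budget=4000):
    """Return accuracy with and without extended thinking."""
    results = {"with": 0, "without": 0}
    for case in eval_set:
        for mode, b in (("with", budget), ("without", None)):
            answer = call_model(case["prompt"], thinking_budget=b)
            if grade(answer, case["expected"]):
                results[mode] += 1
    n = len(eval_set)
    return {mode: count / n for mode, count in results.items()}
```

If the "with" accuracy does not beat "without" by more than your latency and token budget is worth, leave thinking off for that workload.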

Enabling Extended Thinking via the API

Basic Implementation

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # Maximum budget for thinking
    },
    messages=[{
        "role": "user",
        "content": "Solve this dynamic programming problem: given an array of integers, find the longest increasing subsequence."
    }]
)

# Iterate through response blocks
for block in response.content:
    if block.type == "thinking":
        print("🧠 Thinking:")
        print(block.thinking)
        print("---")
    elif block.type == "text":
        print("💬 Response:")
        print(block.text)

Adaptive Thinking (Claude 4.6)

Claude 4.6's adaptive thinking automatically adjusts reasoning effort based on complexity:

response = client.messages.create(
    model="claude-opus-4-20250918",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # MAXIMUM budget, not fixed
    },
    messages=[{
        "role": "user",
        "content": "What is the capital of France?"
    }]
)
# → Minimal thinking (~50 tokens) because the question is simple

response2 = client.messages.create(
    model="claude-opus-4-20250918",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    messages=[{
        "role": "user",
        "content": "Prove that the square root of 2 is irrational."
    }]
)
# → Extended thinking (~5000 tokens) because the problem is complex

Budget Tokens: Optimizing Cost

The budget_tokens parameter defines the maximum number of tokens Claude can use for thinking.

| Budget          | Best for                          | Added latency |
|-----------------|-----------------------------------|---------------|
| 1,000 - 3,000   | Easy questions, clarifications    | < 2s          |
| 3,000 - 8,000   | Moderate reasoning, code debugging| 2-5s          |
| 8,000 - 15,000  | Complex mathematics, architecture | 5-15s         |
| 15,000 - 32,000 | Very complex problems, proofs     | 15-30s        |

Allocation Strategy

def get_thinking_budget(task_type):
    """Returns a thinking budget suited to the task type."""
    budgets = {
        "simple_qa": 1000,
        "code_review": 5000,
        "bug_fix": 8000,
        "algorithm_design": 12000,
        "math_proof": 20000,
        "architecture": 15000,
    }
    return budgets.get(task_type, 5000)

# Usage
budget = get_thinking_budget("bug_fix")
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": budget},
    messages=[{"role": "user", "content": "..."}]
)

Important Constraints

  • Minimum: budget_tokens must be ≥ 1024
  • Maximum: budget_tokens must be less than max_tokens (thinking tokens count toward the max_tokens limit)
  • Billing: thinking tokens are billed at the output token rate
  • No caching: thinking blocks are not eligible for prompt caching
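These constraints can be checked client-side before a request goes out. A minimal sketch, where `validate_thinking_config` is a hypothetical helper (not part of the SDK) that also enforces that the budget stays below `max_tokens`, since thinking tokens consume part of the output window:

```python
MIN_THINKING_BUDGET = 1024  # API minimum for budget_tokens

def validate_thinking_config(budget_tokens, max_tokens):
    """Check a thinking configuration against the constraints above
    and return the `thinking` parameter dict on success."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be >= {MIN_THINKING_BUDGET}")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be less than max_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}
```

Failing fast here is cheaper than letting the API reject the request after a round trip.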

When to Use Extended Thinking?

✅ Use It

| Use case           | Example                                      | Expected gain  |
|--------------------|----------------------------------------------|----------------|
| Mathematics        | Solving equations, proving theorems          | +40% accuracy  |
| Programming        | Designing algorithms, debugging complex code | +35% accuracy  |
| Logical analysis   | Detecting flaws in an argument               | +30% accuracy  |
| Planning           | Creating a technical migration plan          | Better coverage|
| Structured writing | Writing an RFC or specification              | More coherent  |

โŒ Avoid It

  • Simple factual questions: "What is the capital of France?" → not needed
  • Open-ended creative tasks: "Write a poem" → thinking doesn't improve creativity
  • Low-latency chatbots: thinking adds latency
  • Tight budgets: thinking tokens are billed and add up quickly
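One way to encode this use-it/avoid-it split is to gate thinking behind a task-type allowlist. The task names below are illustrative, not an official taxonomy:

```python
# Task types that tend to benefit from extended thinking (illustrative set).
THINKING_TASKS = {"math", "debugging", "logic", "planning", "spec_writing"}

def build_request_kwargs(task_type, prompt, budget=8000):
    """Build messages.create() kwargs, enabling thinking only where it pays off."""
    kwargs = {
        "model": "claude-sonnet-4-20250514",
        "max_tokens": 16000,
        "messages": [{"role": "user", "content": prompt}],
    }
    if task_type in THINKING_TASKS:
        kwargs["thinking"] = {"type": "enabled", "budget_tokens": budget}
    return kwargs
```

A chatbot route would then call `client.messages.create(**build_request_kwargs("smalltalk", msg))` and never pay for thinking tokens.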

Streaming with Extended Thinking

with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[{"role": "user", "content": "Design a database for a social network."}]
) as stream:
    current_type = None
    for event in stream:
        if hasattr(event, 'type'):
            if event.type == "content_block_start":
                block_type = event.content_block.type
                if block_type == "thinking":
                    print("\n🧠 Thinking in progress...")
                    current_type = "thinking"
                elif block_type == "text":
                    print("\n💬 Response:")
                    current_type = "text"
            elif event.type == "content_block_delta":
                if current_type == "thinking" and hasattr(event.delta, 'thinking'):
                    print(event.delta.thinking, end="")
                elif current_type == "text" and hasattr(event.delta, 'text'):
                    print(event.delta.text, end="")

Block Order in Streaming

1. message_start
2. content_block_start (type: "thinking")
3. content_block_delta (thinking content...)  ← thinking blocks
4. content_block_stop
5. content_block_start (type: "text")
6. content_block_delta (text content...)     ← final response
7. content_block_stop
8. message_stop

Performance Benchmarks

Extended Thinking significantly improves performance on reasoning benchmarks:

| Benchmark      | Without Thinking | With Thinking | Improvement |
|----------------|------------------|---------------|-------------|
| MATH (level 5) | 71.2%            | 93.4%         | +22.2%      |
| GPQA (Diamond) | 65.0%            | 81.3%         | +16.3%      |
| SWE-bench      | 38.2%            | 52.1%         | +13.9%      |
| HumanEval      | 88.5%            | 95.7%         | +7.2%       |
| ARC-Challenge  | 89.1%            | 96.8%         | +7.7%       |

Advanced Patterns

Multi-Turn Conversation with Thinking

messages = []

def chat_with_thinking(user_msg, budget=5000):
    messages.append({"role": "user", "content": user_msg})
    
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=16000,
        thinking={"type": "enabled", "budget_tokens": budget},
        messages=messages
    )
    
    # Add the complete response (without thinking blocks)
    assistant_content = [
        block for block in response.content
        if block.type == "text"
    ]
    messages.append({"role": "assistant", "content": assistant_content})
    
    return response

# Usage
chat_with_thinking("Design a recommendation algorithm.", budget=10000)
chat_with_thinking("Optimize it for scalability.", budget=8000)

Dynamic Budget Based on Complexity

def estimate_complexity(message):
    """Estimates question complexity to adjust the budget."""
    complexity_indicators = {
        "prove": 3, "demonstrate": 3, "optimize": 2,
        "algorithm": 2, "architecture": 2, "compare": 1,
        "design": 2, "debug": 2, "why": 1,
        "analyze": 1, "explain": 0.5
    }
    
    score = sum(
        weight for keyword, weight in complexity_indicators.items()
        if keyword in message.lower()
    )
    
    if score <= 1:
        return 2000
    elif score <= 3:
        return 6000
    elif score <= 5:
        return 12000
    else:
        return 20000

budget = estimate_complexity("Prove and demonstrate that this algorithm is optimal.")
# → 20000 tokens (high complexity)

Common Errors

| Error                  | Cause                             | Solution                             |
|------------------------|-----------------------------------|--------------------------------------|
| Budget too small       | budget_tokens < 1024              | Minimum 1024 tokens                  |
| Truncated response     | max_tokens too low after thinking | Increase max_tokens                  |
| Unexpectedly high cost | Thinking on every request         | Enable thinking only when necessary  |
| Empty thinking         | Question too simple               | Adaptive thinking may skip reasoning |
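The "truncated response" failure mode is detectable at runtime via the response's stop_reason. A sketch of one recovery strategy, assuming `client` is an `anthropic.Anthropic()` instance (the retry-with-double-headroom policy is a suggestion, not an official pattern):

```python
def create_with_headroom(client, messages, budget=8000,
                         max_tokens=16000, model="claude-sonnet-4-20250514"):
    """Send a thinking-enabled request; if the output window was exhausted,
    retry once with double the max_tokens."""
    kwargs = dict(
        model=model,
        max_tokens=max_tokens,
        thinking={"type": "enabled", "budget_tokens": budget},
        messages=messages,
    )
    response = client.messages.create(**kwargs)
    if response.stop_reason == "max_tokens":
        # Thinking consumed too much of the window; give the answer more room
        kwargs["max_tokens"] = max_tokens * 2
        response = client.messages.create(**kwargs)
    return response
```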


Dorian Laurenceau

Full-Stack Developer & Learning Designer

Full-stack web developer and learning designer. I spent 4 years as a freelance full-stack developer and 4 years teaching React, JavaScript, HTML/CSS and WordPress to adult learners. Today I design learning paths in web development and AI, grounded in learning science. I founded learn-prompting.fr to make AI practical and accessible, and built the Bluff app to gamify political transparency.

Published: March 10, 2026 · Updated: April 24, 2026

FAQ

What is Claude's Extended Thinking?

Extended Thinking allows Claude to 'think' before responding, exploring different approaches and verifying its reasoning. Thinking blocks are visible in the API response and are billed at the output token rate.

When should I use Extended Thinking?

Use it for complex problems: advanced mathematics, multi-step reasoning, code analysis, planning, bug resolution, and any task requiring deep thought.

What is the cost of Extended Thinking?

Thinking tokens are billed at the same rate as output tokens. However, Claude 4.6's adaptive thinking automatically optimizes the budget by using only the tokens needed.

Does Extended Thinking work with streaming?

Yes, Extended Thinking is compatible with streaming. You first receive the thinking blocks, then the final response content. You can choose to show or hide thinking blocks on the client side.

What is the difference between Extended Thinking and Claude 4.6 adaptive thinking?

Classic Extended Thinking uses a fixed budget. Claude 4.6's adaptive thinking automatically adjusts the reasoning effort based on question complexity, optimizing cost and latency.

How do I disable extended thinking in Claude?

In the API, omit the 'thinking' parameter or set its type to 'disabled'. In the claude.ai web interface, extended thinking activates automatically based on question complexity, so there is nothing to disable manually. In Claude Code, use the --no-thinking flag to force quick responses.

Which Claude model is best for reasoning?

Opus 4.6 with extended thinking enabled is the best Claude model for complex reasoning (math, logic, system architecture). For standard reasoning, Sonnet 4.6 with its adaptive thinking offers excellent value: it automatically adjusts its reasoning effort based on difficulty.