
Claude Extended Thinking: Deep Reasoning in Practice

By Learnia Team


📅 Last updated: March 10, 2026 — Covers classic Extended Thinking and Claude 4.6 adaptive thinking.

🔗 Pillar article: Claude API: Complete Guide


What is Extended Thinking?

Extended Thinking is a feature that allows Claude to think deeply before responding. Instead of immediately generating a response, Claude:

  1. Explores different approaches to the problem
  2. Verifies its reasoning step by step
  3. Corrects its errors along the way
  4. Formulates a more reliable final response

It's the equivalent of asking an expert to take time to think rather than answering instantly.

Without vs With Extended Thinking

| Aspect | Without Extended Thinking | With Extended Thinking |
| --- | --- | --- |
| Response | Immediate, direct | Thoughtful, structured |
| Mathematics | Frequent errors on complex problems | Methodical solving |
| Code | Possible subtle bugs | Step-by-step verification |
| Analysis | Superficial | Multi-perspective |
| Latency | Low | Higher (proportional to budget) |
| Cost | Standard | + thinking tokens |
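Since thinking tokens are billed at the output-token rate, the cost impact is easy to estimate up front. Here is a minimal sketch; the per-million-token prices are placeholder assumptions (check the current pricing page for real rates):

```python
def estimate_request_cost(input_tokens, output_tokens, thinking_tokens,
                          input_price=3.00, output_price=15.00):
    """Estimate request cost in USD.

    Prices are per million tokens and are ASSUMED example rates.
    Thinking tokens are billed at the same rate as output tokens,
    so they are added to the output total.
    """
    cost_input = input_tokens * input_price / 1_000_000
    cost_output = (output_tokens + thinking_tokens) * output_price / 1_000_000
    return cost_input + cost_output

# 500 input + 800 output tokens, with 4,000 thinking tokens on top:
cost = estimate_request_cost(500, 800, 4000)
print(f"${cost:.4f}")  # $0.0735 — thinking dominates the bill here
```

As the example shows, a moderate thinking budget can multiply the cost of a short answer several times over, which is why budget allocation matters.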

Enabling Extended Thinking via the API

Basic Implementation

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # Maximum budget for thinking
    },
    messages=[{
        "role": "user",
        "content": "Solve this dynamic programming problem: given an array of integers, find the longest increasing subsequence."
    }]
)

# Iterate through response blocks
for block in response.content:
    if block.type == "thinking":
        print("🧠 Thinking:")
        print(block.thinking)
        print("---")
    elif block.type == "text":
        print("💬 Response:")
        print(block.text)

Adaptive Thinking (Claude 4.6)

Claude 4.6's adaptive thinking automatically adjusts reasoning effort based on complexity:

response = client.messages.create(
    model="claude-opus-4-20250918",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # MAXIMUM budget, not fixed
    },
    messages=[{
        "role": "user",
        "content": "What is the capital of France?"
    }]
)
# → Minimal thinking (~50 tokens) because the question is simple

response2 = client.messages.create(
    model="claude-opus-4-20250918",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000
    },
    messages=[{
        "role": "user",
        "content": "Prove that the square root of 2 is irrational."
    }]
)
# → Extended thinking (~5000 tokens) because the problem is complex
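One way to see adaptive thinking at work is to measure how much reasoning each response actually contains. The sketch below uses plain dicts mirroring the SDK's content-block shape so it can run standalone; with real responses you would read `block.type` and `block.thinking` from `response.content`:

```python
def thinking_chars(content_blocks):
    """Total characters of reasoning across a response's content blocks.

    Blocks are dicts with a 'type' key ('thinking' or 'text'),
    mirroring the shape of blocks in response.content.
    """
    return sum(
        len(block.get("thinking", ""))
        for block in content_blocks
        if block.get("type") == "thinking"
    )

# A simple question: adaptive thinking emits little or no reasoning.
simple = [{"type": "text", "text": "Paris."}]

# A complex question: a sizeable thinking block precedes the answer.
complex_ = [
    {"type": "thinking", "thinking": "Assume sqrt(2) = p/q in lowest terms..."},
    {"type": "text", "text": "Proof: suppose sqrt(2) is rational..."},
]

print(thinking_chars(simple))    # 0
print(thinking_chars(complex_))  # > 0
```

Logging this metric per request is a cheap way to verify that your budgets match the actual complexity of your traffic.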

Budget Tokens: Optimizing Cost

The budget_tokens parameter sets the maximum number of tokens Claude may spend on thinking.

| Budget | Best for | Added latency |
| --- | --- | --- |
| 1,000 - 3,000 | Easy questions, clarifications | < 2s |
| 3,000 - 8,000 | Moderate reasoning, code debugging | 2-5s |
| 8,000 - 15,000 | Complex mathematics, architecture | 5-15s |
| 15,000 - 32,000 | Very complex problems, proofs | 15-30s |

Allocation Strategy

def get_thinking_budget(task_type):
    """Returns a thinking budget suited to the task type."""
    budgets = {
        "simple_qa": 1000,
        "code_review": 5000,
        "bug_fix": 8000,
        "algorithm_design": 12000,
        "math_proof": 20000,
        "architecture": 15000,
    }
    return budgets.get(task_type, 5000)

# Usage
budget = get_thinking_budget("bug_fix")
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": budget},
    messages=[{"role": "user", "content": "..."}]
)

Important Constraints

  • Minimum: budget_tokens must be ≥ 1024
  • Maximum: budget_tokens must be less than max_tokens (thinking tokens count toward max_tokens)
  • Billing: Thinking tokens are billed at the output token rate
  • No cache: Thinking tokens are not eligible for prompt caching
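These constraints can be enforced with a small guard before each request. A minimal sketch (the function name is ours, not part of the SDK):

```python
def safe_thinking_budget(requested, max_tokens):
    """Clamp a requested thinking budget to the API constraints:
    at least 1024 tokens, and strictly less than max_tokens,
    since thinking tokens count toward the max_tokens limit."""
    if max_tokens <= 1024:
        raise ValueError("max_tokens too low to enable thinking")
    return max(1024, min(requested, max_tokens - 1))

print(safe_thinking_budget(500, 16000))    # 1024 — raised to the minimum
print(safe_thinking_budget(20000, 16000))  # 15999 — capped below max_tokens
```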

When to Use Extended Thinking?

✅ Use It

| Use case | Example | Expected gain |
| --- | --- | --- |
| Mathematics | Solving equations, proving theorems | +40% accuracy |
| Programming | Designing algorithms, debugging complex code | +35% accuracy |
| Logical analysis | Detecting flaws in an argument | +30% accuracy |
| Planning | Creating a technical migration plan | Better coverage |
| Structured writing | Writing an RFC or specification | More coherent |

❌ Avoid It

  • Simple factual questions: "What is the capital of France?" → Not needed
  • Open-ended creative tasks: "Write a poem" → Thinking doesn't improve creativity
  • Low-latency chatbots: Thinking adds latency
  • Tight budget: Thinking tokens are billed and add up quickly
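In practice, this means enabling thinking per request rather than globally. A sketch of a request builder that only adds the thinking parameter when a task warrants it (here `needs_reasoning` is a placeholder for your own routing logic, e.g. the complexity estimator shown later):

```python
def build_request_kwargs(prompt, needs_reasoning, budget=8000):
    """Build kwargs for client.messages.create, enabling thinking
    only for tasks that benefit from it."""
    kwargs = {
        "model": "claude-sonnet-4-20250514",
        "max_tokens": 16000,
        "messages": [{"role": "user", "content": prompt}],
    }
    if needs_reasoning:
        kwargs["thinking"] = {"type": "enabled", "budget_tokens": budget}
    return kwargs

# Simple lookup → no thinking parameter, no extra latency or cost:
fast = build_request_kwargs("What is the capital of France?", needs_reasoning=False)

# Hard problem → thinking enabled with an explicit budget:
deep = build_request_kwargs("Prove sqrt(2) is irrational.", needs_reasoning=True)
```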

Streaming with Extended Thinking

with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[{"role": "user", "content": "Design a database for a social network."}]
) as stream:
    current_type = None
    for event in stream:
        if hasattr(event, 'type'):
            if event.type == "content_block_start":
                block_type = event.content_block.type
                if block_type == "thinking":
                    print("\n🧠 Thinking in progress...")
                    current_type = "thinking"
                elif block_type == "text":
                    print("\n💬 Response:")
                    current_type = "text"
            elif event.type == "content_block_delta":
                if current_type == "thinking" and hasattr(event.delta, 'thinking'):
                    print(event.delta.thinking, end="")
                elif current_type == "text" and hasattr(event.delta, 'text'):
                    print(event.delta.text, end="")

Block Order in Streaming

1. message_start
2. content_block_start (type: "thinking")
3. content_block_delta (thinking content...)  ← thinking blocks
4. content_block_stop
5. content_block_start (type: "text")
6. content_block_delta (text content...)     ← final response
7. content_block_stop
8. message_stop

Performance Benchmarks

Extended Thinking significantly improves performance on reasoning benchmarks:

| Benchmark | Without Thinking | With Thinking | Improvement |
| --- | --- | --- | --- |
| MATH (level 5) | 71.2% | 93.4% | +22.2% |
| GPQA (Diamond) | 65.0% | 81.3% | +16.3% |
| SWE-bench | 38.2% | 52.1% | +13.9% |
| HumanEval | 88.5% | 95.7% | +7.2% |
| ARC-Challenge | 89.1% | 96.8% | +7.7% |

Advanced Patterns

Multi-Turn Conversation with Thinking

messages = []

def chat_with_thinking(user_msg, budget=5000):
    messages.append({"role": "user", "content": user_msg})
    
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=16000,
        thinking={"type": "enabled", "budget_tokens": budget},
        messages=messages
    )
    
    # Add the complete response (without thinking blocks)
    assistant_content = [
        block for block in response.content
        if block.type == "text"
    ]
    messages.append({"role": "assistant", "content": assistant_content})
    
    return response

# Usage
chat_with_thinking("Design a recommendation algorithm.", budget=10000)
chat_with_thinking("Optimize it for scalability.", budget=8000)

Dynamic Budget Based on Complexity

def estimate_complexity(message):
    """Estimates question complexity to adjust the budget."""
    complexity_indicators = {
        "prove": 3, "demonstrate": 3, "optimize": 2,
        "algorithm": 2, "architecture": 2, "compare": 1,
        "design": 2, "debug": 2, "why": 1,
        "analyze": 1, "explain": 0.5
    }
    
    score = sum(
        weight for keyword, weight in complexity_indicators.items()
        if keyword in message.lower()
    )
    
    if score <= 1:
        return 2000
    elif score <= 3:
        return 6000
    elif score <= 5:
        return 12000
    else:
        return 20000

budget = estimate_complexity("Prove and demonstrate that this algorithm is optimal.")
# → 20000 tokens (high complexity)

Common Errors

| Error | Cause | Solution |
| --- | --- | --- |
| Budget too small | budget_tokens < 1024 | Minimum 1024 tokens |
| Truncated response | max_tokens too low after thinking | Increase max_tokens |
| Unexpected high cost | Thinking on every request | Enable thinking only when necessary |
| Empty thinking | Question too simple | Adaptive thinking may skip reasoning |
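The truncation case is worth handling in code: when thinking consumes part of max_tokens, the final answer can be cut off with stop_reason "max_tokens". A minimal retry heuristic (the function and the 64,000-token ceiling are illustrative assumptions, not SDK features):

```python
def retry_max_tokens(stop_reason, current_max, cap=64000):
    """If a response stopped at max_tokens (thinking ate part of the
    budget), suggest a doubled max_tokens for the retry; return None
    when no retry is needed. 'cap' is an assumed per-model ceiling."""
    if stop_reason != "max_tokens":
        return None
    return min(current_max * 2, cap)

print(retry_max_tokens("end_turn", 16000))    # None — response completed
print(retry_max_tokens("max_tokens", 16000))  # 32000 — retry with more room
```

In a real loop you would read `response.stop_reason`, and resend the same messages with the larger limit when a retry value is returned.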


FAQ

What is Claude's Extended Thinking?

Extended Thinking allows Claude to 'think' before responding, exploring different approaches and verifying its reasoning. Thinking blocks are visible in the API response, and thinking tokens are billed at the output-token rate.

When should I use Extended Thinking?

Use it for complex problems: advanced mathematics, multi-step reasoning, code analysis, planning, bug resolution, and any task requiring deep thought.

What is the cost of Extended Thinking?

Thinking tokens are billed at the same rate as output tokens. However, Claude 4.6's adaptive thinking automatically optimizes the budget by using only the tokens needed.

Does Extended Thinking work with streaming?

Yes, Extended Thinking is compatible with streaming. You first receive the thinking blocks, then the final response content. You can choose to show or hide thinking blocks on the client side.

What is the difference between Extended Thinking and Claude 4.6 adaptive thinking?

Classic Extended Thinking uses a fixed budget. Claude 4.6's adaptive thinking automatically adjusts the reasoning effort based on question complexity, optimizing cost and latency.

How do I disable extended thinking in Claude?

In the API, don't set the 'thinking' parameter or set it to 'disabled'. In the claude.ai web interface, extended thinking activates automatically based on question complexity — you don't need to manually disable it. In Claude Code, use the --no-thinking flag to force quick responses.

Which Claude model is best for reasoning?

Opus 4.6 with extended thinking enabled is the best Claude model for complex reasoning (math, logic, system architecture). For standard reasoning, Sonnet 4.6 with its adaptive thinking offers excellent value — it automatically adjusts its reasoning effort based on difficulty.