Tree of Thought: When Chain-of-Thought Isn't Enough
By Dorian Laurenceau
Last reviewed: April 24, 2026. Updated with April 2026 findings and community feedback.
Chain-of-Thought follows a single reasoning path. But some problems require exploring multiple possibilities, backtracking, and comparing alternatives. That's where Tree of Thought comes in.
Tree of Thought in 2026: why it's mostly obsolete for frontier models
Tree of Thought (ToT) was introduced in Yao et al. 2023 ("Tree of Thoughts: Deliberate Problem Solving with Large Language Models"), and it drew attention as a structured way to do search over reasoning chains. The 2026 reality on r/MachineLearning, r/LocalLLaMA, and r/LangChain is that ToT has been largely absorbed by two developments that make the explicit framework less useful.
What displaced ToT:
- Reasoning models with built-in search. OpenAI o1 and o3, GPT-5 Thinking, Claude's extended thinking mode, and Gemini 2.5 Pro reasoning all reason at length internally, exploring and revising approaches before they answer. You don't orchestrate it; the model does it for you within a single call.
- Longer context and better planning. Models can hold multiple approaches in a single chain and self-correct. The branching that the ToT paper proposed is mostly subsumed.
Where ToT still has legitimate use:
- Small models without built-in reasoning. If you're running Llama 3 or a fine-tuned open model without strong intrinsic reasoning, ToT-style prompting can substantially lift performance on complex planning tasks.
- Problems where you need to show multiple candidates. Creative writing alternatives, design options, architectural choices. Explicit branching is valuable for user-facing decision support.
- When you need auditable reasoning. ToT produces an explicit tree you can inspect. A reasoning model's internal thinking is often not.
What actually works in the ToT framework:
- Explicit state evaluation. ToT papers propose rating each branch. In practice, the evaluator is often the same LLM, which can confidently prefer wrong branches. Using an external verifier (code execution, unit tests, or a different model) whenever possible is critical.
- Pruning aggressively. Naive ToT explodes computationally: with b branches per step and depth d, the full tree has b^d leaf paths. Beam search (keep only the top-k states at each step) is the practical version.
- Depth-limited search. Most real problems don't need deep trees. Two or three levels capture most of the benefit.
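The three practices above (an external evaluator, beam pruning, and a depth limit) fit in a short search loop. The sketch below is a minimal, framework-free illustration, not the paper's implementation: `propose` and `score` are hypothetical callables standing in for an LLM proposal call and a verifier, and the toy task (rebuilding the string "cab" one letter at a time) exists only so the loop runs without an API.

```python
from typing import Callable, List, Tuple, TypeVar

S = TypeVar("S")  # any state type: a string, a partial plan, a list of numbers...


def tot_beam_search(
    root: S,
    propose: Callable[[S], List[S]],
    score: Callable[[S], float],
    beam_width: int = 3,
    max_depth: int = 3,
) -> Tuple[S, float]:
    """Breadth-first ToT with beam pruning and a depth limit.

    propose: returns candidate next thoughts for a state (an LLM call in practice).
    score:   rates a state, higher is better (ideally an external verifier).
    """
    frontier = [root]
    best_state, best_score = root, score(root)
    for _ in range(max_depth):  # depth-limited: stop after max_depth levels
        candidates = [(score(child), child) for state in frontier for child in propose(state)]
        if not candidates:
            break
        # Sort on the score only, so states never need to be comparable.
        candidates.sort(key=lambda pair: pair[0], reverse=True)
        frontier = [child for _, child in candidates[:beam_width]]  # keep top-k
        if candidates[0][0] > best_score:
            best_score, best_state = candidates[0]
    return best_state, best_score


TARGET = "cab"
best, value = tot_beam_search(
    root="",
    propose=lambda s: [s + ch for ch in "abc"] if len(s) < len(TARGET) else [],
    score=lambda s: sum(a == b for a, b in zip(s, TARGET)),
    beam_width=2,
    max_depth=3,
)
print(best, value)  # cab 3
```

Keeping `beam_width` and `max_depth` small is what bounds the cost; with a real model, `score` is the natural place to plug in code execution or unit tests.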
What practitioners report:
- ToT is expensive. Every branch you evaluate costs another LLM call, so naive implementations can multiply your costs 10 to 50 times.
- LLM-as-evaluator is unreliable when quality is ambiguous. Huang et al. (2023), "Large Language Models Cannot Self-Correct Reasoning Yet", found that without external feedback, models often fail to fix their own reasoning and sometimes turn correct answers into wrong ones. The same weakness makes self-evaluation prone to favoring wrong-but-plausible branches.
- Modern reasoning models do this internally. The built-in thinking in the o-series, GPT-5 Thinking, and Claude extended thinking makes explicit ToT mostly redundant at the frontier.
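To see where the multiplier comes from, here is a back-of-the-envelope call counter for beam-search ToT. The cost model is an assumption made for illustration: one call to expand a state into `branching` children, one call to evaluate each child, starting from a single root. Real systems may batch evaluations or sample several votes per state, which changes the constants.

```python
def beam_search_calls(depth: int, beam_width: int, branching: int) -> int:
    """Count LLM calls for a beam-search ToT run under a simple, hypothetical cost model.

    Assumptions: expanding one state costs 1 call, evaluating one child costs 1 call,
    and the search starts from a single root state.
    """
    calls = 0
    frontier = 1
    for _ in range(depth):
        children = frontier * branching
        calls += frontier + children  # proposal calls + evaluation calls
        frontier = min(beam_width, children)  # beam pruning caps the next frontier
    return calls


# A plain Chain-of-Thought answer is roughly one call.
print(beam_search_calls(depth=3, beam_width=5, branching=5))  # 66
print(beam_search_calls(depth=2, beam_width=2, branching=3))  # 12
```

Even a shallow search (three levels, a beam of five, five children per state) lands at 66 calls under these assumptions, against roughly one for a plain Chain-of-Thought answer.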
The honest framing: Tree of Thought is a historically important idea that's been partly absorbed into model architecture. Reach for it when you need explicit auditable branching or you're working with a model that doesn't have native reasoning. Otherwise, use a reasoning model with structured prompting and let the model do the search internally.
What Is Tree of Thought?
Tree of Thought (ToT) is a prompting technique where the AI:
- Generates multiple possible next steps
- Evaluates which paths are promising
- Explores the best options further
- Backtracks if a path fails
It mimics how humans solve complex puzzles: considering alternatives instead of following a single line of thinking.
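The four steps map directly onto a recursive depth-first search. This sketch is a minimal illustration under simple assumptions: `propose`, `is_promising`, and `is_solution` are hypothetical callables that would wrap LLM calls or verifiers in practice, and the toy task (pick three digits from 5, 4, 3 that sum to 11) stands in for a real reasoning problem.

```python
from typing import Callable, List, Optional


def tot_dfs(
    state: str,
    propose: Callable[[str], List[str]],
    is_promising: Callable[[str], bool],
    is_solution: Callable[[str], bool],
    max_depth: int,
) -> Optional[str]:
    """Depth-first Tree of Thought: generate, evaluate, explore, backtrack."""
    if is_solution(state):
        return state
    if max_depth == 0:
        return None
    for child in propose(state):  # 1. generate possible next steps
        if not is_promising(child):  # 2. evaluate: prune hopeless branches
            continue
        found = tot_dfs(child, propose, is_promising, is_solution, max_depth - 1)  # 3. explore
        if found is not None:
            return found
    return None  # 4. backtrack: no child led to a solution


DIGITS, TARGET = "543", 11


def digit_sum(s: str) -> int:
    return sum(int(c) for c in s)


answer = tot_dfs(
    "",
    propose=lambda s: [s + d for d in DIGITS],
    is_promising=lambda s: digit_sum(s) <= TARGET,
    is_solution=lambda s: len(s) == 3 and digit_sum(s) == TARGET,
    max_depth=3,
)
print(answer)  # 533
```

The search abandons the 55 and 54 branches once every continuation overshoots, backtracks, and finds 533 under the 53 branch.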
Chain-of-Thought vs Tree of Thought
Chain-of-Thought (Linear)
Start → Step 1 → Step 2 → Step 3 → Answer
One path, no alternatives considered. If Step 2 is wrong, everything after fails.
Tree of Thought (Branching)
Start branches into:
- Option A
  - A1 ✓ (promising)
  - A2 ✗ (dead end)
- Option B
  - B1 → B1a → Solution! ✓
  - B2 ✗ (dead end)
- Option C ✗ (pruned early)
Multiple paths explored. Dead ends abandoned. Best path found.
When Tree of Thought Helps
Puzzles and Games
Problem: "24 Game" - make 24 from [4, 5, 6, 3] using +, -, ×, ÷
CoT approach: Try one combination, hope it works
ToT approach: Systematically explore combinations, evaluate each
Planning Problems
Problem: "Plan a 7-day Europe trip hitting 5 cities efficiently"
CoT: Generate one itinerary
ToT: Generate multiple routes, compare travel times, optimize
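A toy version of this contrast, with invented numbers: the travel times below are hypothetical, and the trip is cut to three destinations plus a home city `S` so every route can be checked by hand. `greedy_route` commits to the nearest next city at each step, like a single reasoning chain; `best_route` generates every candidate ordering and compares totals. Exhaustive comparison only works for tiny cases; larger itineraries need pruning such as beam search.

```python
from itertools import permutations
from typing import Dict, FrozenSet, List

# Hypothetical travel times in hours between home city S and destinations A, B, C.
HOURS: Dict[FrozenSet[str], int] = {
    frozenset("SA"): 1, frozenset("SB"): 3, frozenset("SC"): 3,
    frozenset("AB"): 2, frozenset("AC"): 2, frozenset("BC"): 10,
}


def leg(a: str, b: str) -> int:
    """Travel time between two cities (the table is symmetric)."""
    return HOURS[frozenset((a, b))]


def trip_hours(route: List[str]) -> int:
    """Total time for a round trip that starts and ends at S."""
    stops = ["S"] + route + ["S"]
    return sum(leg(a, b) for a, b in zip(stops, stops[1:]))


def greedy_route(cities: List[str]) -> List[str]:
    """One path: always take the nearest unvisited city, never reconsider."""
    route, here, left = [], "S", list(cities)
    while left:
        nxt = min(left, key=lambda c: leg(here, c))
        route.append(nxt)
        left.remove(nxt)
        here = nxt
    return route


def best_route(cities: List[str]) -> List[str]:
    """Many paths: generate every ordering and keep the cheapest."""
    return list(min(permutations(cities), key=lambda r: trip_hours(list(r))))


cities = ["A", "B", "C"]
greedy, best = greedy_route(cities), best_route(cities)
print(greedy, trip_hours(greedy))  # ['A', 'B', 'C'] 16
print(best, trip_hours(best))      # ['B', 'A', 'C'] 10
```

Going to the closest city first (A) looks good locally but forces the expensive B-to-C leg later; only comparing complete routes exposes that.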
Creative Problem Solving
Problem: "Design a mobile app for elderly users"
CoT: One design idea
ToT: Multiple concepts, evaluate usability of each, combine best elements
Search Problems
Problem: Find the best marketing strategy from 20 options
CoT: Analyze sequentially, pick first "good enough"
ToT: Evaluate multiple strategies, compare, pick optimal
The ToT Process
Step 1: Decompose
Break the problem into steps:
Problem: "Write a creative story with a twist ending"
Decomposition:
1. Choose a genre/setting
2. Establish characters
3. Build rising tension
4. Create the twist
5. Resolve the story
Step 2: Generate Options
At each step, brainstorm multiple possibilities:
Step 1 - Genre options:
A) Mystery in a small town
B) Sci-fi on a space station
C) Romance in 1920s Paris
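In an automated pipeline, the generate step is a prompt that asks for several labeled options, plus a parser that turns the reply into data the search can branch on. Both pieces below are assumptions for illustration: the prompt wording is hypothetical (not taken from Yao et al.), and `reply` is a hard-coded stand-in for a model response, so no API is called.

```python
import re
from typing import Dict

# Hypothetical prompt template for the "generate" step (wording is an assumption).
PROPOSE_PROMPT = (
    "Problem: {problem}\n"
    "Current step: {step}\n"
    "List {k} distinct options, one per line, formatted as 'A) ...'."
)

# Matches lines such as "A) Mystery in a small town".
OPTION_LINE = re.compile(r"^\s*([A-Z])\)\s*(.+?)\s*$")


def parse_options(model_output: str) -> Dict[str, str]:
    """Map each option letter to its text, ignoring lines that don't match."""
    options: Dict[str, str] = {}
    for line in model_output.splitlines():
        match = OPTION_LINE.match(line)
        if match:
            options[match.group(1)] = match.group(2)
    return options


# This prompt would be sent to the model; here it is only built.
prompt = PROPOSE_PROMPT.format(
    problem="Write a creative story with a twist ending",
    step="Choose a genre/setting",
    k=3,
)

# Hard-coded stand-in for the model's reply.
reply = "Step 1 - Genre options:\nA) Mystery in a small town\nB) Sci-fi on a space station\nC) Romance in 1920s Paris"
print(parse_options(reply))
# {'A': 'Mystery in a small town', 'B': 'Sci-fi on a space station', 'C': 'Romance in 1920s Paris'}
```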
Step 3: Evaluate
Assess each option's promise:
A) Mystery: ★★★☆☆ (common, but flexible for twists)
B) Sci-fi: ★★★★☆ (great twist potential, visual)
C) Romance: ★★☆☆☆ (harder to pull off an unexpected twist)
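Once an evaluator (a second prompt, a human, or a rubric) returns ratings like these, code can rank the options and decide which to explore. This sketch assumes the evaluator's output is a string of filled (★) and empty (☆) stars; scoring by counting filled stars is a hypothetical choice, and any numeric scale would work.

```python
from typing import Dict, List


def star_score(rating: str) -> int:
    """Count filled stars in a rating such as '★★★☆☆'."""
    return rating.count("★")


def top_k(ratings: Dict[str, str], k: int) -> List[str]:
    """Return the k option labels with the highest star scores."""
    return sorted(ratings, key=lambda label: star_score(ratings[label]), reverse=True)[:k]


ratings = {
    "A) Mystery": "★★★☆☆",
    "B) Sci-fi": "★★★★☆",
    "C) Romance": "★★☆☆☆",
}
print(top_k(ratings, k=1))  # ['B) Sci-fi']
print(top_k(ratings, k=2))  # ['B) Sci-fi', 'A) Mystery']
```

With `k=1` this follows only the sci-fi branch, as in the walkthrough; `k=2` keeps the mystery branch available as a fallback for backtracking.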
Step 4: Explore Best Paths
Continue with promising options:
→ Pursue B) Sci-fi
Step 2 - Character options:
B1) Solo astronaut
B2) Ship crew
B3) AI companion
Best: B3 (AI companion opens twist possibilities)
Step 5: Backtrack If Needed
If a path hits a dead end:
B3 → twist idea 1: predictable ✗
B3 → twist idea 2: doesn't fit ✗
Backtrack to Step 1, try A) Mystery instead
Why ToT Works Better for Complex Problems
1. Avoids Early Commitment
CoT locks in decisions:
"The detective is named John..."
→ Stuck with this choice even if it creates problems later
ToT keeps options open:
Consider: John (detective), Sarah (journalist), Alex (suspect)
→ Choose based on what works best for the story
2. Enables Comparison
Strategy A produces: $50K revenue estimate
Strategy B produces: $75K revenue estimate
Strategy C produces: $60K revenue estimate
→ Choose B (can only compare with multiple paths)
3. Allows Recovery from Mistakes
Path going wrong? Backtrack.
CoT: Stuck with bad decisions
ToT: Return to last good state, try different branch
Real-World Example: Game of 24
Problem: Use 4, 9, 10, 13 to make 24 (each number exactly once, using +, -, ×, ÷)
CoT Attempt
Let me try: 4 × 9 = 36... 36 - 10 = 26...
Can't use 13 to get to 24. Failed.
Try again: 10 + 13 = 23... 23 + 4 = 27...
Can't use 9 to get to 24. Failed.
Unguided attempts may never find the solution.
ToT Approach
Generate possible first operations:
- 4 + 9 = 13 (duplicate with existing 13, interesting)
- 4 × 9 = 36 (close to 24)
- 10 - 4 = 6 (small number, useful for multiplication)
- 13 - 9 = 4 (duplicate with existing 4)
Evaluate most promising: 10 - 4 = 6
With 6, 9, 13:
- 6 × 9 = 54... minus 13 = 41 ✗
- 13 - 9 = 4, 4 × 6 = 24 ✓
Solution: (13 - 9) × (10 - 4) = 24
Systematic exploration finds the answer.
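The same systematic exploration can be written as a depth-first search in which each step combines two of the remaining numbers. In the paper's setup the model proposes and rates intermediate states; this sketch swaps both roles for exhaustive enumeration and exact `Fraction` arithmetic, so it illustrates the search structure and the external-verifier idea rather than LLM prompting. Expressions use `*` and `/` so they can be checked with ordinary Python.

```python
from fractions import Fraction
from itertools import combinations
from typing import List, Optional, Tuple

# A state item pairs an exact value with the expression that produced it.
Item = Tuple[Fraction, str]


def propose(items: List[Item]) -> List[List[Item]]:
    """Generate every child state: pick two numbers, replace them with one result."""
    children: List[List[Item]] = []
    for i, j in combinations(range(len(items)), 2):
        (a, ea), (b, eb) = items[i], items[j]
        rest = [items[k] for k in range(len(items)) if k not in (i, j)]
        results: List[Item] = [
            (a + b, f"({ea} + {eb})"),
            (a * b, f"({ea} * {eb})"),
            (a - b, f"({ea} - {eb})"),
            (b - a, f"({eb} - {ea})"),
        ]
        if b != 0:
            results.append((a / b, f"({ea} / {eb})"))
        if a != 0:
            results.append((b / a, f"({eb} / {ea})"))
        for result in results:
            children.append(rest + [result])
    return children


def solve_24(numbers: List[int], target: int = 24) -> Optional[str]:
    """Depth-first search with backtracking; the leaf check is an exact verifier."""

    def search(items: List[Item]) -> Optional[str]:
        if len(items) == 1:
            value, expr = items[0]
            return expr if value == target else None  # exact arithmetic, no LLM judgment
        for child in propose(items):
            found = search(child)
            if found is not None:
                return found
        return None  # every branch failed: backtrack to the caller

    return search([(Fraction(n), str(n)) for n in numbers])


print(solve_24([4, 9, 10, 13]))
print(solve_24([4, 5, 6, 3]))
```

The exact equality check at the leaf plays the role of the external verifier recommended earlier: no branch is accepted because it merely looks plausible. The same function also handles the [4, 5, 6, 3] puzzle from the 24 Game example.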
ToT Performance (Research)
Yao et al. (2023) compared prompting techniques with GPT-4 on three search-heavy tasks:
| Technique | Game of 24 (success rate) | Creative Writing (coherence score, 1-10) | Mini Crosswords (words correct) |
|---|---|---|---|
| Standard prompting | 7.3% | 6.19 | 14% |
| Chain-of-Thought | 4.0% | 6.93 | 15.6% |
| Tree of Thought | 74% | 7.56 | 60% |
For search-like problems, ToT dramatically outperforms.
When NOT to Use ToT
Simple Questions
"What's the capital of Japan?"
โ Just answer directly. No tree needed.
Linear Problems
"Summarize this document"
โ CoT is sufficient. No branching helps.
When Speed Matters
ToT requires multiple evaluations and comparisons.
For real-time chat, it's too slow.
Quick Summary
- Tree of Thought explores multiple reasoning paths
- Uses a generate → evaluate → explore → backtrack cycle
- Best for puzzles, planning, and search problems
- Dramatically outperforms CoT on complex tasks (74% vs 4% on Game of 24)
- Trade-off: More powerful but slower and costlier
Dorian Laurenceau
Full-Stack Developer & Learning Designer
Full-stack web developer and learning designer. I spent 4 years as a freelance full-stack developer and 4 years teaching React, JavaScript, HTML/CSS and WordPress to adult learners. Today I design learning paths in web development and AI, grounded in learning science. I founded learn-prompting.fr to make AI practical and accessible, and built the Bluff app to gamify political transparency.
FAQ
What is Tree of Thought prompting?
Tree of Thought (ToT) is an advanced prompting technique where AI explores multiple reasoning paths simultaneously, evaluates each branch, and selects the most promising solution.
How is Tree of Thought different from Chain-of-Thought?
Chain-of-Thought follows one linear path. Tree of Thought branches into multiple paths, explores each, and can backtrack. It's better for problems with many possible approaches.
When should I use Tree of Thought?
Use ToT for puzzles, planning problems, game strategy, and complex decisions where exploring alternatives matters. For simple reasoning, Chain-of-Thought is sufficient.
Does Tree of Thought require special AI models?
No special models are required, but ToT works best with capable models like GPT-4 or Claude. You implement it through prompt structure and orchestration code (multiple calls plus a search loop), not through model features.