The 5 AI Agent Architecture Patterns with Claude
By Learnia AI Research Team
The most successful agent implementations don't rely on complex frameworks — they use simple, composable patterns. This guide covers the 5 fundamental patterns for building AI systems ranging from simple prompt chains to fully autonomous agents.
Workflows vs Agents: The Fundamental Distinction
Before diving into the patterns, an essential distinction:
- Workflows: LLMs and tools are orchestrated through predefined code paths. The developer controls the flow.
- Agents: LLMs dynamically direct their own processes and tool usage. The AI decides what to do next.
The building block of every agentic system is the Augmented LLM: an LLM enhanced with retrieval, tools, and memory. To learn more about the retrieval component, see our RAG fundamentals guide.
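As a rough sketch, an augmented LLM call can be pictured as an ordinary prompt assembled from retrieved context, tool definitions, and conversation memory. Everything below (`retrieve`, `augmented_prompt`, the toy keyword search) is illustrative only, not a real SDK:

```python
import re

# Sketch of an "augmented LLM" call: the model sees retrieved context,
# available tool definitions, and conversation memory alongside the query.
def retrieve(query, store):
    # Toy keyword overlap standing in for a real vector search
    words = set(re.findall(r"\w+", query.lower()))
    return [doc for doc in store if words & set(re.findall(r"\w+", doc.lower()))]

def augmented_prompt(query, store, tools, memory):
    context = "\n".join(retrieve(query, store))
    tool_list = ", ".join(t["name"] for t in tools)
    history = "\n".join(memory)
    return (
        f"Context:\n{context}\n\n"
        f"Available tools: {tool_list}\n\n"
        f"History:\n{history}\n\n"
        f"User: {query}"
    )

prompt = augmented_prompt(
    query="How do I configure MCP?",
    store=["MCP is configured via a JSON file.", "Unrelated release note."],
    tools=[{"name": "search_knowledge_base"}],
    memory=["User: hello", "Assistant: hi, how can I help?"],
)
```

In a real system, `retrieve` would query a vector store and the tool definitions would follow the model provider's tool-use schema; the point is only that all three augmentations end up in the model's context.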
Pattern 1: Prompt Chaining (Sequential Pipeline)
The task is decomposed into sequential steps, where each LLM call processes the output of the previous one. Programmatic gates can be added between steps to verify quality.
When to use: When the task decomposes naturally into fixed, sequential subtasks.
Real-world examples:
| Use Case | Step 1 | Gate | Step 2 |
|---|---|---|---|
| Marketing copy | Generate text | Check guidelines | Translate |
| Document generation | Create outline | Validate criteria | Write content |
| Code analysis | Generate code | Run tests | Refactor |
```python
# Example: Writing chain with verification gate
def prompt_chain(input_text):
    # Step 1: Generate draft
    draft = call_claude("Write a summary of: " + input_text)
    # Gate: Check length; retry once with an explicit constraint
    if len(draft.split()) > 200:
        draft = call_claude("Write a summary of at most 200 words: " + input_text)
    # Step 2: Polish the style
    polished = call_claude("Improve the style: " + draft)
    return polished
```
For a complete guide on this pattern, see our dedicated article on prompt chaining and pipelines.
Pattern 2: Routing (Classification and Redirection)
Input is classified, then redirected to a specialized process. This enables separation of concerns: each branch has its own optimized prompt.
When to use: When there are distinct categories that are better handled separately.
Real-world examples:
| Classification | Branch A | Branch B | Branch C |
|---|---|---|---|
| Customer service | Refund → dedicated workflow | Technical question → knowledge base | Complaint → human escalation |
| Query complexity | Easy → Haiku (fast, cheap) | Medium → Sonnet (balanced) | Hard → Opus (deep reasoning) |
| Content type | Code → linter + review | Text → style analysis | Data → schema validation |
```python
# Example: Route queries by complexity to different models
def route_query(query):
    # Classify complexity (strip whitespace so the lookup matches)
    category = call_claude(
        "Classify this query as SIMPLE, MEDIUM, or COMPLEX. "
        "Reply with the label only.\n" + query,
        model="haiku"
    ).strip()
    # Route to the right model
    model_map = {
        "SIMPLE": "haiku",
        "MEDIUM": "sonnet",
        "COMPLEX": "opus"
    }
    return call_claude(query, model=model_map.get(category, "sonnet"))
```
To dive deeper into this pattern, see our guide on conditional prompt routing.
Pattern 3: Parallelization
Subtasks are executed simultaneously, then results are aggregated. Two main variants:
- Sectioning: Independent subtasks run in parallel
- Voting: The same task executed multiple times for consensus
When to use: When subtasks can be parallelized OR when multiple perspectives are needed.
Real-world examples:
| Variant | Use Case | Detail |
|---|---|---|
| Sectioning | Guardrails | One model processes the query, another screens for inappropriate content |
| Sectioning | Code review | Security + performance + style analysis in parallel |
| Voting | Sensitive classification | 3 runs → majority vote to reduce errors |
| Voting | Translation | Multiple translations → select the best |
```python
import asyncio

# Example: Parallelized code review (sectioning)
async def parallel_code_review(code):
    security, performance, style = await asyncio.gather(
        call_claude_async("Analyze the security of this code:\n" + code),
        call_claude_async("Analyze the performance of this code:\n" + code),
    )
    # Aggregate results (also async, so the event loop is never blocked)
    return await call_claude_async(
        f"Synthesize these 3 analyses into a unified report:\n"
        f"Security: {security}\nPerformance: {performance}\nStyle: {style}"
    )
```
Our article on map-reduce patterns explores decomposition and parallel aggregation strategies in detail.
Pattern 4: Orchestrator-Workers
A central LLM (the orchestrator) dynamically breaks down the task, delegates to workers, then synthesizes results. The key difference from parallelization: subtasks are not predefined — the orchestrator determines them on the fly.
When to use: When subtasks can't be predicted in advance (e.g., code changes across multiple files).
Real-world examples:
| Use Case | Orchestrator Decides | Workers Execute |
|---|---|---|
| Codebase refactoring | Which files to modify | Each worker modifies one file |
| Multi-source research | Which sources to query | Each worker searches one source |
| Documentation generation | Which modules to document | Each worker writes one section |
```python
import json

# Example: Orchestrator-workers for refactoring
def orchestrator_workers(task):
    # Orchestrator analyzes and plans
    plan = call_claude(
        "Analyze this task and break it into subtasks.\n"
        "Return a JSON array of objects with a 'prompt' field.\n" + task,
        model="sonnet"
    )
    subtasks = json.loads(plan)
    # Workers execute each subtask
    results = []
    for subtask in subtasks:
        result = call_claude(subtask["prompt"], model="haiku")
        results.append(result)
    # Orchestrator synthesizes
    return call_claude(
        "Synthesize these results into a coherent response:\n"
        + "\n".join(results),
        model="sonnet"
    )
```
Pattern 5: Evaluator-Optimizer
One LLM generates a response, a second evaluates it and provides feedback, then the first refines. This cycle continues until quality is satisfactory.
When to use: When there are clear evaluation criteria AND iterative refinement provides measurable value.
Real-world examples:
| Use Case | Generator | Evaluator | Criteria |
|---|---|---|---|
| Literary translation | Translates the text | Checks fidelity + style | Score ≥ 8/10 |
| Code generation | Writes the code | Runs tests | All tests pass |
| SEO writing | Writes the article | Checks keywords + structure | SEO checklist complete |
```python
import re

# Example: Evaluator-optimizer loop
def evaluator_optimizer(task, max_iterations=3):
    response = call_claude("Generate: " + task)
    for i in range(max_iterations):
        # Evaluation: ask for an explicit, parseable score line
        evaluation = call_claude(
            f"Evaluate this response out of 10 (reply with 'Score: N/10').\n"
            f"Task: {task}\nResponse: {response}\n"
            f"If score < 8, provide precise feedback for improvement."
        )
        match = re.search(r"score:\s*(\d+)", evaluation, re.IGNORECASE)
        if match and int(match.group(1)) >= 8:
            return response  # Quality sufficient
        # Refinement with feedback
        response = call_claude(
            f"Improve this response based on the feedback.\n"
            f"Previous response: {response}\n"
            f"Feedback: {evaluation}"
        )
    return response
```
To systematically evaluate your prompt outputs, see our Claude evaluations guide.
Autonomous Agents: When the LLM Takes the Wheel
Beyond structured workflows, autonomous agents let the LLM dynamically decide each next action. It's essentially an LLM using tools in a loop, guided by environmental feedback.
Key principles for autonomous agents:
- Ground truth at each step — The agent must verify the actual result of its actions (e.g., run tests, don't guess if they pass)
- Stopping conditions — Define a maximum number of iterations to avoid infinite loops
- Human intervention — Provide a mechanism for the agent to ask for help when stuck
- Sandboxing — Execute in a controlled environment with limited permissions
When to use an autonomous agent:
- Open-ended problems where the number of steps is unpredictable
- Reliable environments with clear feedback (e.g., automated tests)
- Tasks where the human accepts delegating control
When NOT to use an autonomous agent:
- Tasks with a predictable execution path → use a workflow
- High-risk environments without rollback capability
- When latency or cost are critical
To dive deeper into the autonomous agent pattern, read our guide on the ReAct method which details the Think→Act→Observe loop.
Designing the Agent-Computer Interface (ACI)
Just as we invest in human-computer interfaces (HCI/UX), we must invest in the Agent-Computer Interface (ACI) — how the agent interacts with its tools.
ACI Design Principles
| Principle | Explanation | Example |
|---|---|---|
| Rich documentation | Include usage examples, edge cases, limits | "search(query, max_results=10) — Searches the database. Returns empty array if no results. Max 100 results." |
| Poka-yoke | Make mistakes impossible or difficult | Reject invalid parameters instead of silently ignoring them |
| Natural format | Use formats the model has seen during training | JSON, Markdown rather than proprietary formats |
| Thinking space | Give model enough tokens to "think" | Add a reasoning field before the action field in the schema |
```python
# ❌ Poor ACI design: cryptic names, no examples
tools = [{
    "name": "q",
    "description": "Query",
    "input_schema": {"query": "string"}
}]

# ✅ Good ACI design: clear names, examples, edge cases
tools = [{
    "name": "search_knowledge_base",
    "description": "Search the knowledge base for relevant articles. "
                   "Returns top matches with title and excerpt. "
                   "Example: search_knowledge_base('how to configure MCP'). "
                   "Returns empty array if no matches. Max 20 results.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Natural language search query. Be specific."
            },
            "max_results": {
                "type": "integer",
                "default": 10,
                "description": "Maximum number of results (1-20)"
            }
        },
        "required": ["query"]
    }
}]
```
To learn more about designing tools for agents, see our Claude tool use guide and our article on custom Claude Code skills.
Decision Tree: Which Pattern to Choose?
Start with the simplest approach that works, then escalate:
- Fixed, sequential subtasks → prompt chaining
- Distinct input categories → routing
- Independent subtasks, or a need for consensus → parallelization
- Subtasks that can't be predicted in advance → orchestrator-workers
- Clear evaluation criteria plus iterative refinement → evaluator-optimizer
- Open-ended problem with reliable environmental feedback → autonomous agent
Universal Best Practices
- Start simple — A well-written prompt often solves the problem without orchestration
- Add complexity incrementally — Each layer must add measurable value
- Prioritize transparency — Show the agent's planning steps (no black boxes)
- Invest in ACI — Spend as much time on tool design as on prompts
- Test extensively — Run hundreds of examples, iterate on tool definitions
Continue Your Learning
- ReAct Method: How AI Agents Think and Act — The fundamental Think→Act→Observe loop
- Prompt Chaining and Pipelines — Deep dive into Pattern 1
- Conditional Prompt Routing — Deep dive into Pattern 2
- Map-Reduce Patterns — Parallel decomposition techniques
- Tool Use with Claude — Giving tools to your agents
- Claude Evaluations — Measuring the quality of your agentic systems
FAQ
What are the 5 AI agent architecture patterns?
The 5 patterns are: prompt chaining (sequential pipeline), routing (classification and redirection), parallelization (simultaneous execution), orchestrator-workers (dynamic delegation), and evaluator-optimizer (iterative refinement loop).
What's the difference between a workflow and an autonomous agent?
A workflow orchestrates LLMs through predefined code paths — the developer controls the flow. An autonomous agent lets the LLM dynamically direct its own processes and tool usage, with minimal human intervention.
How do I choose the right agent pattern for my use case?
Start with the simplest that works. Use prompt chaining for sequential tasks, routing for distinct categories, parallelization for independent subtasks, orchestrator-workers when subtasks are unpredictable, and evaluator-optimizer when iterative refinement provides measurable value.
Do I need a framework to build AI agents?
Not necessarily. The most successful implementations use simple, composable patterns rather than complex frameworks. Frameworks can add unnecessary abstraction — prefer direct patterns with code you understand.