
The 5 AI Agent Architecture Patterns with Claude

By Learnia AI Research Team

The most successful agent implementations don't rely on complex frameworks — they use simple, composable patterns. This guide covers the 5 fundamental patterns for building AI systems ranging from simple prompt chains to fully autonomous agents.

Workflows vs Agents: The Fundamental Distinction

Before diving into the patterns, an essential distinction:

  • Workflows: LLMs and tools are orchestrated through predefined code paths. The developer controls the flow.
  • Agents: LLMs dynamically direct their own processes and tool usage. The AI decides what to do next.

The building block of every agentic system is the Augmented LLM: an LLM enhanced with retrieval, tools, and memory. To learn more about the retrieval component, see our RAG fundamentals guide.
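As a rough sketch of how these three augmentations fit around a base model call (all helper names here, such as `retrieve`, `MEMORY`, and `TOOLS`, are illustrative placeholders, not part of any SDK):

```python
# Minimal sketch of an "augmented LLM": a base model call wrapped with
# retrieval, tool definitions, and conversation memory. All names are
# illustrative; a real implementation would call the Claude API.

MEMORY = []  # running conversation history

TOOLS = [{"name": "search_docs", "description": "Search internal docs"}]

def retrieve(query):
    # Stand-in for a real vector-store lookup
    return ["[doc] An agent is an LLM using tools in a loop."]

def augmented_llm(user_message, llm=lambda prompt: "stub answer"):
    context = retrieve(user_message)                           # retrieval
    MEMORY.append({"role": "user", "content": user_message})   # memory
    prompt = (
        "Context:\n" + "\n".join(context) + "\n"
        "Tools available: " + str([t["name"] for t in TOOLS]) + "\n"
        "User: " + user_message
    )
    answer = llm(prompt)                                       # model call
    MEMORY.append({"role": "assistant", "content": answer})
    return answer
```

Every pattern below composes this same building block in different ways.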


Pattern 1: Prompt Chaining (Sequential Pipeline)

The task is decomposed into sequential steps, where each LLM call processes the output of the previous one. Programmatic gates can be added between steps to verify quality.


When to use: When the task decomposes naturally into fixed, sequential subtasks.

Real-world examples:

| Use Case | Step 1 | Gate | Step 2 |
| --- | --- | --- | --- |
| Marketing copy | Generate text | Check guidelines | Translate |
| Document generation | Create outline | Validate criteria | Write content |
| Code analysis | Generate code | Run tests | Refactor |

# Example: Writing chain with verification gate
def prompt_chain(input_text):
    # Step 1: Generate draft
    draft = call_claude("Write a summary of: " + input_text)
    
    # Gate: if the draft is too long, retry with an explicit constraint
    if len(draft.split()) > 200:
        draft = call_claude(
            "Write a summary of at most 200 words: " + input_text
        )
    
    # Step 2: Polish the style
    polished = call_claude("Improve the style: " + draft)
    
    return polished

For a complete guide on this pattern, see our dedicated article on prompt chaining and pipelines.


Pattern 2: Routing (Classification and Redirection)

Input is classified, then redirected to a specialized process. This enables separation of concerns: each branch has its own optimized prompt.


When to use: When there are distinct categories that are better handled separately.

Real-world examples:

| Classification | Branch A | Branch B | Branch C |
| --- | --- | --- | --- |
| Customer service | Refund → dedicated workflow | Technical question → knowledge base | Complaint → human escalation |
| Query complexity | Easy → Haiku (fast, cheap) | Medium → Sonnet (balanced) | Hard → Opus (deep reasoning) |
| Content type | Code → linter + review | Text → style analysis | Data → schema validation |

# Example: Route queries by complexity to different models
def route_query(query):
    # Classify complexity with a fast, cheap model
    category = call_claude(
        "Classify this query as SIMPLE, MEDIUM, or COMPLEX. "
        "Reply with the label only.\n" + query,
        model="haiku"
    ).strip().upper()
    
    # Route to the right model (default to the balanced option)
    model_map = {
        "SIMPLE": "haiku",
        "MEDIUM": "sonnet",
        "COMPLEX": "opus"
    }
    
    return call_claude(query, model=model_map.get(category, "sonnet"))

To dive deeper into this pattern, see our guide on conditional prompt routing.


Pattern 3: Parallelization

Subtasks are executed simultaneously, then results are aggregated. Two main variants:

  1. Sectioning: Independent subtasks in parallel
  2. Voting: Same task executed multiple times for consensus

When to use: When subtasks can be parallelized OR when multiple perspectives are needed.

Real-world examples:

| Variant | Use Case | Detail |
| --- | --- | --- |
| Sectioning | Guardrails | One model processes the query, another screens for inappropriate content |
| Sectioning | Code review | Security + performance + style analysis in parallel |
| Voting | Sensitive classification | 3 runs → majority vote to reduce errors |
| Voting | Translation | Multiple translations → select the best |

import asyncio

# Example: Parallelized code review (sectioning)
async def parallel_code_review(code):
    security, performance, style = await asyncio.gather(
        call_claude_async("Analyze the security of this code:\n" + code),
        call_claude_async("Analyze the performance of this code:\n" + code),
        call_claude_async("Analyze the style of this code:\n" + code),
    )
    
    # Aggregate results
    return call_claude(
        f"Synthesize these 3 analyses into a unified report:\n"
        f"Security: {security}\nPerformance: {performance}\nStyle: {style}"
    )
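The voting variant can be sketched just as simply: run the same classification several times and keep the majority label. Here the model call is injected as a parameter (`classify`) so the consensus logic is runnable on its own; in practice you would pass a wrapper around `call_claude`.

```python
from collections import Counter

# Example: Voting variant — run the same sensitive classification
# several times and take the majority label to reduce errors.
def vote_classify(text, classify, n_runs=3):
    # `classify` stands in for a real model call returning a label
    labels = [classify(text) for _ in range(n_runs)]
    # Counter.most_common(1) returns [(label, count)] for the winner
    winner, _count = Counter(labels).most_common(1)[0]
    return winner
```

Usage would look like `vote_classify(text, lambda t: call_claude("Classify: " + t))`.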

Our article on map-reduce patterns explores decomposition and parallel aggregation strategies in detail.


Pattern 4: Orchestrator-Workers

A central LLM (the orchestrator) dynamically breaks down the task, delegates to workers, then synthesizes results. The key difference from parallelization: subtasks are not predefined — the orchestrator determines them on the fly.


When to use: When subtasks can't be predicted in advance (e.g., code changes across multiple files).

Real-world examples:

| Use Case | Orchestrator Decides | Workers Execute |
| --- | --- | --- |
| Codebase refactoring | Which files to modify | Each worker modifies one file |
| Multi-source research | Which sources to query | Each worker searches one source |
| Documentation generation | Which modules to document | Each worker writes one section |

import json

# Example: Orchestrator-workers for refactoring
def orchestrator_workers(task):
    # Orchestrator analyzes and plans
    plan = call_claude(
        "Analyze this task and break it into subtasks.\n"
        'Return a JSON array of objects, each with a "prompt" field.\n' + task,
        model="sonnet"
    )
    
    subtasks = json.loads(plan)
    
    # Workers execute each subtask (could be parallelized with asyncio)
    results = [
        call_claude(subtask["prompt"], model="haiku")
        for subtask in subtasks
    ]
    
    # Orchestrator synthesizes
    return call_claude(
        "Synthesize these results into a coherent response:\n"
        + "\n".join(results),
        model="sonnet"
    )

Pattern 5: Evaluator-Optimizer

One LLM generates a response, a second evaluates it and provides feedback, then the first refines. This cycle continues until quality is satisfactory.


When to use: When there are clear evaluation criteria AND iterative refinement provides measurable value.

Real-world examples:

| Use Case | Generator | Evaluator | Criteria |
| --- | --- | --- | --- |
| Literary translation | Translates the text | Checks fidelity + style | Score ≥ 8/10 |
| Code generation | Writes the code | Runs tests | All tests pass |
| SEO writing | Writes the article | Checks keywords + structure | SEO checklist complete |

import re

# Example: Evaluator-optimizer loop
def evaluator_optimizer(task, max_iterations=3):
    response = call_claude("Generate: " + task)
    
    for _ in range(max_iterations):
        # Evaluation: ask for an explicit, parseable score
        evaluation = call_claude(
            f"Evaluate this response out of 10, starting with 'Score: N'.\n"
            f"Task: {task}\nResponse: {response}\n"
            f"If the score is below 8, provide precise feedback for improvement."
        )
        
        # Parse the score instead of brittle substring matching
        match = re.search(r"score:\s*(\d+)", evaluation, re.IGNORECASE)
        if match and int(match.group(1)) >= 8:
            return response  # Quality sufficient
        
        # Refinement with feedback
        response = call_claude(
            f"Improve this response based on the feedback.\n"
            f"Previous response: {response}\n"
            f"Feedback: {evaluation}"
        )
    
    return response

To systematically evaluate your prompt outputs, see our Claude evaluations guide.


Autonomous Agents: When the LLM Takes the Wheel

Beyond structured workflows, autonomous agents let the LLM dynamically decide each next action. It's essentially an LLM using tools in a loop, guided by environmental feedback.


Key principles for autonomous agents:

  1. Ground truth at each step — The agent must verify the actual result of its actions (e.g., run tests, don't guess if they pass)
  2. Stopping conditions — Define a maximum number of iterations to avoid infinite loops
  3. Human intervention — Provide a mechanism for the agent to ask for help when stuck
  4. Sandboxing — Execute in a controlled environment with limited permissions
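The four principles above can be sketched as a minimal loop. The `decide` and `act` callables are illustrative stand-ins: in a real agent, `decide` would be a Claude call with tool use and `act` would execute the chosen tool in a sandbox and return its actual output.

```python
# Minimal sketch of an autonomous agent loop: stopping condition,
# human-intervention escape hatch, and ground-truth feedback each step.
# All names are illustrative, not a real SDK.
def agent_loop(goal, decide, act, max_iterations=10):
    observation = "start"
    for step in range(max_iterations):
        # The LLM decides the next action from the goal + last observation
        action = decide(goal, observation)
        if action == "DONE":           # the model declares success
            return f"done after {step} steps"
        if action == "ASK_HUMAN":      # human intervention when stuck
            return "escalated to human"
        # Ground truth: observe the actual result of the action
        # (e.g. really run the tests rather than guess they pass)
        observation = act(action)
    return "stopped: iteration limit reached"  # stopping condition
```

The key design choice is that the observation fed back into `decide` comes from the environment, never from the model's own prediction of what happened.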

When to use an autonomous agent:

  • Open-ended problems where the number of steps is unpredictable
  • Reliable environments with clear feedback (e.g., automated tests)
  • Tasks where the human accepts delegating control

When NOT to use an autonomous agent:

  • Tasks with a predictable execution path → use a workflow
  • High-risk environments without rollback capability
  • When latency or cost are critical

To dive deeper into the autonomous agent pattern, read our guide on the ReAct method which details the Think→Act→Observe loop.


Designing the Agent-Computer Interface (ACI)

Just as we invest in human-computer interfaces (HCI/UX), we must invest in the Agent-Computer Interface (ACI) — how the agent interacts with its tools.

ACI Design Principles

| Principle | Explanation | Example |
| --- | --- | --- |
| Rich documentation | Include usage examples, edge cases, limits | "search(query, max_results=10) — Searches the database. Returns empty array if no results. Max 100 results." |
| Poka-yoke | Make mistakes impossible or difficult | Reject invalid parameters instead of silently ignoring them |
| Natural format | Use formats the model has seen during training | JSON, Markdown rather than proprietary formats |
| Thinking space | Give the model enough tokens to "think" | Add a reasoning field before the action field in the schema |

# ❌ Poor ACI design: cryptic names, no examples
tools = [{
    "name": "q",
    "description": "Query",
    "input_schema": {"query": "string"}
}]

# ✅ Good ACI design: clear names, examples, edge cases
tools = [{
    "name": "search_knowledge_base",
    "description": "Search the knowledge base for relevant articles. "
                   "Returns top matches with title and excerpt. "
                   "Example: search_knowledge_base('how to configure MCP') "
                   "Returns empty array if no matches. Max 20 results.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Natural language search query. Be specific."
            },
            "max_results": {
                "type": "integer",
                "default": 10,
                "description": "Maximum number of results (1-20)"
            }
        },
        "required": ["query"]
    }
}]

To learn more about designing tools for agents, see our Claude tool use guide and our article on custom Claude Code skills.


Decision Tree: Which Pattern to Choose?

In short: start with a single well-written prompt. If the task decomposes into fixed sequential steps, use prompt chaining; for distinct input categories, routing; for independent subtasks or consensus, parallelization; for unpredictable subtasks, orchestrator-workers; with clear quality criteria and iterative gains, evaluator-optimizer; and only for open-ended problems with reliable feedback, an autonomous agent.
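That decision logic can be distilled into code. This is a sketch built from the "when to use" guidance in each section above, with illustrative boolean flags rather than a rigid rule:

```python
# Sketch: map task characteristics to the pattern discussed above.
# The dict keys are illustrative labels for the "when to use" criteria.
def choose_pattern(task):
    if task.get("single_prompt_suffices"):
        return "simple prompt"                 # start simple
    if task.get("open_ended") and task.get("reliable_feedback"):
        return "autonomous agent"              # unpredictable step count
    if task.get("unpredictable_subtasks"):
        return "orchestrator-workers"          # subtasks decided on the fly
    if task.get("independent_subtasks") or task.get("needs_consensus"):
        return "parallelization"               # sectioning or voting
    if task.get("distinct_categories"):
        return "routing"                       # specialized branches
    if task.get("clear_quality_criteria"):
        return "evaluator-optimizer"           # iterative refinement
    return "prompt chaining"                   # fixed sequential steps
```

The ordering matters: it checks the most autonomous options first, but only reaches them when their preconditions (reliable feedback, unpredictable subtasks) actually hold.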

Universal Best Practices

  1. Start simple — A well-written prompt often solves the problem without orchestration
  2. Add complexity incrementally — Each layer must add measurable value
  3. Prioritize transparency — Show the agent's planning steps (no black boxes)
  4. Invest in ACI — Spend as much time on tool design as on prompts
  5. Test extensively — Run hundreds of examples, iterate on tool definitions


FAQ

What are the 5 AI agent architecture patterns?

The 5 patterns are: prompt chaining (sequential pipeline), routing (classification and redirection), parallelization (simultaneous execution), orchestrator-workers (dynamic delegation), and evaluator-optimizer (iterative refinement loop).

What's the difference between a workflow and an autonomous agent?

A workflow orchestrates LLMs through predefined code paths — the developer controls the flow. An autonomous agent lets the LLM dynamically direct its own processes and tool usage, with minimal human intervention.

How do I choose the right agent pattern for my use case?

Start with the simplest that works. Use prompt chaining for sequential tasks, routing for distinct categories, parallelization for independent subtasks, orchestrator-workers when subtasks are unpredictable, and evaluator-optimizer when iterative refinement provides measurable value.

Do I need a framework to build AI agents?

Not necessarily. The most successful implementations use simple, composable patterns rather than complex frameworks. Frameworks can add unnecessary abstraction — prefer direct patterns with code you understand.