
The 5 AI Agent Architecture Patterns with Claude

By Dorian Laurenceau

📅 Last reviewed: April 24, 2026. Updated with April 2026 findings and community feedback.

The 5 AI Agent Architecture Patterns

The most successful agent implementations don't rely on complex frameworks; they use simple, composable patterns. This guide covers the 5 fundamental patterns for building AI systems, ranging from simple prompt chains to fully autonomous agents.

Workflows vs Agents: The Fundamental Distinction

Before diving into the patterns, an essential distinction:

  • Workflows: LLMs and tools are orchestrated through predefined code paths. The developer controls the flow.
  • Agents: LLMs dynamically direct their own processes and tool usage. The AI decides what to do next.

The building block of every agentic system is the Augmented LLM: an LLM enhanced with retrieval, tools, and memory. To learn more about the retrieval component, see our RAG fundamentals guide.
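
All code samples in this guide assume a small call_claude helper (plus an async variant used by the parallel patterns). Here is a minimal sketch built on the official anthropic Python SDK; the alias-to-model-ID mapping is illustrative, so check the current model list before copying:

# Minimal sketch of the call_claude helpers used throughout this article,
# built on the official anthropic SDK. Model IDs are illustrative.
import anthropic

client = anthropic.Anthropic()             # reads ANTHROPIC_API_KEY from the environment
async_client = anthropic.AsyncAnthropic()

MODEL_ALIASES = {
    "haiku": "claude-3-5-haiku-latest",
    "sonnet": "claude-sonnet-4-5",
    "opus": "claude-opus-4-1",
}

def call_claude(prompt, model="sonnet"):
    response = client.messages.create(
        model=MODEL_ALIASES.get(model, model),
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

async def call_claude_async(prompt, model="sonnet"):
    response = await async_client.messages.create(
        model=MODEL_ALIASES.get(model, model),
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text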


Which agent patterns actually ship to production (and which don't)

The agent-pattern taxonomy comes from Anthropic's "Building Effective Agents" post and has become the shared vocabulary for the field. The threads on r/LocalLLaMA, r/MachineLearning, r/ExperiencedDevs, and r/LangChain track the real operational experience: most teams ship workflows, not autonomous agents, and the patterns that look simple on paper are the ones that reach production.

Patterns that ship:

  • Prompt chaining. The most common pattern in production. Deterministic, debuggable, fails predictably. LangChain, LlamaIndex, and custom Python pipelines all implement this well.
  • Routing with classifiers. A cheap model decides which expensive model or which specialized prompt handles the request. Works because the routing task is narrow.
  • Parallelization for independent subtasks. Batch processing, multi-aspect summarization, voting-based robustness. Simple to implement and test.
  • Evaluator-optimizer loops with hard termination. Generator proposes, evaluator scores, loop terminates at N iterations or threshold. Production-safe because the loop is bounded.

Patterns that quietly fail:

  • Autonomous agents with open-ended tool use in unreliable domains. The Voyager paper showed impressive demos; production engineering teams report that autonomous agents in messy real-world codebases, customer systems, or legal environments fail in ways that are expensive to debug.
  • Orchestrator-workers with many workers. Works with 2-3 workers; breaks down at 10+ because coordination overhead and context drift dominate.
  • Agents with memory that "learns" over time. Sounds great; practical issue is that the learned state is opaque, hard to version-control, and tends to drift in ways the team can't explain.

What the successful production teams consistently do:

  • Start with workflows, graduate to agents only when warranted. The Anthropic guidance is explicit: if a deterministic workflow works, don't reach for an agent.
  • Instrument everything. LangSmith, Langfuse, Arize, and Helicone are table stakes. If you can't see what the agent did, you can't improve it.
  • Write evals before you write the agent. Building agents without evals is the single most common pattern in failed projects.
  • Set hard budgets. Token budgets, iteration caps, timeout limits. Agents without budgets will find ways to spend infinite resources.
  • Use structured outputs. JSON schema validation, Pydantic, or Instructor make failures explicit.
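
On that last point, a minimal sketch of what "make failures explicit" can look like with Pydantic; the RouteDecision schema is a hypothetical example, not from the original post:

# Hedged sketch: validate model output against a schema so malformed JSON
# fails loudly. The RouteDecision schema here is hypothetical.
from pydantic import BaseModel, Field, ValidationError

class RouteDecision(BaseModel):
    category: str = Field(pattern="^(SIMPLE|MEDIUM|COMPLEX)$")
    confidence: float = Field(ge=0.0, le=1.0)

def parse_decision(raw_json):
    try:
        return RouteDecision.model_validate_json(raw_json)
    except ValidationError as err:
        # Explicit failure: retry, fall back, or escalate instead of
        # silently passing garbage downstream
        raise ValueError(f"Model returned an invalid decision: {err}") from err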

Frameworks worth knowing (and their tradeoffs):

  • LangGraph is the most production-ready orchestration framework; also the most complex.
  • CrewAI is friendlier; also opinionated in ways that don't fit every team.
  • AutoGen is Microsoft's answer; strong on conversation-driven patterns.
  • Claude Code sub-agents and OpenAI Assistants API are the first-party options.
  • Raw Python + Anthropic/OpenAI SDK. Often the best choice. No framework is a clean fit for every problem, and most frameworks eventually become the bug.

The honest framing: the "agent patterns" that matter in production are mostly the simple ones: sequential pipelines, routing, parallelization, bounded evaluator loops. The exciting autonomous-agent patterns remain research and demo territory, not reliable production practice. Ship the workflow that works, instrument it well, and add agent capabilities only where a deterministic workflow can't solve the problem.

Pattern 1: Prompt Chaining (Sequential Pipeline)

The task is decomposed into sequential steps, where each LLM call processes the output of the previous one. Programmatic gates can be added between steps to verify quality.


When to use: When the task decomposes naturally into fixed, sequential subtasks.

Real-world examples:

| Use Case | Step 1 | Gate | Step 2 |
| --- | --- | --- | --- |
| Marketing copy | Generate text | Check guidelines | Translate |
| Document generation | Create outline | Validate criteria | Write content |
| Code analysis | Generate code | Run tests | Refactor |

# Example: Writing chain with verification gate
def prompt_chain(input_text):
    # Step 1: Generate draft
    draft = call_claude("Write a summary of: " + input_text)

    # Gate: enforce the length budget before moving on
    if len(draft.split()) > 200:
        # Retry once with an explicit constraint rather than failing silently
        draft = call_claude("Write a summary of at most 200 words: " + input_text)

    # Step 2: Polish the style
    polished = call_claude("Improve the style of this text: " + draft)

    return polished

For a complete guide on this pattern, see our dedicated article on prompt chaining and pipelines.


Pattern 2: Routing (Classification and Redirection)

Input is classified, then redirected to a specialized process. This enables separation of concerns: each branch has its own optimized prompt.


When to use: When there are distinct categories that are better handled separately.

Real-world examples:

| Classification | Branch A | Branch B | Branch C |
| --- | --- | --- | --- |
| Customer service | Refund → dedicated workflow | Technical question → knowledge base | Complaint → human escalation |
| Query complexity | Easy → Haiku (fast, cheap) | Medium → Sonnet (balanced) | Hard → Opus (deep reasoning) |
| Content type | Code → linter + review | Text → style analysis | Data → schema validation |

# Example: Route queries by complexity to different models
def route_query(query):
    # Classify complexity with a cheap model; normalize the answer
    category = call_claude(
        "Classify this query as SIMPLE, MEDIUM, or COMPLEX. "
        "Reply with the single word only.\n" + query,
        model="haiku"
    ).strip().upper()

    # Route to the right model (default to the balanced option)
    model_map = {
        "SIMPLE": "haiku",
        "MEDIUM": "sonnet",
        "COMPLEX": "opus"
    }

    return call_claude(query, model=model_map.get(category, "sonnet"))

To dive deeper into this pattern, see our guide on conditional prompt routing.


Pattern 3: Parallelization

Subtasks are executed simultaneously, then results are aggregated. Two main variants:

  1. Sectioning: Independent subtasks in parallel
  2. Voting: Same task executed multiple times for consensus

When to use: When subtasks can be parallelized OR when multiple perspectives are needed.

Real-world examples:

| Variant | Use Case | Detail |
| --- | --- | --- |
| Sectioning | Guardrails | One model processes the query, another screens for inappropriate content |
| Sectioning | Code review | Security + performance + style analysis in parallel |
| Voting | Sensitive classification | 3 runs → majority vote to reduce errors |
| Voting | Translation | Multiple translations → select the best |

import asyncio

# Example: Parallelized code review (sectioning)
async def parallel_code_review(code):
    security, performance, style = await asyncio.gather(
        call_claude_async("Analyze the security of this code:\n" + code),
        call_claude_async("Analyze the performance of this code:\n" + code),
        call_claude_async("Analyze the style of this code:\n" + code),
    )
    
    # Aggregate results
    return call_claude(
        f"Synthesize these 3 analyses into a unified report:\n"
        f"Security: {security}\nPerformance: {performance}\nStyle: {style}"
    )
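
The voting variant from the table is just as short. A minimal sketch, assuming a classification task with a small label set (the sentiment labels here are illustrative):

import asyncio
from collections import Counter

# Example: majority vote over repeated runs (voting variant)
async def vote_classify(text, runs=3):
    prompt = ("Classify the sentiment of this text as POSITIVE, NEGATIVE, "
              "or NEUTRAL. Reply with the single word only.\n" + text)
    answers = await asyncio.gather(
        *(call_claude_async(prompt, model="haiku") for _ in range(runs))
    )
    labels = [a.strip().upper() for a in answers]
    return Counter(labels).most_common(1)[0][0]  # majority label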

Our article on map-reduce patterns explores decomposition and parallel aggregation strategies in detail.


Pattern 4: Orchestrator-Workers

A central LLM (the orchestrator) dynamically breaks down the task, delegates to workers, then synthesizes results. The key difference from parallelization: subtasks are not predefined; the orchestrator determines them on the fly.


When to use: When subtasks can't be predicted in advance (e.g., code changes across multiple files).

Real-world examples:

| Use Case | Orchestrator Decides | Workers Execute |
| --- | --- | --- |
| Codebase refactoring | Which files to modify | Each worker modifies one file |
| Multi-source research | Which sources to query | Each worker searches one source |
| Documentation generation | Which modules to document | Each worker writes one section |

import json

# Example: Orchestrator-workers for refactoring
def orchestrator_workers(task):
    # Orchestrator analyzes and plans
    plan = call_claude(
        "Analyze this task and break it into subtasks.\n"
        'Return only a JSON array of objects like {"prompt": "..."}.\n' + task,
        model="sonnet"
    )

    subtasks = json.loads(plan)  # in production, validate against a schema

    # Workers execute each subtask (parallelize with asyncio.gather
    # when subtasks are independent, as in Pattern 3)
    results = []
    for subtask in subtasks:
        result = call_claude(subtask["prompt"], model="haiku")
        results.append(result)

    # Orchestrator synthesizes
    return call_claude(
        "Synthesize these results into a coherent response:\n"
        + "\n".join(results),
        model="sonnet"
    )

Pattern 5: Evaluator-Optimizer

One LLM generates a response, a second evaluates it and provides feedback, then the first refines. This cycle continues until quality is satisfactory.


When to use: When there are clear evaluation criteria AND iterative refinement provides measurable value.

Real-world examples:

| Use Case | Generator | Evaluator | Criteria |
| --- | --- | --- | --- |
| Literary translation | Translates the text | Checks fidelity + style | Score ≥ 8/10 |
| Code generation | Writes the code | Runs tests | All tests pass |
| SEO writing | Writes the article | Checks keywords + structure | SEO checklist complete |

import re

# Example: Evaluator-optimizer loop with a hard iteration cap
def evaluator_optimizer(task, max_iterations=3):
    response = call_claude("Generate: " + task)

    for _ in range(max_iterations):
        # Evaluation, with an explicit format the code can parse
        evaluation = call_claude(
            f"Evaluate this response out of 10. Start with 'score: N'.\n"
            f"Task: {task}\nResponse: {response}\n"
            f"If score < 8, provide precise feedback for improvement."
        )

        # Parse the score instead of brittle substring matching
        match = re.search(r"score:\s*(\d+)", evaluation, re.IGNORECASE)
        if match and int(match.group(1)) >= 8:
            return response  # Quality sufficient

        # Refinement with feedback
        response = call_claude(
            f"Improve this response based on the feedback.\n"
            f"Previous response: {response}\n"
            f"Feedback: {evaluation}"
        )

    return response

To systematically evaluate your prompt outputs, see our Claude evaluations guide.


Autonomous Agents: When the LLM Takes the Wheel

Beyond structured workflows, autonomous agents let the LLM dynamically decide each next action. It's essentially an LLM using tools in a loop, guided by environmental feedback.


Key principles for autonomous agents (a sketch implementing them follows the list):

  1. Ground truth at each step: the agent must verify the actual result of its actions (e.g., run the tests rather than assuming they pass)
  2. Stopping conditions: define a maximum number of iterations to avoid infinite loops
  3. Human intervention: provide a mechanism for the agent to ask for help when stuck
  4. Sandboxing: execute in a controlled environment with limited permissions
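
A minimal sketch of such a loop, building on the client defined earlier. The run_tool dispatcher is hypothetical: you would route each tool name to a sandboxed implementation you provide.

# Hedged sketch of an autonomous tool loop with the safeguards above.
# Assumptions: `client` is the anthropic.Anthropic() instance defined earlier,
# and run_tool(name, input) is a hypothetical sandboxed dispatcher.
MAX_ITERATIONS = 10  # stopping condition: hard iteration cap

def agent_loop(task, tools):
    messages = [{"role": "user", "content": task}]
    for _ in range(MAX_ITERATIONS):
        response = client.messages.create(
            model="claude-sonnet-4-5",  # illustrative model ID
            max_tokens=2048,
            tools=tools,
            messages=messages,
        )
        if response.stop_reason != "tool_use":
            return response.content[0].text  # the agent decided it is done

        # Ground truth: actually execute each requested tool and feed the
        # real result back, instead of letting the model guess the outcome
        messages.append({"role": "assistant", "content": response.content})
        for block in response.content:
            if block.type == "tool_use":
                result = run_tool(block.name, block.input)  # sandboxed
                messages.append({
                    "role": "user",
                    "content": [{
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result,
                    }],
                })
    # Human intervention: hand off rather than looping forever
    return "Iteration budget exhausted; escalating to a human."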

When to use an autonomous agent:

  • Open-ended problems where the number of steps is unpredictable
  • Reliable environments with clear feedback (e.g., automated tests)
  • Tasks where the human accepts delegating control

When NOT to use an autonomous agent:

  • Tasks with a predictable execution path → use a workflow
  • High-risk environments without rollback capability
  • When latency or cost are critical

To dive deeper into the autonomous agent pattern, read our guide on the ReAct method, which details the Think→Act→Observe loop.


Designing the Agent-Computer Interface (ACI)

Just as we invest in human-computer interfaces (HCI/UX), we must invest in the Agent-Computer Interface (ACI): how the agent interacts with its tools.

ACI Design Principles

| Principle | Explanation | Example |
| --- | --- | --- |
| Rich documentation | Include usage examples, edge cases, limits | "search(query, max_results=10) — Searches the database. Returns empty array if no results. Max 100 results." |
| Poka-yoke | Make mistakes impossible or difficult | Reject invalid parameters instead of silently ignoring them |
| Natural format | Use formats the model has seen during training | JSON, Markdown rather than proprietary formats |
| Thinking space | Give the model enough tokens to "think" | Add a reasoning field before the action field in the schema |

# ❌ Poor ACI design: cryptic names, no examples
tools = [{
    "name": "q",
    "description": "Query",
    "input_schema": {"query": "string"}
}]

# ✅ Good ACI design: clear names, examples, edge cases
tools = [{
    "name": "search_knowledge_base",
    "description": "Search the knowledge base for relevant articles. "
                   "Returns top matches with title and excerpt. "
                   "Example: search_knowledge_base('how to configure MCP') "
                   "Returns empty array if no matches. Max 20 results.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Natural language search query. Be specific."
            },
            "max_results": {
                "type": "integer",
                "default": 10,
                "description": "Maximum number of results (1-20)"
            }
        },
        "required": ["query"]
    }
}]
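
The "thinking space" principle from the table works the same way. A sketch of an output schema that places a reasoning field before the action field; the field names are illustrative, not a fixed API:

# Hedged sketch of the "thinking space" principle: put a reasoning field
# before the action field so the model reasons before committing.
action_schema = {
    "type": "object",
    "properties": {
        "reasoning": {
            "type": "string",
            "description": "Step-by-step rationale for the chosen action"
        },
        "action": {
            "type": "string",
            "description": "Name of the tool to call next"
        }
    },
    "required": ["reasoning", "action"]
}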

To learn more about designing tools for agents, see our Claude tool use guide and our article on custom Claude Code skills.


Decision Tree: Which Pattern to Choose?

Start with the simplest pattern that works:

  • Fixed, sequential subtasks → prompt chaining
  • Distinct input categories → routing
  • Independent subtasks, or a need for multiple perspectives → parallelization
  • Subtasks that can't be predicted in advance → orchestrator-workers
  • Clear evaluation criteria and value in iteration → evaluator-optimizer
  • Open-ended problem with reliable feedback → autonomous agent

Universal Best Practices

  1. Start simple: a well-written prompt often solves the problem without orchestration
  2. Add complexity incrementally: each layer must add measurable value
  3. Prioritize transparency: show the agent's planning steps (no black boxes)
  4. Invest in ACI: spend as much time on tool design as on prompts
  5. Test extensively: run hundreds of examples, iterate on tool definitions


Dorian Laurenceau

Full-Stack Developer & Learning Designer

Full-stack web developer and learning designer. I spent 4 years as a freelance full-stack developer and 4 years teaching React, JavaScript, HTML/CSS and WordPress to adult learners. Today I design learning paths in web development and AI, grounded in learning science. I founded learn-prompting.fr to make AI practical and accessible, and built the Bluff app to gamify political transparency.

Prompt Engineering · LLMs · Full-Stack Development · Learning Design · React
Published: March 10, 2026 · Updated: April 24, 2026

FAQ

What are the 5 AI agent architecture patterns?

The 5 patterns are: prompt chaining (sequential pipeline), routing (classification and redirection), parallelization (simultaneous execution), orchestrator-workers (dynamic delegation), and evaluator-optimizer (iterative refinement loop).

What's the difference between a workflow and an autonomous agent?

A workflow orchestrates LLMs through predefined code paths; the developer controls the flow. An autonomous agent lets the LLM dynamically direct its own processes and tool usage, with minimal human intervention.

How do I choose the right agent pattern for my use case?

Start with the simplest that works. Use prompt chaining for sequential tasks, routing for distinct categories, parallelization for independent subtasks, orchestrator-workers when subtasks are unpredictable, and evaluator-optimizer when iterative refinement provides measurable value.

Do I need a framework to build AI agents?

Not necessarily. The most successful implementations use simple, composable patterns rather than complex frameworks. Frameworks can add unnecessary abstraction; prefer direct patterns with code you understand.