concept

AI Agent Architectures

created 2026-04-27 updated 2026-05-28 ai · langgraph · agents · architecture · react · multi-agent · skills · mcp

AI Agent Architectures

A comprehensive guide to agent architecture patterns in LangGraph, from simple ReAct loops to hierarchical multi-agent systems with pluggable skills. Written for teams already running LangGraph in production (e.g., fajb-next).

1. ReAct Agents

What They Are

ReAct (Reasoning + Acting) is the foundational agent pattern. The agent alternates between thinking (reasoning about what to do), acting (calling a tool), and observing (reading the tool’s result), then loops until the task is complete.

User Input → [Think → Act → Observe] → ... → Final Answer

In LangGraph, create_react_agent() (now create_agent() in the latest API) builds this loop automatically:

from langchain.agents import create_agent

agent = create_agent(
    model="anthropic:claude-sonnet-4-6",
    tools=[search, database_query, format_article],
    system_prompt="You are a journalist research assistant."
)

result = agent.invoke({
    "messages": [{"role": "user", "content": "Find recent funding rounds in Nordic fintech"}]
})

Internally, the graph has two nodes:

  1. LLM node — receives messages, decides whether to call tools or respond
  2. Tool node — executes tool calls, feeds results back to LLM

A conditional edge checks: did the LLM return tool calls? If yes, route to tool node. If no, end.

Strengths

  • Simple to build and debug — one agent, one loop, deterministic structure
  • Good for 1-5 tools — the LLM can reason about a small toolset effectively
  • Sufficient for most use cases — article editing, search, data lookups, formatting
  • Built-in in LangGraphcreate_agent() handles the graph, state, and routing
  • Easy to add checkpointing — PostgresSaver gives you resumable conversations

Limitations

  • Context bloat — every tool result stays in the message history; after 10+ tool calls, the context window fills with intermediate results the LLM doesn’t need
  • Tool confusion at scale — with 15+ tools, the LLM starts picking wrong tools or hallucinating tool names
  • No parallelism — single agent processes sequentially
  • No specialization — one system prompt must cover all domains
  • Fragile on multi-step plans — tends to lose track of complex multi-step reasoning; no explicit planning mechanism
  • No delegation — can’t hand off sub-tasks to specialized workers

When ReAct Breaks Down

SymptomRoot Cause
Agent picks wrong tool repeatedlyToo many tools (>12-15) with overlapping descriptions
Responses degrade mid-conversationContext window filling with tool output noise
Multi-step tasks failNo planning mechanism; agent loses track of steps
Different domains need different promptsSingle system prompt can’t specialize
Latency too highSequential tool calls when parallel would work

2. Beyond ReAct: The Multi-Agent Spectrum

LangGraph defines five formal multi-agent patterns. Each solves specific ReAct limitations:

2.1 Handoffs

Agents dynamically transfer control to each other based on state. A tool updates a state variable (e.g., current_step or active_agent), which triggers routing to a different agent or configuration.

Two implementations:

Single agent with middleware (simpler, recommended for most cases):

from langchain.agents import create_agent
from langchain.agents.middleware import wrap_model_call, ModelRequest, ModelResponse

class SupportState(AgentState):
    current_step: str = "triage"

@tool
def escalate_to_specialist(runtime: ToolRuntime) -> Command:
    """Move to specialist handling after triage is complete."""
    return Command(update={
        "messages": [ToolMessage(content="Escalated", tool_call_id=runtime.tool_call_id)],
        "current_step": "specialist"
    })

@wrap_model_call
def apply_step_config(request: ModelRequest, handler):
    step = request.state.get("current_step", "triage")
    configs = {
        "triage": {"prompt": "Collect article details...", "tools": [escalate_to_specialist]},
        "specialist": {"prompt": "Edit and format the article...", "tools": [edit_tool, publish_tool]}
    }
    config = configs[step]
    request = request.override(system_prompt=config["prompt"], tools=config["tools"])
    return handler(request)

agent = create_agent(model, tools=[...], state_schema=SupportState, middleware=[apply_step_config])

Multiple agent subgraphs (for bespoke agent logic):

@tool
def transfer_to_editor(runtime: ToolRuntime) -> Command:
    """Transfer conversation to the editor agent."""
    last_ai = next(m for m in reversed(runtime.state["messages"]) if isinstance(m, AIMessage))
    return Command(
        goto="editor_agent",
        update={"active_agent": "editor_agent", "messages": [last_ai, ToolMessage(...)]},
        graph=Command.PARENT  # Navigate in parent graph
    )

Best for: Sequential workflows, multi-stage conversations, customer support flows. JB example: article triage -> research -> editing -> publishing pipeline.

2.2 Subagents (Agent-as-Tool)

A supervisor agent calls specialized subagents as tools. Each subagent runs in isolation with its own context window, tools, and system prompt. Results flow back to the supervisor.

# Define a specialized subagent
research_agent = create_agent(
    model="anthropic:claude-sonnet-4-6",
    tools=[web_search, archive_search, database_query],
    system_prompt="You are a research specialist. Find and summarize information."
)

# Wrap as tool for the supervisor
@tool("research", description="Research a topic thoroughly using multiple sources")
def call_research_agent(query: str):
    result = research_agent.invoke({"messages": [{"role": "user", "content": query}]})
    return result["messages"][-1].content

# Supervisor uses subagent tools
supervisor = create_agent(
    model="anthropic:claude-sonnet-4-6",
    tools=[call_research_agent, call_editor_agent, call_fact_checker],
    system_prompt="You coordinate journalism tasks. Delegate to specialists."
)

Key benefit: Solves context bloat. The research agent might make 20 tool calls internally, but only returns a summary to the supervisor. Token usage drops ~67% vs a flat agent doing everything.

Best for: Parallel domains, large-context tasks, team-developed features. JB example: supervisor delegates to research-agent, editor-agent, fact-check-agent independently.

2.3 Skills (Pluggable Capabilities)

A single agent loads specialized prompts and context on-demand. Skills are prompt-driven specializations — lighter than subagents, heavier than simple tools.

@tool
def load_skill(skill_name: str) -> str:
    """Load a specialized skill for the current task."""
    skill_content = read_skill_file(f"skills/{skill_name}/SKILL.md")
    return skill_content  # Returns instructions + context the agent follows

agent = create_agent(
    model="anthropic:claude-sonnet-4-6",
    tools=[load_skill, ...domain_tools],
    system_prompt="You have access to skills. Load relevant skills before working."
)

Progressive disclosure: Agent only reads full skill details when matched. Avoids loading all context upfront. Skills can also register new tools dynamically when activated.

Deep Agents implementation: Skills follow the Agent Skills specification. Each skill is a directory:

skills/
├── article-editor/
│   ├── SKILL.md          # Instructions, examples, guidelines
│   └── templates/        # Article templates
├── financial-analysis/
│   ├── SKILL.md
│   └── analyze.py        # Executable script
└── source-verification/
    └── SKILL.md

Best for: Single agent with many specializations, distributed team development, repeat requests. JB example: journalist selects “article editing” + “financial analysis” skills for a specific task.

2.4 Router

An initial routing step classifies input and directs to specialized agents. Supports parallel execution.

def route_input(state):
    # Classify and route to appropriate specialist(s)
    if "financial" in state["messages"][-1].content:
        return ["financial_agent"]
    elif "editorial" in state["messages"][-1].content:
        return ["editor_agent"]
    return ["general_agent"]

builder = StateGraph(State)
builder.add_conditional_edges(START, route_input, [...])

Best for: Multi-domain tasks with clear classification boundaries.

2.5 Custom Workflow

Bespoke LangGraph graphs mixing deterministic logic with agentic nodes. You can embed any of the above patterns as nodes in a larger workflow.

Best for: Complex orchestration, mixing patterns, domain-specific logic.

3. LangGraph Sub-Graphs: Implementation

Sub-graphs are the building block for composable agent architectures.

Pattern 1: Different State Schemas (Wrapper Call)

When parent and child have different state, wrap the subgraph in a function that transforms state:

def call_research_subgraph(state: ParentState):
    # Transform parent state -> subgraph input
    result = research_graph.invoke({"query": state["current_topic"], "sources": []})
    # Transform subgraph output -> parent state update
    return {"research_results": result["summary"], "sources_found": result["sources"]}

builder.add_node("research", call_research_subgraph)

Pattern 2: Shared State (Direct Node)

When schemas overlap, add the compiled subgraph directly:

research_graph = research_builder.compile()
builder.add_node("research", research_graph)  # Shares state automatically

Dynamic Graph Composition at Runtime

You can assemble different sub-graphs based on which “skills” or features are enabled:

def build_agent_graph(enabled_skills: list[str]):
    builder = StateGraph(AgentState)
    builder.add_node("supervisor", supervisor_node)

    # Dynamically add skill sub-graphs
    skill_registry = {
        "research": research_subgraph,
        "editing": editing_subgraph,
        "fact_check": fact_check_subgraph,
        "financial": financial_subgraph,
    }

    available_nodes = ["supervisor"]
    for skill_name in enabled_skills:
        if skill_name in skill_registry:
            builder.add_node(skill_name, skill_registry[skill_name])
            available_nodes.append(skill_name)

    # Supervisor routes to enabled skills only
    def route(state):
        target = state.get("next_skill")
        return target if target in available_nodes else END

    builder.add_conditional_edges("supervisor", route, {n: n for n in available_nodes} | {END: END})
    for skill in enabled_skills:
        builder.add_edge(skill, "supervisor")

    builder.add_edge(START, "supervisor")
    return builder.compile()

# Runtime: journalist enables specific skills for their task
graph = build_agent_graph(["research", "financial"])

State Management Across Parent/Child

  • Per-invocation (default): Each subgraph call starts fresh. Inherits parent’s checkpointer for interrupt support within a single invocation.
  • Per-thread (checkpointer=True): Subgraph state persists across calls on the same thread. Useful for conversational sub-agents.
  • Stateless (checkpointer=False): No persistence. Runs like a plain function.

The Command Primitive

Command combines state updates with navigation — the glue for multi-agent routing:

from langgraph.types import Command

def my_node(state: State) -> Command[Literal["agent_a", "agent_b"]]:
    if state["needs_research"]:
        return Command(update={"status": "researching"}, goto="agent_a")
    return Command(update={"status": "editing"}, goto="agent_b")

For subgraph-to-parent navigation:

return Command(goto="target_node", graph=Command.PARENT)

4. MCP as a Tool Provider

Model Context Protocol (MCP) provides a standardized way to expose tools, resources, and prompts to AI agents.

Architecture

LangGraph Agent (Host)
  ├── MCP Client 1 → Local MCP Server (DB tools, stdio)
  ├── MCP Client 2 → Local MCP Server (file tools, stdio)
  └── MCP Client 3 → Remote MCP Server (API tools, HTTP)

LangGraph Integration

from langchain_mcp_adapters.client import MultiServerMCPClient
from langchain.agents import create_agent

client = MultiServerMCPClient({
    "archive": {
        "transport": "stdio",
        "command": "python",
        "args": ["servers/archive_server.py"],
    },
    "cms": {
        "transport": "http",
        "url": "https://cms-api.internal/mcp",
        "headers": {"Authorization": "Bearer ..."},
    },
    "financial_data": {
        "transport": "http",
        "url": "https://finance-api.internal/mcp",
    }
})

tools = await client.get_tools()
agent = create_agent("anthropic:claude-sonnet-4-6", tools)

Dynamic Tool Addition/Removal

MCP servers can notify clients when tools change via notifications/tools/list_changed. The client re-fetches the tool list and the agent’s capabilities update. This enables:

  • Feature flags: Enable/disable tools per user or per plan
  • Rolling deployments: Add new tool servers without restarting the agent
  • Contextual tools: Serve different tools based on user context

Tool Interceptors

Middleware for MCP tool execution — inject user context, modify args, handle errors:

async def inject_journalist_context(request, handler):
    user_id = request.runtime.context.user_id
    request = request.override(args={**request.args, "journalist_id": user_id})
    return await handler(request)

client = MultiServerMCPClient({...}, tool_interceptors=[inject_journalist_context])

MCP vs Direct Tools

AspectDirect LangGraph ToolsMCP Tools
DefinitionIn-process Python/TS functionsSeparate server process
DiscoveryStatic at graph build timeDynamic via tools/list
DeploymentCoupled to agentIndependent lifecycle
SharingPer-agentAny MCP-compatible client
OverheadNoneJSON-RPC serialization

Use MCP when: tools are shared across multiple agents, need independent deployment, or come from third parties. Use direct tools when: performance matters, tools are agent-specific, or simplicity is preferred.

5. Skill Bundles Architecture for JournalistBoost

Applying these patterns to fajb-next, a practical “skill bundles” architecture:

Taxonomy

Simple Skill = Tool call (API wrapper)
  Examples: search_archive, get_person_profile, fetch_rss

Complex Skill = Sub-graph with own state and logic
  Examples: article_editor (multi-step editing pipeline),
            financial_analyzer (data fetch → compute → visualize),
            source_verifier (cross-reference → fact-check → confidence score)

Bundle = Named collection of skills, composed at runtime
  Examples: "Article Research" = [search_archive, web_search, source_verifier]
            "Financial Profile" = [get_person_profile, financial_analyzer, format_article]
            "Quick Edit" = [article_editor]

Implementation Pattern

# Skill registry
SIMPLE_SKILLS = {
    "search_archive": search_archive_tool,
    "get_person": get_person_tool,
    "web_search": web_search_tool,
    "fetch_rss": fetch_rss_tool,
}

COMPLEX_SKILLS = {
    "article_editor": article_editor_subgraph,
    "financial_analyzer": financial_analyzer_subgraph,
    "source_verifier": source_verifier_subgraph,
}

BUNDLES = {
    "article_research": ["search_archive", "web_search", "source_verifier"],
    "financial_profile": ["get_person", "financial_analyzer"],
    "quick_edit": ["article_editor"],
    "full_workflow": ["search_archive", "web_search", "source_verifier",
                      "article_editor", "financial_analyzer"],
}

def build_journalist_agent(bundle_name: str, journalist_id: str):
    bundle = BUNDLES[bundle_name]

    # Collect simple skills as tools
    tools = [SIMPLE_SKILLS[s] for s in bundle if s in SIMPLE_SKILLS]

    # Wrap complex skills as tools (subagent pattern)
    for skill_name in bundle:
        if skill_name in COMPLEX_SKILLS:
            subgraph = COMPLEX_SKILLS[skill_name]

            @tool(skill_name, description=f"Run the {skill_name} workflow")
            def run_skill(query: str, _sg=subgraph):
                result = _sg.invoke({"messages": [{"role": "user", "content": query}]})
                return result["messages"][-1].content

            tools.append(run_skill)

    return create_agent(
        model="anthropic:claude-sonnet-4-6",
        tools=tools,
        system_prompt=f"You are a journalist assistant with these capabilities: {', '.join(bundle)}",
        checkpointer=PostgresSaver(...)
    )

MCP-Based Alternative

Package each skill as an MCP server for maximum decoupling:

# Each skill team maintains their own MCP server
client = MultiServerMCPClient({
    "archive": {"transport": "stdio", "command": "python", "args": ["skills/archive/server.py"]},
    "editor": {"transport": "http", "url": "http://editor-skill:8000/mcp"},
    "financial": {"transport": "http", "url": "http://financial-skill:8000/mcp"},
})

# Bundle = which MCP servers to connect
async def build_agent_from_bundle(bundle_config: dict):
    filtered_client = MultiServerMCPClient({
        name: config for name, config in ALL_SERVERS.items()
        if name in bundle_config["skills"]
    })
    tools = await filtered_client.get_tools()
    return create_agent("anthropic:claude-sonnet-4-6", tools)

6. Decision Framework: When to Use What

Start Here: The Complexity Ladder

Level 0: Single LLM call (no tools)
  └── Sufficient for: summarization, classification, formatting

Level 1: ReAct agent with tools
  └── Sufficient for: search + answer, CRUD operations, simple workflows
  └── JB today: article editing, archive search, profile lookups

Level 2: ReAct + middleware (handoffs/skills)
  └── Need when: multi-stage workflows, >12 tools, domain switching
  └── JB next: article pipeline (triage → research → edit → publish)

Level 3: Subagents (agent-as-tool)
  └── Need when: context bloat, parallel domains, team-developed features
  └── JB future: supervisor → research-agent + editor-agent + fact-checker

Level 4: Full orchestration (custom workflow + MCP)
  └── Need when: complex multi-domain, many integrations, enterprise scale
  └── JB vision: investor chat connecting DBs, APIs, web search, CMS

Quick Decision Matrix

QuestionYes →No →
<12 tools total?ReAct is fineConsider skills or subagents
All tools in same domain?ReAct is fineSubagents per domain
Sequential workflow?HandoffsSubagents (parallel)
Tools need independent deployment?MCP serversDirect tools
Context window filling up?Subagents (isolation)ReAct is fine
Multiple teams building features?Subagents or SkillsSingle agent
Need dynamic tool composition?MCP or runtime graph buildStatic graph
User-facing conversation?HandoffsSubagents (supervisor)

7. Investor Chat: Why Deep Agents From Day One

An “investor chat” connecting multiple DBs, web search, and third-party APIs is inherently a multi-domain problem. Starting with a flat ReAct agent and bolting on complexity later creates technical debt:

Why subagents from the start:

  1. Context isolation — Financial DB queries return large result sets. Web search returns pages of text. A flat ReAct agent accumulates all of this in one context window. With subagents, each specialist processes its domain and returns a summary. Token savings: ~67%.

  2. Domain-specific prompting — A financial data agent needs different instructions than a news search agent. Subagents let each have optimal system prompts without a bloated unified prompt.

  3. Parallel execution — “Compare company X financials with recent news coverage” naturally decomposes into parallel sub-tasks. Subagents can run concurrently.

  4. Independent scaling — Financial data tools might need rate limiting. Web search might need caching. Subagents (or MCP servers) can scale independently.

  5. Incremental development — Start with 2-3 subagents (financial, news, general). Add more without touching existing ones. Each subagent is a self-contained unit.

Recommended architecture for investor chat:

Supervisor Agent (orchestrator)
  ├── financial-agent (DB queries, calculations, charts)
  ├── news-agent (web search, RSS, archive search)
  ├── company-agent (profile lookups, regulatory filings)
  └── general-agent (conversation, clarifications, formatting)

Each can be implemented as a subgraph first, then extracted to MCP servers if independent deployment is needed.