concept
AI Agent Architectures
A comprehensive guide to agent architecture patterns in LangGraph, from simple ReAct loops to hierarchical multi-agent systems with pluggable skills. Written for teams already running LangGraph in production (e.g., fajb-next).
1. ReAct Agents
What They Are
ReAct (Reasoning + Acting) is the foundational agent pattern. The agent alternates between thinking (reasoning about what to do), acting (calling a tool), and observing (reading the tool’s result), then loops until the task is complete.
User Input → [Think → Act → Observe] → ... → Final Answer
In LangGraph, the prebuilt create_react_agent() builds this loop automatically; in the latest API the equivalent entry point is create_agent() from langchain.agents:
from langchain.agents import create_agent
agent = create_agent(
    model="anthropic:claude-sonnet-4-6",
    tools=[search, database_query, format_article],
    system_prompt="You are a journalist research assistant.",
)
result = agent.invoke({
    "messages": [{"role": "user", "content": "Find recent funding rounds in Nordic fintech"}]
})
Internally, the graph has two nodes:
- LLM node — receives messages, decides whether to call tools or respond
- Tool node — executes tool calls, feeds results back to LLM
A conditional edge checks: did the LLM return tool calls? If yes, route to tool node. If no, end.
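The two-node loop and its conditional edge can be sketched without any framework. This is a minimal illustration, not LangGraph's actual implementation: `fake_model`, `TOOLS`, `should_continue`, and `run_react` are hypothetical names, and the "model" is a stub that requests one search and then answers.

```python
from typing import Callable

def fake_model(messages: list[dict]) -> dict:
    """Hypothetical stand-in for the LLM node: request a tool first, then answer."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {"role": "assistant", "content": "",
                "tool_calls": [{"name": "search", "args": {"query": "nordic fintech"}}]}
    return {"role": "assistant",
            "content": f"Found: {tool_results[-1]['content']}",
            "tool_calls": []}

# Hypothetical tool registry used by the tool node.
TOOLS: dict[str, Callable[..., str]] = {
    "search": lambda query: f"3 results for '{query}'",
}

def should_continue(last_message: dict) -> str:
    """Conditional edge: go to the tool node only if the model asked for tools."""
    return "tools" if last_message.get("tool_calls") else "end"

def run_react(user_input: str, max_steps: int = 5) -> list[dict]:
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_steps):
        reply = fake_model(messages)              # LLM node
        messages.append(reply)
        if should_continue(reply) == "end":
            break
        for call in reply["tool_calls"]:          # tool node
            result = TOOLS[call["name"]](**call["args"])
            messages.append({"role": "tool", "name": call["name"], "content": result})
    return messages

history = run_react("Find recent funding rounds in Nordic fintech")
```

Note that every tool result is appended to the same `messages` list, which is exactly where the context-bloat limitation described below comes from.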
Strengths
- Simple to build and debug — one agent, one loop, deterministic structure
- Good for 1-5 tools — the LLM can reason about a small toolset effectively
- Sufficient for most use cases — article editing, search, data lookups, formatting
- Built into LangGraph: create_agent() handles the graph, state, and routing
- Easy to add checkpointing: PostgresSaver gives you resumable conversations
Limitations
- Context bloat — every tool result stays in the message history; after 10+ tool calls, the context window fills with intermediate results the LLM doesn’t need
- Tool confusion at scale — with 15+ tools, the LLM starts picking wrong tools or hallucinating tool names
- No parallelism — single agent processes sequentially
- No specialization — one system prompt must cover all domains
- Fragile on multi-step plans — tends to lose track of complex multi-step reasoning; no explicit planning mechanism
- No delegation — can’t hand off sub-tasks to specialized workers
When ReAct Breaks Down
| Symptom | Root Cause |
|---|---|
| Agent picks wrong tool repeatedly | Too many tools (>12-15) with overlapping descriptions |
| Responses degrade mid-conversation | Context window filling with tool output noise |
| Multi-step tasks fail | No planning mechanism; agent loses track of steps |
| Different domains need different prompts | Single system prompt can’t specialize |
| Latency too high | Sequential tool calls when parallel would work |
2. Beyond ReAct: The Multi-Agent Spectrum
The LangChain and LangGraph documentation describes five multi-agent patterns. Each addresses specific ReAct limitations:
2.1 Handoffs
Agents dynamically transfer control to each other based on state. A tool updates a state variable (e.g., current_step or active_agent), which triggers routing to a different agent or configuration.
Two implementations:
Single agent with middleware (simpler, recommended for most cases):
from langchain.agents import AgentState, create_agent
from langchain.agents.middleware import wrap_model_call, ModelRequest, ModelResponse
from langchain.tools import tool, ToolRuntime
from langchain_core.messages import ToolMessage
from langgraph.types import Command
class SupportState(AgentState):
    current_step: str = "triage"
@tool
def escalate_to_specialist(runtime: ToolRuntime) -> Command:
    """Move to specialist handling after triage is complete."""
    # The tool only updates state; the middleware below reacts to the new step
    return Command(update={
        "messages": [ToolMessage(content="Escalated", tool_call_id=runtime.tool_call_id)],
        "current_step": "specialist",
    })
@wrap_model_call
def apply_step_config(request: ModelRequest, handler):
    # Pick the prompt and toolset for the current step before each model call
    step = request.state.get("current_step", "triage")
    configs = {
        "triage": {"prompt": "Collect article details...", "tools": [escalate_to_specialist]},
        "specialist": {"prompt": "Edit and format the article...", "tools": [edit_tool, publish_tool]},
    }
    config = configs[step]
    request = request.override(system_prompt=config["prompt"], tools=config["tools"])
    return handler(request)
agent = create_agent(model, tools=[...], state_schema=SupportState, middleware=[apply_step_config])
Multiple agent subgraphs (for bespoke agent logic):
from langchain_core.messages import AIMessage, ToolMessage
@tool
def transfer_to_editor(runtime: ToolRuntime) -> Command:
    """Transfer conversation to the editor agent."""
    # Pass along the most recent AI message so the editor sees the handoff request
    last_ai = next(m for m in reversed(runtime.state["messages"]) if isinstance(m, AIMessage))
    return Command(
        goto="editor_agent",
        update={"active_agent": "editor_agent", "messages": [last_ai, ToolMessage(...)]},
        graph=Command.PARENT,  # Navigate in parent graph
    )
Best for: Sequential workflows, multi-stage conversations, customer support flows. JB example: article triage -> research -> editing -> publishing pipeline.
2.2 Subagents (Agent-as-Tool)
A supervisor agent calls specialized subagents as tools. Each subagent runs in isolation with its own context window, tools, and system prompt. Results flow back to the supervisor.
# Define a specialized subagent
research_agent = create_agent(
    model="anthropic:claude-sonnet-4-6",
    tools=[web_search, archive_search, database_query],
    system_prompt="You are a research specialist. Find and summarize information.",
)
# Wrap as tool for the supervisor
@tool("research", description="Research a topic thoroughly using multiple sources")
def call_research_agent(query: str):
    result = research_agent.invoke({"messages": [{"role": "user", "content": query}]})
    return result["messages"][-1].content
# Supervisor uses subagent tools
supervisor = create_agent(
    model="anthropic:claude-sonnet-4-6",
    tools=[call_research_agent, call_editor_agent, call_fact_checker],
    system_prompt="You coordinate journalism tasks. Delegate to specialists.",
)
Key benefit: Solves context bloat. The research agent might make 20 tool calls internally but returns only a summary to the supervisor. Token usage can drop substantially (the figure quoted here is ~67%) versus a flat agent doing everything, though the actual saving depends on the workload.
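The isolation effect can be made concrete with a toy model of the two message histories. This is a sketch under stated assumptions: `run_subagent` is a hypothetical stand-in that fakes 20 tool results, and character counts are used as a crude proxy for tokens; it does not reproduce the ~67% figure.

```python
def run_subagent(query: str) -> tuple[str, int]:
    """Hypothetical subagent: accumulates raw tool output privately, returns a summary."""
    private_history = [{"role": "user", "content": query}]
    for i in range(20):
        # Each fake tool result is deliberately bulky, like a search page or DB dump.
        private_history.append({"role": "tool", "content": f"raw result {i} " * 50})
    summary = f"Summary of {len(private_history) - 1} tool results for: {query}"
    return summary, len(private_history)

# The supervisor's history only ever sees the user request and the summary.
supervisor_history = [{"role": "user", "content": "Research Nordic fintech funding"}]
summary, subagent_message_count = run_subagent("Nordic fintech funding")
supervisor_history.append({"role": "tool", "content": summary})

supervisor_chars = sum(len(m["content"]) for m in supervisor_history)
```

The subagent's 21 private messages are discarded once it returns, so the supervisor's context grows by one short message per delegation rather than by every intermediate result.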
Best for: Parallel domains, large-context tasks, team-developed features. JB example: supervisor delegates to research-agent, editor-agent, fact-check-agent independently.
2.3 Skills (Pluggable Capabilities)
A single agent loads specialized prompts and context on-demand. Skills are prompt-driven specializations — lighter than subagents, heavier than simple tools.
@tool
def load_skill(skill_name: str) -> str:
    """Load a specialized skill for the current task."""
    skill_content = read_skill_file(f"skills/{skill_name}/SKILL.md")
    return skill_content  # Returns instructions + context the agent follows
agent = create_agent(
    model="anthropic:claude-sonnet-4-6",
    tools=[load_skill, *domain_tools],  # unpack the domain tools alongside load_skill
    system_prompt="You have access to skills. Load relevant skills before working.",
)
Progressive disclosure: Agent only reads full skill details when matched. Avoids loading all context upfront. Skills can also register new tools dynamically when activated.
Deep Agents implementation: Skills follow the Agent Skills specification. Each skill is a directory:
skills/
├── article-editor/
│ ├── SKILL.md # Instructions, examples, guidelines
│ └── templates/ # Article templates
├── financial-analysis/
│ ├── SKILL.md
│ └── analyze.py # Executable script
└── source-verification/
└── SKILL.md
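Progressive disclosure over a directory like the one above might look as follows. This is a minimal sketch, not the Deep Agents implementation: it assumes (purely for illustration) that the first line of each SKILL.md is a one-line description, and `write_demo_skills`, `list_skills`, and the demo skill contents are hypothetical.

```python
import tempfile
from pathlib import Path

def write_demo_skills(root: Path) -> None:
    """Create two hypothetical skills; line 1 of SKILL.md is a short description."""
    skills = {
        "article-editor": "Edit and format news articles.\n\nFull editing guidelines go here.",
        "source-verification": "Cross-check claims against sources.\n\nFull checklist goes here.",
    }
    for name, body in skills.items():
        (root / name).mkdir(parents=True)
        (root / name / "SKILL.md").write_text(body, encoding="utf-8")

def list_skills(root: Path) -> dict[str, str]:
    """Cheap index for the system prompt: skill name -> description line only."""
    return {
        p.parent.name: p.read_text(encoding="utf-8").splitlines()[0]
        for p in sorted(root.glob("*/SKILL.md"))
    }

def load_skill(root: Path, skill_name: str) -> str:
    """Full skill text, read only when the agent decides the skill is relevant."""
    path = root / skill_name / "SKILL.md"
    if not path.is_file():
        return f"Unknown skill: {skill_name}"
    return path.read_text(encoding="utf-8")

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    write_demo_skills(root)
    index = list_skills(root)                       # small, always in context
    full_text = load_skill(root, "article-editor")  # large, loaded on demand
    missing = load_skill(root, "does-not-exist")
```

Only `index` needs to be in the prompt up front; returning an error string for unknown names lets the agent recover from a mistyped skill instead of crashing the tool call.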
Best for: Single agent with many specializations, distributed team development, repeat requests. JB example: journalist selects “article editing” + “financial analysis” skills for a specific task.
2.4 Router
An initial routing step classifies input and directs to specialized agents. Supports parallel execution.
def route_input(state):
    # Classify and route to appropriate specialist(s)
    if "financial" in state["messages"][-1].content:
        return ["financial_agent"]
    elif "editorial" in state["messages"][-1].content:
        return ["editor_agent"]
    return ["general_agent"]
builder = StateGraph(State)
builder.add_conditional_edges(START, route_input, [...])
Best for: Multi-domain tasks with clear classification boundaries.
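Because the router returns a list, one input can fan out to several specialists at once. The following framework-free sketch illustrates that idea under stated assumptions: the three agent functions, the `KEYWORDS` table, and `dispatch` are hypothetical, and keyword matching stands in for whatever classifier a real router would use.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical specialists; in a real graph these would be agent nodes.
def financial_agent(text: str) -> str:
    return f"[financial] {text}"

def editor_agent(text: str) -> str:
    return f"[editor] {text}"

def general_agent(text: str) -> str:
    return f"[general] {text}"

AGENTS = {
    "financial_agent": financial_agent,
    "editor_agent": editor_agent,
    "general_agent": general_agent,
}

# Hypothetical keyword table standing in for an LLM or ML classifier.
KEYWORDS = {
    "financial_agent": ("financial", "revenue"),
    "editor_agent": ("editorial", "headline"),
}

def route_input(text: str) -> list[str]:
    """Return every matching specialist, or the general agent as a fallback."""
    lowered = text.lower()
    targets = [name for name, words in KEYWORDS.items()
               if any(word in lowered for word in words)]
    return targets or ["general_agent"]

def dispatch(text: str) -> dict[str, str]:
    """Run all selected specialists concurrently and collect their answers."""
    targets = route_input(text)
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        results = list(pool.map(lambda name: AGENTS[name](text), targets))
    return dict(zip(targets, results))
```

For example, `dispatch("Check the financial figures and the headline")` runs both the financial and editor specialists, while an unmatched input falls through to the general agent.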
2.5 Custom Workflow
Bespoke LangGraph graphs mixing deterministic logic with agentic nodes. You can embed any of the above patterns as nodes in a larger workflow.
Best for: Complex orchestration, mixing patterns, domain-specific logic.
3. LangGraph Sub-Graphs: Implementation
Sub-graphs are the building block for composable agent architectures.
Pattern 1: Different State Schemas (Wrapper Call)
When parent and child have different state, wrap the subgraph in a function that transforms state:
def call_research_subgraph(state: ParentState):
    # Transform parent state -> subgraph input
    result = research_graph.invoke({"query": state["current_topic"], "sources": []})
    # Transform subgraph output -> parent state update
    return {"research_results": result["summary"], "sources_found": result["sources"]}
builder.add_node("research", call_research_subgraph)
Pattern 2: Shared State (Direct Node)
When schemas overlap, add the compiled subgraph directly:
research_graph = research_builder.compile()
builder.add_node("research", research_graph) # Shares state automatically
Dynamic Graph Composition at Runtime
You can assemble different sub-graphs based on which “skills” or features are enabled:
from langgraph.graph import StateGraph, START, END
def build_agent_graph(enabled_skills: list[str]):
    builder = StateGraph(AgentState)
    builder.add_node("supervisor", supervisor_node)
    # Dynamically add skill sub-graphs
    skill_registry = {
        "research": research_subgraph,
        "editing": editing_subgraph,
        "fact_check": fact_check_subgraph,
        "financial": financial_subgraph,
    }
    # Keep only registered skills so that every edge below points at a real node
    active_skills = [s for s in enabled_skills if s in skill_registry]
    for skill_name in active_skills:
        builder.add_node(skill_name, skill_registry[skill_name])
    # Supervisor routes to enabled skills only; anything else ends the run
    def route(state):
        target = state.get("next_skill")
        return target if target in active_skills else END
    builder.add_conditional_edges("supervisor", route, {s: s for s in active_skills} | {END: END})
    for skill_name in active_skills:
        builder.add_edge(skill_name, "supervisor")
    builder.add_edge(START, "supervisor")
    return builder.compile()
# Runtime: journalist enables specific skills for their task
graph = build_agent_graph(["research", "financial"])
State Management Across Parent/Child
- Per-invocation (default): Each subgraph call starts fresh. Inherits parent’s checkpointer for interrupt support within a single invocation.
- Per-thread (checkpointer=True): Subgraph state persists across calls on the same thread. Useful for conversational sub-agents.
- Stateless (checkpointer=False): No persistence. Runs like a plain function.
The Command Primitive
Command combines state updates with navigation — the glue for multi-agent routing:
from typing import Literal
from langgraph.types import Command
def my_node(state: State) -> Command[Literal["agent_a", "agent_b"]]:
    if state["needs_research"]:
        return Command(update={"status": "researching"}, goto="agent_a")
    return Command(update={"status": "editing"}, goto="agent_b")
For subgraph-to-parent navigation:
return Command(goto="target_node", graph=Command.PARENT)
4. MCP as a Tool Provider
Model Context Protocol (MCP) provides a standardized way to expose tools, resources, and prompts to AI agents.
Architecture
LangGraph Agent (Host)
├── MCP Client 1 → Local MCP Server (DB tools, stdio)
├── MCP Client 2 → Local MCP Server (file tools, stdio)
└── MCP Client 3 → Remote MCP Server (API tools, HTTP)
LangGraph Integration
from langchain_mcp_adapters.client import MultiServerMCPClient
from langchain.agents import create_agent
client = MultiServerMCPClient({
    "archive": {
        "transport": "stdio",
        "command": "python",
        "args": ["servers/archive_server.py"],
    },
    "cms": {
        "transport": "http",
        "url": "https://cms-api.internal/mcp",
        "headers": {"Authorization": "Bearer ..."},
    },
    "financial_data": {
        "transport": "http",
        "url": "https://finance-api.internal/mcp",
    },
})
tools = await client.get_tools()
agent = create_agent("anthropic:claude-sonnet-4-6", tools)
Dynamic Tool Addition/Removal
MCP servers can notify clients when tools change via notifications/tools/list_changed. The client re-fetches the tool list and the agent’s capabilities update. This enables:
- Feature flags: Enable/disable tools per user or per plan
- Rolling deployments: Add new tool servers without restarting the agent
- Contextual tools: Serve different tools based on user context
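The refresh-on-notification flow can be sketched on the client side as follows. This is an illustration only, not the langchain_mcp_adapters implementation: `ToolCache` and its `fetch_tools` callback are hypothetical, and the callback stands in for a real tools/list request. The notification itself is a JSON-RPC message with a method and no id.

```python
import json
from typing import Callable

class ToolCache:
    """Hypothetical client-side cache of one MCP server's tool list."""

    def __init__(self, fetch_tools: Callable[[], list[str]]):
        self._fetch_tools = fetch_tools   # stands in for a tools/list request
        self.tools = fetch_tools()        # initial discovery
        self.refresh_count = 0

    def handle_message(self, raw: str) -> None:
        message = json.loads(raw)
        # JSON-RPC notifications carry a method but no id.
        is_notification = "id" not in message
        if is_notification and message.get("method") == "notifications/tools/list_changed":
            self.tools = self._fetch_tools()
            self.refresh_count += 1

# Simulated server state: the server later gains a second tool.
server_tools = ["search_archive"]
cache = ToolCache(lambda: list(server_tools))

server_tools.append("fetch_rss")
before = list(cache.tools)  # still the stale list
cache.handle_message(json.dumps(
    {"jsonrpc": "2.0", "method": "notifications/tools/list_changed"}
))
after = list(cache.tools)   # refreshed after the notification
```

The agent keeps using the cached list between notifications, so a feature flag flipped on the server only takes effect once the notification has been handled.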
Tool Interceptors
Middleware for MCP tool execution — inject user context, modify args, handle errors:
async def inject_journalist_context(request, handler):
    user_id = request.runtime.context.user_id
    request = request.override(args={**request.args, "journalist_id": user_id})
    return await handler(request)
client = MultiServerMCPClient({...}, tool_interceptors=[inject_journalist_context])
MCP vs Direct Tools
| Aspect | Direct LangGraph Tools | MCP Tools |
|---|---|---|
| Definition | In-process Python/TS functions | Separate server process |
| Discovery | Static at graph build time | Dynamic via tools/list |
| Deployment | Coupled to agent | Independent lifecycle |
| Sharing | Per-agent | Any MCP-compatible client |
| Overhead | None | JSON-RPC serialization |
Use MCP when: tools are shared across multiple agents, need independent deployment, or come from third parties. Use direct tools when: performance matters, tools are agent-specific, or simplicity is preferred.
5. Skill Bundles Architecture for JournalistBoost
Applying these patterns to fajb-next, a practical “skill bundles” architecture:
Taxonomy
Simple Skill = Tool call (API wrapper)
    Examples: search_archive, get_person_profile, fetch_rss
Complex Skill = Sub-graph with own state and logic
    Examples: article_editor (multi-step editing pipeline),
              financial_analyzer (data fetch → compute → visualize),
              source_verifier (cross-reference → fact-check → confidence score)
Bundle = Named collection of skills, composed at runtime
    Examples: "Article Research" = [search_archive, web_search, source_verifier]
              "Financial Profile" = [get_person_profile, financial_analyzer, format_article]
              "Quick Edit" = [article_editor]
Implementation Pattern
# Skill registry
SIMPLE_SKILLS = {
    "search_archive": search_archive_tool,
    "get_person": get_person_tool,
    "web_search": web_search_tool,
    "fetch_rss": fetch_rss_tool,
}
COMPLEX_SKILLS = {
    "article_editor": article_editor_subgraph,
    "financial_analyzer": financial_analyzer_subgraph,
    "source_verifier": source_verifier_subgraph,
}
BUNDLES = {
    "article_research": ["search_archive", "web_search", "source_verifier"],
    "financial_profile": ["get_person", "financial_analyzer"],
    "quick_edit": ["article_editor"],
    "full_workflow": ["search_archive", "web_search", "source_verifier",
                      "article_editor", "financial_analyzer"],
}
def make_skill_tool(skill_name: str, subgraph):
    """Wrap a complex skill as a tool; the factory keeps the sub-graph out of the tool's arguments."""
    @tool(skill_name, description=f"Run the {skill_name} workflow")
    def run_skill(query: str) -> str:
        result = subgraph.invoke({"messages": [{"role": "user", "content": query}]})
        return result["messages"][-1].content
    return run_skill
def build_journalist_agent(bundle_name: str, journalist_id: str):
    # Note: journalist_id is currently unused in this sketch
    bundle = BUNDLES[bundle_name]
    # Collect simple skills as tools
    tools = [SIMPLE_SKILLS[s] for s in bundle if s in SIMPLE_SKILLS]
    # Wrap complex skills as tools (subagent pattern)
    for skill_name in bundle:
        if skill_name in COMPLEX_SKILLS:
            tools.append(make_skill_tool(skill_name, COMPLEX_SKILLS[skill_name]))
    return create_agent(
        model="anthropic:claude-sonnet-4-6",
        tools=tools,
        system_prompt=f"You are a journalist assistant with these capabilities: {', '.join(bundle)}",
        checkpointer=PostgresSaver(...),
    )
MCP-Based Alternative
Package each skill as an MCP server for maximum decoupling:
# Each skill team maintains their own MCP server
ALL_SERVERS = {
    "archive": {"transport": "stdio", "command": "python", "args": ["skills/archive/server.py"]},
    "editor": {"transport": "http", "url": "http://editor-skill:8000/mcp"},
    "financial": {"transport": "http", "url": "http://financial-skill:8000/mcp"},
}
# Bundle = which MCP servers to connect
async def build_agent_from_bundle(bundle_config: dict):
    # Connect only to the servers named in the bundle
    filtered_client = MultiServerMCPClient({
        name: config for name, config in ALL_SERVERS.items()
        if name in bundle_config["skills"]
    })
    tools = await filtered_client.get_tools()
    return create_agent("anthropic:claude-sonnet-4-6", tools)
6. Decision Framework: When to Use What
Start Here: The Complexity Ladder
Level 0: Single LLM call (no tools)
└── Sufficient for: summarization, classification, formatting
Level 1: ReAct agent with tools
└── Sufficient for: search + answer, CRUD operations, simple workflows
└── JB today: article editing, archive search, profile lookups
Level 2: ReAct + middleware (handoffs/skills)
└── Need when: multi-stage workflows, >12 tools, domain switching
└── JB next: article pipeline (triage → research → edit → publish)
Level 3: Subagents (agent-as-tool)
└── Need when: context bloat, parallel domains, team-developed features
└── JB future: supervisor → research-agent + editor-agent + fact-checker
Level 4: Full orchestration (custom workflow + MCP)
└── Need when: complex multi-domain, many integrations, enterprise scale
└── JB vision: investor chat connecting DBs, APIs, web search, CMS
Quick Decision Matrix
| Question | Yes → | No → |
|---|---|---|
| <12 tools total? | ReAct is fine | Consider skills or subagents |
| All tools in same domain? | ReAct is fine | Subagents per domain |
| Sequential workflow? | Handoffs | Subagents (parallel) |
| Tools need independent deployment? | MCP servers | Direct tools |
| Context window filling up? | Subagents (isolation) | ReAct is fine |
| Multiple teams building features? | Subagents or Skills | Single agent |
| Need dynamic tool composition? | MCP or runtime graph build | Static graph |
| User-facing conversation? | Handoffs | Subagents (supervisor) |
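A few rows of the matrix can be encoded as a small helper, which is handy for design reviews or as a checklist in code. This is one possible reading of the table, not a definitive rule set: the `Requirements` fields and the `recommend` function are hypothetical, and only five of the eight questions are covered.

```python
from dataclasses import dataclass

@dataclass
class Requirements:
    tool_count: int
    single_domain: bool
    sequential_workflow: bool
    context_filling_up: bool
    independent_tool_deployment: bool

def recommend(req: Requirements) -> list[str]:
    """Map a subset of the decision matrix to pattern suggestions."""
    picks: list[str] = []
    # Context bloat or multiple domains both point at isolated subagents.
    if req.context_filling_up or not req.single_domain:
        picks.append("subagents")
    elif req.tool_count >= 12:
        picks.append("skills or subagents")
    if req.sequential_workflow:
        picks.append("handoffs")
    if not picks:
        picks.append("react")
    # Tool packaging is an independent axis from the agent pattern.
    picks.append("mcp servers" if req.independent_tool_deployment else "direct tools")
    return picks

simple = recommend(Requirements(5, True, False, False, False))
complex_case = recommend(Requirements(20, False, True, True, True))
```

Here `simple` comes out as a plain ReAct agent with direct tools, while `complex_case` combines subagents, handoffs, and MCP servers, matching the upper rungs of the complexity ladder.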
7. Investor Chat: Why Deep Agents From Day One
An “investor chat” connecting multiple DBs, web search, and third-party APIs is inherently a multi-domain problem. Starting with a flat ReAct agent and bolting on complexity later creates technical debt:
Why subagents from the start:
- Context isolation: Financial DB queries return large result sets. Web search returns pages of text. A flat ReAct agent accumulates all of this in one context window. With subagents, each specialist processes its domain and returns a summary. Token savings can be substantial (the ~67% figure from section 2.2), depending on the workload.
- Domain-specific prompting: A financial data agent needs different instructions than a news search agent. Subagents let each have optimal system prompts without a bloated unified prompt.
- Parallel execution: “Compare company X financials with recent news coverage” naturally decomposes into parallel sub-tasks. Subagents can run concurrently.
- Independent scaling: Financial data tools might need rate limiting. Web search might need caching. Subagents (or MCP servers) can scale independently.
- Incremental development: Start with 2-3 subagents (financial, news, general). Add more without touching existing ones. Each subagent is a self-contained unit.
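The parallel-execution point can be illustrated with plain asyncio. This is a minimal sketch under stated assumptions: `financial_agent`, `news_agent`, and `supervisor` are hypothetical coroutines, and the short sleeps stand in for real database queries and web searches.

```python
import asyncio

async def financial_agent(company: str) -> str:
    await asyncio.sleep(0.01)   # stands in for DB queries and calculations
    return f"{company}: financial summary"

async def news_agent(company: str) -> str:
    await asyncio.sleep(0.01)   # stands in for web search and archive lookups
    return f"{company}: news summary"

async def supervisor(company: str) -> str:
    # Both specialists run concurrently; gather preserves argument order.
    financials, news = await asyncio.gather(
        financial_agent(company),
        news_agent(company),
    )
    return f"{financials} | {news}"

report = asyncio.run(supervisor("Company X"))
```

Each specialist returns only its summary, so the supervisor's combined answer is built from two short strings rather than from the raw query and search results.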
Recommended architecture for investor chat:
Supervisor Agent (orchestrator)
├── financial-agent (DB queries, calculations, charts)
├── news-agent (web search, RSS, archive search)
├── company-agent (profile lookups, regulatory filings)
└── general-agent (conversation, clarifications, formatting)
Each can be implemented as a subgraph first, then extracted to MCP servers if independent deployment is needed.
Related
- LangGraph Agent Pattern — existing pattern across kulify projects
- LangGraph — the framework itself
- RAG Pipeline — retrieval patterns (used within research subagents)
- Model Context Protocol — tool provider standard
- Streaming Chat Architecture — real-time delivery for agent responses
- fajb-next — production platform where these patterns apply
- ReAct Pattern — detailed breakdown of the core loop
- LangGraph Multi-Agent Patterns — implementation details for each pattern
- LangGraph Skills Pattern — pluggable skills deep dive
- reactive-agents, agent-swarms — a different paradigm: controller-free, event-reactive participants (mozaik, TS) instead of a graph/supervisor