When a single agent isn't enough (for parallelization, specialization, or quality checking), you need a multi-agent system. This guide covers three leading frameworks, CrewAI, AutoGen, and LangGraph, with a practical example for each.
Why Multi-Agent?
Single agents struggle with tasks that are too complex for one context window, require parallel processing, or benefit from specialization and peer review. Multi-agent systems address these through:
- Parallelization: Run research, coding, and analysis simultaneously
- Specialization: Each agent has a specific role and focused system prompt
- Error checking: Critic agents review other agents' work
- Scale: Decompose massive tasks across many agents
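The parallelization and error-checking ideas above can be sketched without any framework. This is a minimal illustration using only the standard library: `run_agent`, `critic`, and `run_parallel` are hypothetical names, and the "agent" is a stub that simulates model latency rather than calling an LLM.

```python
import asyncio


async def run_agent(role: str, task: str) -> str:
    """Stand-in for an LLM-backed agent (assumption: a real agent would call a model here)."""
    await asyncio.sleep(0.01)  # simulate I/O-bound model latency
    return f"[{role}] result for: {task}"


def critic(outputs: list[str]) -> list[str]:
    """Toy critic: return the outputs that look too short to be useful."""
    return [o for o in outputs if len(o) < 10]


async def run_parallel(task: str, roles: list[str]) -> list[str]:
    # gather() runs the specialist agents concurrently and keeps input order
    return list(await asyncio.gather(*(run_agent(r, task) for r in roles)))


outputs = asyncio.run(
    run_parallel("survey vector databases", ["research", "coding", "analysis"])
)
problems = critic(outputs)
```

Because the three stubs wait concurrently, total latency is roughly that of the slowest agent rather than the sum of all three, which is the practical payoff of parallelization.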
Framework 1: CrewAI
CrewAI models AI collaboration as a "crew" of agents with defined roles, backstories, and goals, similar to a business team.
pip install crewai crewai-tools
from crewai import Agent, Task, Crew, Process
from crewai_tools import SerperDevTool
# SerperDevTool reads its API key from the SERPER_API_KEY environment variable
search_tool = SerperDevTool()
# Define agents with roles and backstories
researcher = Agent(
    role="Senior Research Analyst",
    goal="Uncover cutting-edge developments in {topic}",
    backstory="""You are an expert research analyst with a talent for finding
    authoritative, up-to-date information. You synthesize complex information
    into clear, actionable insights.""",
    tools=[search_tool],
    verbose=True,
    max_iter=3
)
writer = Agent(
    role="Tech Content Strategist",
    goal="Craft compelling content about {topic}",
    backstory="""You are a skilled writer who translates technical research into
    engaging narratives. You excel at making complex topics accessible.""",
    verbose=True
)
editor = Agent(
    role="Senior Editor",
    goal="Ensure content is accurate, clear, and publication-ready",
    backstory="""A meticulous editor who catches errors, improves clarity,
    and ensures consistent tone and style.""",
    verbose=True
)
# Define tasks with expected outputs
research_task = Task(
    description="Research the latest developments in {topic}. Identify the top 5 trends.",
    expected_output="A structured report with 5 trends, each with evidence and sources.",
    agent=researcher
)
write_task = Task(
    description="Write a comprehensive blog post based on the research report.",
    expected_output="A 1000-word blog post with clear sections and engaging writing.",
    agent=writer,
    context=[research_task]  # Depends on research
)
edit_task = Task(
    description="Review and refine the blog post. Fix errors, improve clarity.",
    expected_output="A polished, publication-ready blog post.",
    agent=editor,
    context=[write_task]
)
# Assemble and run the crew
crew = Crew(
    agents=[researcher, writer, editor],
    tasks=[research_task, write_task, edit_task],
    process=Process.sequential,  # or Process.hierarchical
    verbose=True
)
result = crew.kickoff(inputs={"topic": "AI agents in 2026"})
print(result)
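To make the data flow in the crew above more concrete, here is a framework-free sketch of what a sequential process with `context` dependencies and `{topic}` placeholders amounts to. It is not CrewAI's implementation: `SketchTask`, `run_sequential`, and `echo_worker` are hypothetical names, and the worker simply echoes its prompt instead of calling a model.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SketchTask:
    """Hypothetical stand-in for a task: a prompt template plus upstream tasks."""
    description: str
    context: list["SketchTask"] = field(default_factory=list)
    output: str = ""


def run_sequential(
    tasks: list[SketchTask],
    inputs: dict[str, str],
    worker: Callable[[str, list[str]], str],
) -> str:
    """Run tasks in order, handing each one the outputs of its context tasks."""
    for task in tasks:
        prompt = task.description.format(**inputs)  # fill {topic}-style placeholders
        upstream = [t.output for t in task.context]  # results this task depends on
        task.output = worker(prompt, upstream)
    return tasks[-1].output


def echo_worker(prompt: str, upstream: list[str]) -> str:
    """Assumption: a real worker would send the prompt and upstream text to an LLM."""
    return f"{prompt} | inputs: {len(upstream)}"


research = SketchTask("Research {topic}")
write = SketchTask("Write about {topic}", context=[research])
final = run_sequential([research, write], {"topic": "AI agents"}, echo_worker)
```

The point of the sketch is the ordering constraint: a task listed in another task's `context` must run first, which is why the crew lists its tasks in research, write, edit order.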
Framework 2: AutoGen
AutoGen (Microsoft) focuses on conversational multi-agent patterns where agents talk to each other to solve problems collaboratively.
pip install pyautogen
import autogen
config_list = [{"model": "claude-opus-4-6", "api_key": "your_key",
                "api_type": "anthropic"}]
llm_config = {"config_list": config_list, "timeout": 60, "temperature": 0}
# User proxy: represents the human, can execute code
user_proxy = autogen.UserProxyAgent(
    name="User_Proxy",
    human_input_mode="NEVER",  # Fully autonomous; use "ALWAYS" for human loop
    max_consecutive_auto_reply=10,
    code_execution_config={"work_dir": "coding", "use_docker": False}
)
# Assistant agent: the AI
assistant = autogen.AssistantAgent(
    name="Assistant",
    llm_config=llm_config,
    system_message="You are a helpful AI assistant. Write Python code to solve problems."
)
# Critic agent: reviews the assistant's work
critic = autogen.AssistantAgent(
    name="Critic",
    llm_config=llm_config,
    system_message="""You are a code reviewer. Review the code for:
    1. Correctness and edge cases
    2. Security vulnerabilities
    3. Performance issues
    Provide specific, actionable feedback."""
)
# Group chat for multi-agent conversation
groupchat = autogen.GroupChat(
    agents=[user_proxy, assistant, critic],
    messages=[],
    max_round=15,
    speaker_selection_method="round_robin"
)
manager = autogen.GroupChatManager(
    groupchat=groupchat,
    llm_config=llm_config
)
# Start the conversation
user_proxy.initiate_chat(
    manager,
    message="Write a Python function that validates email addresses using regex, "
            "then write comprehensive unit tests for it."
)
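The `round_robin` speaker selection and `max_round` cap used above can be pictured with a small simulation. This is a simplified model, not AutoGen's code: `round_robin_chat` and the two reply functions are hypothetical, the agents are plain functions, and AutoGen's own round counting and termination rules may differ.

```python
from itertools import cycle
from typing import Callable, Optional

Transcript = list[tuple[str, str]]
ReplyFn = Callable[[Transcript], Optional[str]]


def round_robin_chat(
    agents: list[tuple[str, ReplyFn]], opening_message: str, max_round: int
) -> Transcript:
    """Let agents speak in fixed order until one declines or the cap is hit."""
    transcript: Transcript = [("user", opening_message)]
    speakers = cycle(agents)  # fixed, repeating speaker order
    for _ in range(max_round):
        name, reply_fn = next(speakers)
        reply = reply_fn(transcript)
        if reply is None:  # assumption: None means "nothing more to say"
            break
        transcript.append((name, reply))
    return transcript


def assistant_reply(transcript: Transcript) -> Optional[str]:
    # Number each draft by how many times the assistant has already spoken
    drafts = sum(1 for name, _ in transcript if name == "Assistant")
    return f"draft {drafts + 1}"


def critic_reply(transcript: Transcript) -> Optional[str]:
    last = transcript[-1][1]
    # Toy rule: accept the second draft, request a revision otherwise
    return None if last == "draft 2" else f"revise {last}"


chat = round_robin_chat(
    [("Assistant", assistant_reply), ("Critic", critic_reply)],
    "Write an email validator",
    max_round=10,
)
```

A hard cap such as `max_round` matters because nothing else guarantees that a critic and an author ever agree.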
Framework 3: LangGraph
LangGraph models workflows as a stateful graph: nodes are agents or functions, edges define flow, and state is shared across nodes. It is best for complex workflows with branching logic and human-in-the-loop steps.
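Before turning to LangGraph's own API, the node, edge, and shared-state model can be illustrated with a few lines of plain Python. This is only a sketch of the idea, not how LangGraph is implemented: `run_graph`, the `"END"` marker, and the `max_steps` guard are all hypothetical.

```python
import operator


def run_graph(nodes, edges, reducers, state, entry, end="END", max_steps=10):
    """Walk the graph from `entry`, merging each node's partial update into state."""
    current = entry
    for _ in range(max_steps):
        if current == end:
            return state
        update = nodes[current](state)  # a node returns only the keys it changed
        for key, value in update.items():
            if key in reducers and key in state:
                # Reducer keys are combined with the old value (e.g. list append)
                state = {**state, key: reducers[key](state[key], value)}
            else:
                state = {**state, key: value}  # other keys are overwritten
        route = edges[current]
        # An edge is either a fixed target or a router function of the state
        current = route(state) if callable(route) else route
    raise RuntimeError("step limit reached before END")


nodes = {
    "draft": lambda s: {"log": ["drafted"], "score": s["score"] + 4},
    "review": lambda s: {"log": ["reviewed"]},
}
edges = {
    "draft": "review",
    "review": lambda s: "END" if s["score"] >= 8 else "draft",
}
final_state = run_graph(
    nodes, edges, {"log": operator.add}, {"log": [], "score": 0}, entry="draft"
)
```

The reducer on `log` plays the same role as an `operator.add` annotation on a message list: updates accumulate instead of replacing earlier entries.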
pip install langgraph langchain-anthropic
from typing import TypedDict, Annotated, Sequence
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage
import operator
# Define the shared state
class AgentState(TypedDict):
    messages: Annotated[Sequence[BaseMessage], operator.add]
    current_agent: str
    draft: str
    feedback: str
    approved: bool
llm = ChatAnthropic(model="claude-opus-4-6")
# Define agent node functions
def researcher_agent(state: AgentState) -> dict:
    """Research and create the initial (or revised) outline"""
    # The first message holds the user's topic. On a revision pass the last
    # message is the reviewer's feedback, so the topic must not be read from [-1].
    topic = state["messages"][0].content
    feedback = state.get("feedback", "")
    revision_note = f"\nAddress this reviewer feedback:\n{feedback}" if feedback else ""
    response = llm.invoke([
        HumanMessage(content=f"""You are a researcher. Create a detailed outline for:
        {topic}
        Return a structured outline.{revision_note}""")
    ])
    return {
        "messages": [AIMessage(content=response.content, name="researcher")],
        "draft": response.content,
        "current_agent": "reviewer"
    }
def reviewer_agent(state: AgentState) -> dict:
    """Review the draft and provide feedback"""
    response = llm.invoke([
        HumanMessage(content=f"""Review this outline and rate it 1-10.
        If score >= 8, say 'APPROVED'. Otherwise provide specific improvements.
        Outline:
        {state['draft']}""")
    ])
    # Loose substring check: a reply such as "not approved" would also match
    approved = "APPROVED" in response.content.upper()
    return {
        "messages": [AIMessage(content=response.content, name="reviewer")],
        "feedback": response.content,
        "approved": approved,
        "current_agent": "writer" if approved else "researcher"
    }
def writer_agent(state: AgentState) -> dict:
    """Write the final article"""
    response = llm.invoke([
        HumanMessage(content=f"""Write a complete article based on this approved outline:
        {state['draft']}
        Write 800-1000 words.""")
    ])
    return {
        "messages": [AIMessage(content=response.content, name="writer")],
        "draft": response.content,
        "current_agent": "done"
    }
# Build the graph
def should_continue(state: AgentState) -> str:
    """Router: decide which node to go to next"""
    if state.get("approved"):
        return "writer"
    return "researcher"  # Loop back for revision
workflow = StateGraph(AgentState)
# Add nodes
workflow.add_node("researcher", researcher_agent)
workflow.add_node("reviewer", reviewer_agent)
workflow.add_node("writer", writer_agent)
# Add edges
workflow.set_entry_point("researcher")
workflow.add_edge("researcher", "reviewer")
workflow.add_conditional_edges("reviewer", should_continue, {
    "researcher": "researcher",  # Revise
    "writer": "writer"           # Approved
})
workflow.add_edge("writer", END)
# Compile with memory (enables human-in-the-loop checkpointing)
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)
# Run
config = {"configurable": {"thread_id": "article-1"}}
result = app.invoke(
    {"messages": [HumanMessage(content="The future of AI agents in healthcare")]},
    config=config
)
print(result["draft"])
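The reviewer loop above has no upper bound: if the reviewer never approves, the graph keeps cycling until LangGraph's recursion limit stops the run. One possible guard, sketched here under assumptions, is to count revisions in the state and route elsewhere once a cap is reached. `MAX_REVISIONS`, the `revisions` key, the `"escalate"` target, and both function names are hypothetical additions, not part of the example above.

```python
MAX_REVISIONS = 3  # hypothetical cap; tune to the cost of each revision pass


def bump_revisions(state: dict) -> dict:
    """Partial update a reviewer node could return alongside its feedback."""
    return {"revisions": state.get("revisions", 0) + 1}


def should_continue_capped(state: dict) -> str:
    """Router variant that stops looping after MAX_REVISIONS rejected drafts."""
    if state.get("approved"):
        return "writer"
    if state.get("revisions", 0) >= MAX_REVISIONS:
        # Assumption: an "escalate" node exists, e.g. to hand off to a human
        return "escalate"
    return "researcher"


# Simulate a reviewer that rejects every draft
state = {"approved": False}
route = should_continue_capped(state)
while route == "researcher":
    state = {**state, **bump_revisions(state)}
    route = should_continue_capped(state)
```

To use a router like this in the graph, the conditional edge mapping would also need an entry for the extra target, and the state schema would need the counter field.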
Choosing the Right Framework
CrewAI
Best for: Role-based teams, straightforward sequential/hierarchical workflows, getting started quickly with multi-agent concepts.
AutoGen
Best for: Conversational agents, code generation + execution workflows, research tasks, Microsoft ecosystem.
LangGraph
Best for: Complex workflows with branching/cycles, human-in-the-loop, stateful applications, production systems.
Key Takeaways
- Multi-agent systems enable parallelization, specialization, and quality checking
- CrewAI: role-based crews with natural language task descriptions; best for quick prototyping
- AutoGen: conversational agents that talk to each other; best for code-heavy workflows
- LangGraph: stateful graph workflows with branching; best for production-grade systems
- Add human-in-the-loop checkpoints for any high-stakes agent workflow
- Start single-agent, add multi-agent complexity only when genuinely needed