CrewAI + Unbrowse: Build Agents That Skip the Browser
Tutorial for integrating Unbrowse with CrewAI to give your agent crews direct API access to websites. Skip browser automation entirely with cached API routes.
CrewAI is built for multi-agent collaboration. But when your crew needs web data, every agent launches a browser, renders pages, and burns tokens on DOM parsing. What if your agents could call the APIs behind those pages directly?
Unbrowse discovers the internal APIs (shadow APIs) behind any website and serves them as structured data. Integrate it with CrewAI and your agents get web access that is 30x faster on repeat visits, uses 90% fewer tokens, and benefits from routes discovered by any agent in the shared marketplace.
This tutorial shows how to build CrewAI agents that skip the browser entirely.
Why Browser Tools Are Expensive for Crews
CrewAI crews often have multiple agents that each need web access. A typical research crew might have:
- A Researcher agent that gathers data from 5-10 websites
- An Analyst agent that pulls comparison data from 3-4 sites
- A Writer agent that fact-checks against original sources
With browser tools, each agent independently launches browsers, renders pages, and extracts data. A crew with three agents accessing five sites each means 15 browser sessions, each consuming 20K-100K tokens. That is 300K-1.5M tokens per crew run for web access alone.
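That arithmetic is worth a quick sanity check (the 20K-100K per-session range is the estimate above):

```python
# Rough cost model for browser-based web access in a crew.
agents = 3
sites_per_agent = 5
tokens_per_session_low, tokens_per_session_high = 20_000, 100_000

sessions = agents * sites_per_agent            # independent browser sessions
low = sessions * tokens_per_session_low        # best case per crew run
high = sessions * tokens_per_session_high      # worst case per crew run

print(sessions)    # 15
print(low, high)   # 300000 1500000
```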
With Unbrowse, the first agent that accesses a site indexes it. Every subsequent agent -- in the same crew or any future crew -- resolves from cache in under 200 milliseconds.
Setup
Install Unbrowse
```bash
git clone --single-branch --depth 1 https://github.com/unbrowse-ai/unbrowse.git ~/unbrowse
cd ~/unbrowse && ./setup --host off
unbrowse health
```
Install CrewAI
```bash
pip install crewai unbrowse
```
Building the Unbrowse Tool for CrewAI
CrewAI uses custom tools that extend BaseTool. Here is the Unbrowse tool implementation:
```python
import json
from typing import Optional

import requests
from crewai.tools import BaseTool
from pydantic import BaseModel, Field

UNBROWSE_URL = "http://localhost:6969"


class UnbrowseResolveInput(BaseModel):
    intent: str = Field(description="What data you want from the website")
    url: str = Field(description="The website URL to access")
    params: Optional[dict] = Field(
        default=None,
        description="Additional parameters like search queries"
    )


class UnbrowseResolveTool(BaseTool):
    name: str = "unbrowse_resolve"
    description: str = (
        "Access any website and get structured API data back. "
        "Much faster than browsing -- cached sites return in <200ms. "
        "Provide an intent (what you want) and URL (which site)."
    )
    args_schema: type[BaseModel] = UnbrowseResolveInput

    def _run(self, intent: str, url: str, params: Optional[dict] = None) -> str:
        payload = {"intent": intent, "params": {"url": url}}
        if params:
            payload["params"].update(params)
        response = requests.post(
            f"{UNBROWSE_URL}/v1/intent/resolve",
            json=payload,
            timeout=90
        )
        result = response.json()
        if "result" in result and result["result"]:
            return json.dumps(result["result"], indent=2)[:5000]
        if "available_endpoints" in result:
            endpoints = result["available_endpoints"][:5]
            summary = []
            for ep in endpoints:
                summary.append({
                    "endpoint_id": ep.get("endpoint_id"),
                    "skill_id": ep.get("skill_id"),
                    "description": ep.get("description"),
                    "action_kind": ep.get("action_kind"),
                    "sample_values": (
                        ep.get("sample_values", {})[:3]
                        if isinstance(ep.get("sample_values"), list)
                        else ep.get("sample_values", {})
                    ),
                })
            return json.dumps(summary, indent=2)
        return json.dumps(result, indent=2)[:5000]


class UnbrowseExecuteInput(BaseModel):
    skill_id: str = Field(description="The skill ID from resolve results")
    endpoint_id: str = Field(description="The endpoint ID to execute")
    limit: Optional[int] = Field(default=10, description="Max results to return")
    path: Optional[str] = Field(
        default=None,
        description="Dot-path to drill into nested results"
    )
    extract: Optional[str] = Field(
        default=None,
        description="Comma-separated fields to extract"
    )


class UnbrowseExecuteTool(BaseTool):
    name: str = "unbrowse_execute"
    description: str = (
        "Execute a specific API endpoint discovered by unbrowse_resolve. "
        "Use the skill_id and endpoint_id from resolve results."
    )
    args_schema: type[BaseModel] = UnbrowseExecuteInput

    def _run(
        self, skill_id: str, endpoint_id: str,
        limit: int = 10, path: Optional[str] = None,
        extract: Optional[str] = None
    ) -> str:
        payload = {"endpoint_id": endpoint_id, "limit": limit}
        if path:
            payload["path"] = path
        if extract:
            payload["extract"] = extract
        response = requests.post(
            f"{UNBROWSE_URL}/v1/skills/{skill_id}/execute",
            json=payload,
            timeout=30
        )
        return json.dumps(response.json(), indent=2)[:5000]


class UnbrowseSearchInput(BaseModel):
    intent: str = Field(description="What you are looking for")
    domain: Optional[str] = Field(
        default=None,
        description="Specific domain to search"
    )


class UnbrowseSearchTool(BaseTool):
    name: str = "unbrowse_search"
    description: str = (
        "Search the Unbrowse marketplace for API routes that other "
        "agents have already discovered. Use before resolve when you "
        "do not have a specific URL."
    )
    args_schema: type[BaseModel] = UnbrowseSearchInput

    def _run(self, intent: str, domain: Optional[str] = None) -> str:
        payload = {"intent": intent}
        if domain:
            payload["domain"] = domain
        response = requests.post(
            f"{UNBROWSE_URL}/v1/search",
            json=payload,
            timeout=15
        )
        return json.dumps(response.json(), indent=2)[:5000]
```
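To see the exact payload shape the resolve tool sends, its payload construction can be isolated into a small pure function. This mirrors the `_run` body above; `build_resolve_payload` is a hypothetical helper for illustration, not part of the tool:

```python
from typing import Optional

def build_resolve_payload(intent: str, url: str,
                          params: Optional[dict] = None) -> dict:
    """Mirror of UnbrowseResolveTool._run's payload construction:
    the URL always sits under "params", and extra params merge in."""
    payload = {"intent": intent, "params": {"url": url}}
    if params:
        payload["params"].update(params)
    return payload

payload = build_resolve_payload(
    "get trending repos", "https://github.com/trending",
    {"language": "python"},
)
print(payload)
# {'intent': 'get trending repos',
#  'params': {'url': 'https://github.com/trending', 'language': 'python'}}
```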
Example: Research Crew
Here is a complete CrewAI crew that researches a topic across multiple sources:
```python
from crewai import Agent, Task, Crew, Process

# Create tools
resolve_tool = UnbrowseResolveTool()
execute_tool = UnbrowseExecuteTool()
search_tool = UnbrowseSearchTool()

# Define agents
researcher = Agent(
    role="Senior Web Researcher",
    goal="Find comprehensive data on a topic from multiple web sources",
    backstory="You are an expert at finding and extracting structured data "
              "from websites. You use unbrowse_resolve to access sites directly "
              "through their APIs rather than browsing.",
    tools=[resolve_tool, execute_tool, search_tool],
    verbose=True
)

analyst = Agent(
    role="Data Analyst",
    goal="Analyze and compare data from multiple sources",
    backstory="You synthesize data from multiple sources into clear, "
              "actionable insights. When you need additional web data, "
              "use unbrowse_resolve for fast API access.",
    tools=[resolve_tool, execute_tool],
    verbose=True
)

writer = Agent(
    role="Report Writer",
    goal="Produce a well-structured research report",
    backstory="You write clear, data-driven reports. You can use "
              "unbrowse_resolve to fact-check claims against original sources.",
    tools=[resolve_tool],
    verbose=True
)

# Define tasks
research_task = Task(
    description="""Research the current state of AI browser automation tools.
    Gather data from:
    1. GitHub trending repos related to browser automation
    2. Hacker News discussions about AI agents
    3. Reddit r/MachineLearning for recent posts about browser agents
    For each source, use unbrowse_resolve with the site URL and a clear intent.
    Return structured data from each source.""",
    expected_output="Structured data from all three sources with key findings",
    agent=researcher
)

analysis_task = Task(
    description="""Analyze the research data and identify:
    1. The most popular browser automation tools by GitHub stars
    2. Common pain points discussed on HN and Reddit
    3. Trends in the space (what is growing, what is declining)
    If you need additional data points, use unbrowse_resolve.""",
    expected_output="Analysis with trends, rankings, and key insights",
    agent=analyst
)

report_task = Task(
    description="""Write a concise research report (500 words) on the state
    of AI browser automation in 2026. Include data points from the research
    and analysis. Fact-check any specific claims using unbrowse_resolve.""",
    expected_output="A 500-word research report with data-backed claims",
    agent=writer
)

# Create and run the crew
crew = Crew(
    agents=[researcher, analyst, writer],
    tasks=[research_task, analysis_task, report_task],
    process=Process.sequential,
    verbose=True
)

result = crew.kickoff()
print(result)
```
Example: Competitive Intelligence Crew
```python
scout = Agent(
    role="Competitive Intelligence Scout",
    goal="Monitor competitor websites for product changes",
    backstory="You track competitor product pages, pricing, and features. "
              "Use unbrowse_resolve for fast, structured access to each site.",
    tools=[resolve_tool, execute_tool],
    verbose=True
)

strategist = Agent(
    role="Strategy Analyst",
    goal="Identify competitive threats and opportunities",
    backstory="You analyze competitive data to find strategic insights.",
    tools=[resolve_tool],
    verbose=True
)

scout_task = Task(
    description="""Check the following competitor sites for current pricing
    and feature lists:
    1. competitor-a.com/pricing
    2. competitor-b.com/features
    3. competitor-c.com/products
    Use unbrowse_resolve for each. Return structured pricing and feature data.""",
    expected_output="Structured competitor data with pricing and features",
    agent=scout
)

strategy_task = Task(
    description="""Compare competitor data against our current offering.
    Identify: pricing gaps, feature gaps, and positioning opportunities.""",
    expected_output="Strategic recommendations with specific data points",
    agent=strategist
)

competitive_crew = Crew(
    agents=[scout, strategist],
    tasks=[scout_task, strategy_task],
    process=Process.sequential
)
```
Example: Daily Digest Crew
This crew runs daily and benefits most from caching:
```python
from datetime import date

collector = Agent(
    role="Content Collector",
    goal="Gather today's top content from key sources",
    backstory="You collect content from multiple sources daily. "
              "All sites are pre-indexed, so unbrowse_resolve returns "
              "cached data in milliseconds.",
    tools=[resolve_tool, execute_tool],
    verbose=True
)

editor = Agent(
    role="Digest Editor",
    goal="Curate and format the daily digest",
    backstory="You select the most important items and write concise summaries.",
    tools=[],
    verbose=True
)

collect_task = Task(
    description=f"""Collect today's ({date.today()}) top content:
    1. Top 10 Hacker News stories (resolve news.ycombinator.com)
    2. Trending GitHub repos (resolve github.com/trending)
    3. Top posts from r/technology (resolve reddit.com/r/technology)
    4. Latest arXiv ML papers (resolve arxiv.org)
    All these sites should be cached -- expect sub-second resolution.""",
    expected_output="Structured content from all four sources",
    agent=collector
)

edit_task = Task(
    description="""Create a daily digest email with:
    - 5 top stories (cross-referenced across sources)
    - 3 trending repos worth watching
    - 1 notable ML paper
    Format as a brief, scannable email.""",
    expected_output="Formatted daily digest ready to send",
    agent=editor
)

digest_crew = Crew(
    agents=[collector, editor],
    tasks=[collect_task, edit_task],
    process=Process.sequential
)

# Run daily -- after first run, all resolves hit cache
result = digest_crew.kickoff()
```
On the first run, each site takes 20-80 seconds to index. On day 2 and beyond, the collector agent resolves all four sites in under a second total. The entire crew completes in seconds instead of minutes.
Performance: Crew Runs Over Time
| Run | Browser Tools | Unbrowse |
|---|---|---|
| First run (4 sites) | ~45 seconds | ~120 seconds (indexing) |
| Second run | ~45 seconds | ~3 seconds |
| 10th run | ~45 seconds | ~3 seconds |
| 30th run | ~45 seconds | ~3 seconds |
| Cumulative (30 runs) | ~22 minutes | ~3.5 minutes |

Browser tools have constant cost. Unbrowse has a one-time indexing cost followed by near-zero marginal cost.

For a daily crew that runs for a month, that is roughly 22 minutes of browser time versus about 3.5 minutes total (120 seconds of indexing plus 29 cached runs at ~3 seconds each). The token savings follow the same pattern -- constant high cost versus one-time cost plus near-zero repeat cost.
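The cumulative figures follow directly from the per-run estimates in the table:

```python
# Per-run wall-clock estimates from the table above.
browser_per_run = 45     # seconds, constant every run
unbrowse_first = 120     # seconds, one-time indexing
unbrowse_repeat = 3      # seconds, cache hits
runs = 30

browser_total = browser_per_run * runs                       # 1350 s
unbrowse_total = unbrowse_first + unbrowse_repeat * (runs - 1)  # 207 s

print(round(browser_total / 60, 1))   # 22.5 minutes
print(round(unbrowse_total / 60, 2))  # 3.45 minutes
```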
Multi-Agent Cache Sharing
This is where CrewAI benefits most: when one agent in the crew indexes a site, every other agent -- in the same crew or any future crew -- gets the cached routes.
```python
# Agent A resolves github.com/trending (indexes, 30 seconds)
# Agent B resolves github.com/trending (cache hit, 180ms)
# Agent C resolves github.com/trending (cache hit, 180ms)
# Tomorrow's crew resolves github.com/trending (cache hit, 180ms)
```
The shared marketplace extends this further: routes indexed by agents on other machines are discoverable too. Your crew benefits from the collective knowledge of every Unbrowse agent.
Best Practices for CrewAI + Unbrowse
Pre-warm caches before crew runs. If you know which sites your crew will access, resolve them beforehand:
```python
# Payloads use the same shape as UnbrowseResolveTool: url under "params".
warmup_sites = [
    {"intent": "get trending repos", "params": {"url": "https://github.com/trending"}},
    {"intent": "get top stories", "params": {"url": "https://news.ycombinator.com"}},
]
for site in warmup_sites:
    requests.post(f"{UNBROWSE_URL}/v1/intent/resolve", json=site, timeout=90)
```
Give agents clear tool instructions. In the agent backstory, explicitly mention using unbrowse_resolve instead of browsing. CrewAI agents may default to browser tools if not guided.
Use execute for large datasets. When resolve returns available endpoints with many results, call unbrowse_execute with the limit and extract parameters to get only what you need.
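For example, trimming a large result set down to two fields per item might look like this (the endpoint ID and field names here are placeholders; real values come from a resolve call):

```python
# Hypothetical execute payload -- IDs and fields are illustrative only.
payload = {
    "endpoint_id": "ep_trending_repos",  # placeholder from resolve output
    "limit": 5,                          # cap the number of items returned
    "path": "items",                     # drill into the nested list
    "extract": "name,stars",             # keep only these fields per item
}
# The execute tool above POSTs this to /v1/skills/{skill_id}/execute.
print(payload)
```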
Handle auth upfront. If your crew accesses authenticated sites, run unbrowse login --url <site> before the crew starts.
Conclusion
CrewAI crews multiply the cost of browser-based web access because multiple agents each independently browse. Unbrowse collapses that cost: one agent indexes, every agent benefits. For recurring crews, the savings compound daily.
The integration takes 15 minutes. The tools work with any CrewAI agent configuration. And the shared marketplace means your crew starts with routes already discovered by other agents worldwide.
Get started: github.com/unbrowse-ai/unbrowse
Read the paper: Internal APIs Are All You Need