How to Give Your LangChain Agent Direct API Access
Most LangChain agents access the web through browser tools -- Playwright, Selenium, or scraping libraries. Every web request means launching a browser, rendering a page, parsing HTML, and extracting data. This costs tokens, time, and compute.
There is a faster path. Unbrowse discovers the internal APIs behind websites and gives your LangChain agent direct access to structured data without touching a browser. This tutorial shows how to set it up.
The Problem with Browser-Based Web Tools
A typical LangChain agent using browser tools follows this flow for every web request:
- Launch a headless browser (~2-5 seconds)
- Navigate to the URL (~1-3 seconds)
- Wait for JavaScript to render (~1-5 seconds)
- Extract the full DOM or take a screenshot
- Send the DOM/screenshot to the LLM (~20K-100K tokens)
- LLM reasons about the page and decides next action
- Repeat for every click, scroll, or navigation
A 10-step web workflow can consume over 500,000 tokens and take 30-60 seconds. For agents that access the same sites repeatedly, this cost is entirely redundant.
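To see how those per-step costs compound, here is a back-of-envelope calculation. The per-step figures are illustrative mid-range assumptions taken from the list above, not measurements:

```python
# Back-of-envelope cost of a 10-step browser workflow.
# Per-step figures are mid-range assumptions, not benchmarks.
STEPS = 10
TOKENS_PER_STEP = 50_000   # a rendered DOM typically costs 20K-100K tokens
SECONDS_PER_STEP = 4.5     # launch + navigate + render, mid-range

total_tokens = STEPS * TOKENS_PER_STEP
total_seconds = STEPS * SECONDS_PER_STEP
print(f"~{total_tokens:,} tokens, ~{total_seconds:.0f} seconds")  # ~500,000 tokens, ~45 seconds
```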
The API-Native Alternative
Unbrowse replaces this loop with direct API calls. It captures the network traffic behind any website, reverse-engineers the internal endpoints, and calls them directly on subsequent visits.
- First visit: Unbrowse opens a browser, captures traffic, and indexes the APIs (20-80 seconds)
- Every subsequent visit: a direct API call returns structured data in under 200 milliseconds
The published benchmark (Internal APIs Are All You Need) shows 3.6x mean speedup over Playwright across 94 live domains.
Setup
Install Unbrowse
```bash
git clone --single-branch --depth 1 https://github.com/unbrowse-ai/unbrowse.git ~/unbrowse
cd ~/unbrowse && ./setup --host off
```
Verify it is running:
```bash
unbrowse health
```
Install the Python SDK
```bash
pip install unbrowse langchain langchain-openai
```
Creating the Unbrowse Tool for LangChain
Unbrowse exposes a local REST API on http://localhost:6969. We wrap it as a LangChain tool:
```python
import json

import requests
from langchain.tools import Tool

UNBROWSE_URL = "http://localhost:6969"


def unbrowse_resolve(query: str) -> str:
    """Resolve a web intent to structured data via Unbrowse.

    Input should be a JSON string with 'intent' and 'url' fields.
    Example: {"intent": "get trending repositories", "url": "https://github.com/trending"}
    """
    try:
        params = json.loads(query)
    except json.JSONDecodeError:
        # If plain text, treat as intent with no specific URL
        params = {"intent": query}
    response = requests.post(
        f"{UNBROWSE_URL}/v1/intent/resolve",
        json=params,
        timeout=90,
    )
    result = response.json()
    # If we got direct data, return it
    if "result" in result and result["result"]:
        return json.dumps(result["result"], indent=2)[:4000]
    # If we got available endpoints, return the summary
    if "available_endpoints" in result:
        endpoints = result["available_endpoints"][:5]
        summary = []
        for ep in endpoints:
            summary.append({
                "endpoint_id": ep.get("endpoint_id"),
                "skill_id": ep.get("skill_id"),
                "description": ep.get("description"),
                "action_kind": ep.get("action_kind"),
                "url": ep.get("url", "")[:100],
            })
        return json.dumps(summary, indent=2)
    return json.dumps(result, indent=2)[:4000]


def unbrowse_execute(query: str) -> str:
    """Execute a specific Unbrowse endpoint.

    Input should be a JSON string with 'skill_id' and 'endpoint_id' fields.
    Optionally include 'path', 'extract', and 'limit'.
    """
    params = json.loads(query)
    skill_id = params.pop("skill_id")
    response = requests.post(
        f"{UNBROWSE_URL}/v1/skills/{skill_id}/execute",
        json=params,
        timeout=30,
    )
    result = response.json()
    return json.dumps(result, indent=2)[:4000]


def unbrowse_search(query: str) -> str:
    """Search the Unbrowse marketplace for available API routes.

    Input should be a search intent string.
    """
    response = requests.post(
        f"{UNBROWSE_URL}/v1/search",
        json={"intent": query},
        timeout=15,
    )
    result = response.json()
    return json.dumps(result, indent=2)[:4000]


# Create LangChain tools
resolve_tool = Tool(
    name="unbrowse_resolve",
    func=unbrowse_resolve,
    description="""Resolve a web intent to structured data. Use this instead of
browsing websites. Input: JSON with 'intent' (what you want) and 'url'
(the website). Returns structured API data or a list of available endpoints.
Example: {"intent": "get trending repos", "url": "https://github.com/trending"}""",
)

execute_tool = Tool(
    name="unbrowse_execute",
    func=unbrowse_execute,
    description="""Execute a specific API endpoint discovered by unbrowse_resolve.
Input: JSON with 'skill_id', 'endpoint_id', and optionally 'path' (drill into
nested data), 'extract' (pick specific fields), 'limit' (cap results).
Example: {"skill_id": "abc", "endpoint_id": "def", "limit": 10}""",
)

search_tool = Tool(
    name="unbrowse_search",
    func=unbrowse_search,
    description="""Search the Unbrowse marketplace for API routes other agents
have discovered. Use when you need data from a website but do not have a
specific URL. Input: a search query string.""",
)
```
Building the Agent
Now create a LangChain agent that uses Unbrowse as its primary web access tool:
```python
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_openai_functions_agent
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder

# Initialize the LLM
llm = ChatOpenAI(model="gpt-4o", temperature=0)

# Define the tools
tools = [resolve_tool, execute_tool, search_tool]

# Create a prompt that guides the agent to use Unbrowse
prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a web research agent. Use unbrowse_resolve as your
primary tool for accessing websites. It returns structured API data
instead of raw HTML, making your responses faster and more accurate.

Workflow:
1. Use unbrowse_resolve with intent and URL to get data or discover endpoints
2. If you get available_endpoints, pick the best match and use unbrowse_execute
3. Use unbrowse_search to find routes when you don't have a specific URL

Never try to scrape or parse HTML. Unbrowse gives you structured data directly."""),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

# Create and run the agent
agent = create_openai_functions_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    max_iterations=5,
)
```
Example: Research Agent
Here is a complete example that builds a research agent capable of pulling data from multiple websites:
```python
# Run the agent
result = agent_executor.invoke({
    "input": "Find the top trending GitHub repositories today and "
             "compare them with the top stories on Hacker News."
})
print(result["output"])
```
What happens under the hood:
- The agent calls
unbrowse_resolvefor GitHub trending -- if cached, returns in <200ms - The agent calls
unbrowse_resolvefor Hacker News -- if cached, returns in <200ms - The agent combines the structured data and generates the comparison
Total time with browser tools: 30-60 seconds, 200K+ tokens Total time with Unbrowse (cached): 2-5 seconds, ~15K tokens
Example: Price Monitoring Agent
```python
def create_price_monitor():
    """Agent that monitors prices across multiple sites."""
    result = agent_executor.invoke({
        "input": """Check the current price of the MacBook Pro M4 on:
        1. Apple.com
        2. Amazon.com
        3. Best Buy
        Report the prices in a comparison table."""
    })
    return result["output"]
```
The first run indexes each site. Subsequent runs resolve from cache, making scheduled price checks nearly instant.
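The speedup pattern here is easiest to see as route-level memoization: the first call pays the indexing cost, and later calls hit the cache. A standalone sketch with a stub standing in for Unbrowse (the delay is simulated and scaled down; real indexing takes 20-80 seconds):

```python
import time
from functools import lru_cache

def index_site(url: str) -> str:
    """Stand-in for Unbrowse's first-visit capture and indexing step."""
    time.sleep(0.05)  # simulated delay, scaled down from 20-80s
    return f"structured data for {url}"

@lru_cache(maxsize=None)
def resolve(url: str) -> str:
    """First call pays the indexing cost; later calls return from cache."""
    return index_site(url)

start = time.perf_counter()
resolve("https://www.apple.com")
first = time.perf_counter() - start

start = time.perf_counter()
resolve("https://www.apple.com")
cached = time.perf_counter() - start

print(f"first run: {first * 1000:.1f}ms, cached run: {cached * 1000:.3f}ms")
```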
Example: Multi-Source Data Aggregation
```python
result = agent_executor.invoke({
    "input": """Compile a daily briefing:
    1. Top 5 Hacker News stories
    2. Trending GitHub repos in Python
    3. Latest posts from r/MachineLearning
    Format as a concise daily digest."""
})
```
This pattern is where Unbrowse delivers the most value. A daily briefing agent runs the same queries every day. After the first run indexes each site, every subsequent run completes in seconds instead of minutes.
Handling Authentication
For sites that require login, Unbrowse extracts cookies from your Chrome browser automatically. If you are logged into a site in Chrome, it just works.
For explicit login when cookies are not available:
```python
def unbrowse_login(url: str) -> str:
    """Trigger interactive login for an authenticated site."""
    response = requests.post(
        f"{UNBROWSE_URL}/v1/auth/login",
        json={"url": url},
        timeout=120,
    )
    # Serialize to a string so the tool output matches the declared return type
    return json.dumps(response.json(), indent=2)

# Add as a tool
login_tool = Tool(
    name="unbrowse_login",
    func=unbrowse_login,
    description="""Trigger interactive browser login for a site that
requires authentication. Opens a browser window for the user to
log in. Use when unbrowse_resolve returns auth_required.""",
)
```
Performance Comparison
Here is a real comparison for a simple task: "Get the top 10 trending GitHub repositories."
With LangChain + Playwright
```
Action: playwright_navigate -> github.com/trending (3.2s, 45K tokens)
Action: playwright_get_content (1.1s, 82K tokens)
Action: parse HTML and extract repos (0.5s, 12K tokens)
Total: 4.8 seconds, 139K tokens
```
With LangChain + Unbrowse (first visit)
```
Action: unbrowse_resolve -> github.com/trending (24s, 8K tokens)
Total: 24 seconds, 8K tokens (one-time indexing cost)
```
With LangChain + Unbrowse (cached)
```
Action: unbrowse_resolve -> github.com/trending (0.18s, 4K tokens)
Total: 0.18 seconds, 4K tokens
```
The first visit is slower because Unbrowse is indexing. But every subsequent request is 25x faster and uses 35x fewer tokens.
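The 25x and 35x figures are rounded from the traces above:

```python
# Speedup and token ratios from the cached-vs-Playwright traces
playwright_seconds, playwright_tokens = 4.8, 139_000
cached_seconds, cached_tokens = 0.18, 4_000

print(f"speedup: {playwright_seconds / cached_seconds:.1f}x")        # ~26.7x
print(f"token reduction: {playwright_tokens / cached_tokens:.2f}x")  # 34.75x
```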
Tips for Production Use
Pre-warm your cache. Before deploying, run resolves against all the sites your agent will access. This moves the indexing cost to build time rather than runtime.
```python
sites = [
    {"intent": "get trending repos", "url": "https://github.com/trending"},
    {"intent": "get top stories", "url": "https://news.ycombinator.com"},
    {"intent": "get front page", "url": "https://reddit.com"},
]
for site in sites:
    unbrowse_resolve(json.dumps(site))
```
Use the marketplace. Popular sites are likely already indexed by other agents. Your first resolve may be instant if someone else has already contributed the route.
Submit feedback. After successful resolves, submit feedback to improve route quality:
```python
# skill_id and endpoint_id come from an earlier successful resolve/execute
requests.post(f"{UNBROWSE_URL}/v1/feedback", json={
    "skill_id": skill_id,
    "endpoint_id": endpoint_id,
    "rating": 5,
    "outcome": "success",
})
```
Handle errors gracefully. If Unbrowse returns no cached route and capture fails, fall back to your existing browser tool rather than failing the task entirely.
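A minimal sketch of that fallback pattern, with the resolver and browser tool injected as callables so it works with any stack. The stub functions below are illustrative, not part of the Unbrowse API:

```python
from typing import Callable

def resolve_with_fallback(
    query: str,
    resolve: Callable[[str], str],
    browser_fallback: Callable[[str], str],
) -> str:
    """Try the fast API path first; fall back to the browser tool on failure."""
    try:
        result = resolve(query)
        if result:  # treat an empty response as a miss
            return result
    except Exception:
        pass  # network error, timeout, or no route captured
    return browser_fallback(query)

# Usage with stubs standing in for the real tools:
def flaky_resolve(query: str) -> str:
    raise TimeoutError("capture failed")

def browser_tool(query: str) -> str:
    return f"scraped result for: {query}"

print(resolve_with_fallback("get trending repos", flaky_resolve, browser_tool))
```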
The TypeScript SDK Alternative
If you are building in TypeScript, Unbrowse provides a native SDK:
```typescript
import { Unbrowse } from "@unbrowse/sdk";

const unbrowse = new Unbrowse();
const result = await unbrowse.resolve({
  intent: "get trending repositories",
  url: "https://github.com/trending",
});
```
The SDK also exposes a Playwright-compatible Browser API:
```typescript
import { Browser } from "unbrowse";

const browser = await Browser.launch();
const page = await browser.newPage();
const response = await page.goto("https://github.com/trending");
const data = await response.json();
await browser.close();
```
page.goto() checks the skill cache first. If a cached route exists, no browser tab opens and structured data is returned directly.
Conclusion
Giving your LangChain agent Unbrowse instead of browser tools changes the economics of web access. The first visit indexes. Every subsequent visit is a fast API call. For agents that run daily, weekly, or on demand against the same set of websites, that can mean token costs cut by over 90% and resolution dozens of times faster on cached routes.
The setup takes five minutes. The payoff compounds with every cached route.
Get started: github.com/unbrowse-ai/unbrowse
Read the paper: Internal APIs Are All You Need