Unbrowse vs Firecrawl: Scraping vs API Discovery

Firecrawl scrapes HTML into markdown. Unbrowse discovers the APIs behind pages. Compare web scraping against API discovery for AI agent data retrieval.

Lewis Tham
April 3, 2026

Firecrawl turns web pages into clean markdown for LLMs. Unbrowse discovers the APIs behind those pages and calls them directly. Both tools help AI agents get data from the web, but they work at different levels of the stack. Firecrawl scrapes the rendered output. Unbrowse intercepts the data source. The difference is architectural, and it shows up in speed, cost, and data quality.

TL;DR

| | Firecrawl | Unbrowse |
|---|---|---|
| Approach | Scrapes HTML, converts to markdown/structured data | Discovers internal APIs, calls endpoints directly |
| Speed | Seconds per page (render + scrape + convert) | Under 100ms cached, ~950ms average uncached |
| Token cost | Reduced vs raw HTML, still ~2,000-5,000 tokens | ~200-500 tokens (structured JSON) |
| Auth handling | Limited (API key-based access) | Auto cookie extraction from real browsers, 15+ SSO providers |
| Data format | Markdown or LLM-extracted JSON | Native API JSON responses |
| Pricing | Free tier (500 pages) + paid plans (credit-based) | Free (open source), earns x402 micropayments |
| Best for | Content extraction, site crawling, markdown conversion | Structured data retrieval, API-heavy sites, agent pipelines |

What is Firecrawl?

Firecrawl is a web scraping platform designed specifically for AI applications. It takes URLs and returns clean, LLM-ready content by handling the messy parts of web scraping: JavaScript rendering, proxy rotation, rate limiting, anti-bot detection, and content extraction. The output is markdown or structured data, stripped of navigation, ads, and boilerplate.

The platform offers five core capabilities. Scrape converts individual pages to markdown or JSON. Crawl discovers and processes all accessible pages on a site. Search combines web search with full-page content retrieval. Map identifies all URLs on a site. Interact, a newer feature, enables AI agents to click, type, and scroll before scraping.

Firecrawl has found strong adoption in the AI ecosystem. It integrates with Claude, Cursor, Windsurf, and other AI tools through MCP compatibility. It serves use cases from research aggregation to lead enrichment to training data preparation. The pricing is credit-based, with a free tier of 500 pages and paid plans scaling up from there.

For converting web content into LLM-consumable text, Firecrawl does a genuinely good job. The markdown output is cleaner than raw HTML, and the structured extraction handles common patterns well. If you need to feed web pages into an LLM, Firecrawl is a solid tool.

What is Unbrowse?

Unbrowse approaches the same problem from the other direction. Instead of scraping what the browser renders, it captures what the browser requests.

When you load a modern web page, your browser makes dozens of API calls: fetching product data, user profiles, search results, recommendations. The HTML is assembled client-side from these API responses. Scraping tools like Firecrawl process the assembled HTML. Unbrowse intercepts the raw API responses before they are turned into HTML.

The process is automatic. Unbrowse passively monitors network traffic during browsing sessions, identifies API endpoints, reverse-engineers their schemas (URL templates, auth headers, request/response structures), and caches everything. On subsequent requests, it calls these APIs directly. No page rendering, no scraping, no markdown conversion.
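
To make the "URL templates" idea concrete, here is a minimal sketch of how observed request URLs could be grouped into routes. It is an illustration only, not Unbrowse's actual implementation: the function names, the `{id}` placeholder, and the rule that numeric or UUID-like path segments are parameters are all assumptions.

```python
import re
from urllib.parse import urlsplit

# Assumed heuristic: a path segment that is all digits or looks like a UUID
# is a parameter; everything else is a fixed part of the route.
_ID_SEGMENT = re.compile(
    r"^(\d+|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})$"
)


def to_template(url: str) -> str:
    """Collapse a concrete URL into a template, dropping the query string."""
    parts = urlsplit(url)
    segments = [
        "{id}" if _ID_SEGMENT.match(segment) else segment
        for segment in parts.path.split("/")
    ]
    return f"{parts.scheme}://{parts.netloc}{'/'.join(segments)}"


def group_by_template(urls):
    """Map each inferred template to the concrete URLs that matched it."""
    groups = {}
    for url in urls:
        groups.setdefault(to_template(url), []).append(url)
    return groups


# Hypothetical traffic captured during one browsing session.
observed = [
    "https://example.com/api/products/123",
    "https://example.com/api/products/456?currency=usd",
    "https://example.com/api/search?q=shoes",
]
routes = group_by_template(observed)
for template, hits in routes.items():
    print(template, len(hits))
```

The two product URLs collapse into one route, which is what makes a cached route reusable for pages the agent has never visited.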

A shared marketplace lets agents benefit from routes discovered by the entire Unbrowse network. The peer-reviewed benchmark across 94 live domains (arxiv.org/abs/2604.00694) reports a 3.6x mean speedup and 5.4x median speedup over browser-based methods, with cached routes completing in under 100ms.

Key Differences

Architecture

Firecrawl operates on the rendered output of web pages. It loads a page in a browser, waits for JavaScript to execute, then extracts and converts the resulting HTML. Even though it strips boilerplate and cleans the output, the fundamental pipeline is: render page, then scrape content. Every request goes through the full browser rendering pipeline.

Unbrowse operates on the data layer beneath the page. It recognizes that the rendered HTML is just a presentation layer over structured API data. Instead of rendering the page and extracting content from the result, Unbrowse calls the source APIs and gets the data in its original structured form.

Consider a product listing page. Firecrawl would render the page, strip the navigation and footer, and return markdown like "Product Name - $29.99 - 4.5 stars." Unbrowse would call the product API endpoint and return {"name": "Product Name", "price": 29.99, "rating": 4.5, "reviews_count": 1247}. The second response is more structured, more complete, and required no rendering or parsing.
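
The sketch below shows what an agent has to do with each of those two responses. The markdown layout and the regular expression are assumptions made for this one example. A real page would need its own pattern, which is exactly the fragility being described.

```python
import json
import re

# The two responses from the example above.
markdown_line = "Product Name - $29.99 - 4.5 stars"
api_body = (
    '{"name": "Product Name", "price": 29.99, '
    '"rating": 4.5, "reviews_count": 1247}'
)


def parse_markdown(line: str) -> dict:
    """Recover fields from scraped text; breaks if the page layout changes."""
    match = re.fullmatch(
        r"(.+?) - \$(\d+(?:\.\d+)?) - (\d+(?:\.\d+)?) stars", line
    )
    if match is None:
        raise ValueError(f"unexpected layout: {line!r}")
    name, price, rating = match.groups()
    return {"name": name, "price": float(price), "rating": float(rating)}


from_markdown = parse_markdown(markdown_line)
from_api = json.loads(api_body)  # already structured, no pattern needed

# Fields present in the API response but absent from the scraped text.
missing = sorted(set(from_api) - set(from_markdown))
print(from_markdown)
print(missing)
```

The scraped path recovers three fields and depends on a hand-written pattern. The API path is a single `json.loads` call and keeps `reviews_count`.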

Performance

Firecrawl must render each page before scraping it. Even with optimizations, this takes seconds per page: browser launch, page load, JavaScript execution, content extraction, markdown conversion. For bulk crawling, this adds up. Crawling 1,000 pages one at a time at 3-5 seconds each takes roughly 50 to 85 minutes.

Unbrowse cached routes return in under 100ms. A thousand cached requests complete in under two minutes. Even uncached routes average 950ms because the browser session focuses on capturing API calls rather than fully rendering content. The 3.6x mean speedup benchmark understates the advantage for cached routes, where the speedup can exceed 30x.
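
A quick back-of-the-envelope check of those figures, assuming requests run one at a time with no parallelism and using the per-request latencies quoted in this section:

```python
PAGES = 1_000


def total_minutes(requests: int, seconds_per_request: float) -> float:
    """Wall-clock minutes for sequential requests at a fixed latency."""
    return requests * seconds_per_request / 60


scrape_low = total_minutes(PAGES, 3.0)   # render-and-scrape, fast case
scrape_high = total_minutes(PAGES, 5.0)  # render-and-scrape, slow case
cached = total_minutes(PAGES, 0.1)       # cached route at 100 ms
uncached = total_minutes(PAGES, 0.95)    # uncached route at 950 ms

print(f"render-and-scrape: {scrape_low:.0f}-{scrape_high:.0f} min")
print(f"cached routes:     {cached:.1f} min")
print(f"uncached routes:   {uncached:.1f} min")
print(f"cached speedup:    {3.0 / 0.1:.0f}x-{5.0 / 0.1:.0f}x")
```

Running requests in parallel shortens every row, but the ratio between the rows stays the same.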

Data Quality

The two approaches also differ in how much of the underlying data actually reaches the agent, which matters for AI applications that depend on exact values.

Firecrawl returns cleaned markdown or LLM-extracted structured data. The markdown is human-readable and LLM-friendly, but it is a lossy transformation. Metadata that exists in the API response but is not displayed on the page is lost. Pagination details, exact timestamps, internal IDs, related entity references, and other structured fields may not survive the render-then-scrape pipeline.

Unbrowse returns the original API response. Nothing is lost in translation because there is no translation. The JSON includes every field the API provides, including metadata that never appears in the rendered HTML. For agents that need precise, complete data, this is a significant advantage.

Token Efficiency

Firecrawl improves on raw HTML scraping by producing clean markdown, reducing a typical page from 8,000-50,000 tokens of raw HTML to roughly 2,000-5,000 tokens of markdown. That is a meaningful improvement, but the output still includes text formatting, headings, and prose that the agent must parse to find specific data points.

Unbrowse returns structured JSON: typically 200-500 tokens of exactly the data the agent needs. No parsing required. No extraction step. The agent can immediately use the response in its reasoning without spending tokens on content interpretation.

The difference between 3,000 tokens per page (Firecrawl) and 300 tokens per page (Unbrowse) becomes enormous at scale. An agent making 1,000 requests per day saves 2.7 million tokens daily by using API responses instead of scraped markdown.
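
The arithmetic behind that figure, using the per-response token counts from this section (3,000 and 300 are the article's illustrative midpoints, not measurements):

```python
def daily_tokens(requests_per_day: int, tokens_per_response: int) -> int:
    """Tokens an agent consumes per day just reading responses."""
    return requests_per_day * tokens_per_response


REQUESTS_PER_DAY = 1_000
markdown_tokens = daily_tokens(REQUESTS_PER_DAY, 3_000)  # scraped markdown
json_tokens = daily_tokens(REQUESTS_PER_DAY, 300)        # API JSON

saved = markdown_tokens - json_tokens
ratio = markdown_tokens / json_tokens

print(f"saved per day: {saved:,} tokens")
print(f"reduction:     {ratio:.0f}x")
```

Swap in your own request volume and measured response sizes to estimate the saving for a specific workload.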

Authentication

Firecrawl operates primarily on public web pages. It can handle some authenticated content, but its scraping model is not built around maintaining user sessions across sites. For gated content, you typically need to provide API keys or tokens separately.

Unbrowse automatically extracts authentication cookies from your real browser sessions. If you are logged into a site in Chrome or Firefox, Unbrowse can call authenticated API endpoints using those credentials. It supports 15+ SSO providers and maintains per-domain auth profiles. This means agents can access the same data you see when logged in, including personalized content, private dashboards, and account-specific information.
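
A per-domain auth profile can be pictured as a lookup from hostname to the cookies that should accompany a request. The sketch below is hypothetical: the `auth_profiles` store, its contents, and `cookie_header_for` are invented for illustration, and it does not show how cookies are read from a browser. It also ignores the path, `Secure`, and expiry rules that real cookie matching applies.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit

# Hypothetical store: one cookie set per registered domain.
auth_profiles: Dict[str, Dict[str, str]] = {
    "example.com": {"session_id": "abc123", "csrf_token": "xyz789"},
}


def cookie_header_for(
    url: str, profiles: Dict[str, Dict[str, str]]
) -> Optional[str]:
    """Return a Cookie header value for the URL's host, or None."""
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    # Try the full host first, then each parent domain (never the bare TLD).
    for start in range(len(labels) - 1):
        cookies = profiles.get(".".join(labels[start:]))
        if cookies:
            return "; ".join(f"{name}={value}" for name, value in cookies.items())
    return None


print(cookie_header_for("https://api.example.com/v1/me", auth_profiles))
print(cookie_header_for("https://other.org/", auth_profiles))
```

Keeping profiles keyed by domain means a session captured for one site is never sent to another.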

When to Use Firecrawl

Firecrawl is the right choice when:

  • You need full-page content as text: blog posts, articles, documentation pages where the content IS the page
  • Bulk site crawling: mapping and processing every page on a site
  • Content ingestion for RAG: building knowledge bases from web content
  • Simple page-to-markdown conversion: when you need readable text, not structured data
  • Public content at moderate scale: the credit-based model works well for bounded tasks

When to Use Unbrowse

Unbrowse is the clear choice when:

  • You need structured data: product info, pricing, user profiles, search results, anything backed by an API
  • Speed is critical: sub-100ms for cached routes vs seconds per page
  • Token efficiency matters: roughly 10x fewer tokens than even cleaned markdown
  • Authenticated content: accessing gated data using real browser sessions
  • High-volume retrieval: thousands of requests per hour without per-page costs
  • Agent pipelines: shared route marketplace means zero-cost repeat queries
  • Data completeness: API responses include metadata not visible on rendered pages

Getting Started with Unbrowse

npm install -g unbrowse
unbrowse setup

Get structured data from any site:

unbrowse resolve "get product details" --url https://example.com/product/123

The first request discovers and caches the API route. Every subsequent request to that pattern returns structured JSON in under 100ms.
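
The discover-once, reuse-afterwards behavior can be sketched as a small cache keyed by URL pattern. Everything here is an assumption made for illustration: `RouteCache`, `pattern_of`, and `fake_discover` are made-up names, and the `/api/products/{id}` route is a placeholder rather than anything Unbrowse is known to produce.

```python
import re
from typing import Callable, Dict

# Assumed heuristic: a purely numeric path segment is the record ID.
ID_SEGMENT = re.compile(r"/(\d+)(?=/|$)")


def pattern_of(page_url: str) -> str:
    """Collapse numeric path segments so similar pages share one key."""
    return ID_SEGMENT.sub("/{id}", page_url)


class RouteCache:
    """Runs the slow discovery step once per pattern, then reuses the result."""

    def __init__(self, discover: Callable[[str], str]) -> None:
        self._discover = discover
        self._routes: Dict[str, str] = {}
        self.discoveries = 0

    def resolve(self, page_url: str) -> str:
        pattern = pattern_of(page_url)
        if pattern not in self._routes:
            self._routes[pattern] = self._discover(pattern)
            self.discoveries += 1
        template = self._routes[pattern]
        match = ID_SEGMENT.search(page_url)
        return template.replace("{id}", match.group(1)) if match else template


def fake_discover(pattern: str) -> str:
    """Stand-in for the slow path that watches real network traffic."""
    return pattern.replace("/product/", "/api/products/")


cache = RouteCache(fake_discover)
first = cache.resolve("https://example.com/product/123")
second = cache.resolve("https://example.com/product/456")
print(first)
print(second)
print(cache.discoveries)
```

The second call never touches the discovery step, even though it asks for a different product.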

The Bottom Line

Firecrawl and Unbrowse solve adjacent problems with different architectures. Firecrawl excels at converting web content into text: articles, documentation, blog posts where the rendered page IS the data. Unbrowse excels at extracting structured data that lives behind web pages: product information, search results, user data, and anything served by an API endpoint.

The fundamental question is whether your agent needs text or data. If it needs to read an article, Firecrawl delivers clean markdown. If it needs to pull product prices, search results, or account information, Unbrowse delivers the exact JSON the site's own frontend uses, with a reported 3.6x mean speedup over browser-based methods and a fraction of the token cost.

As more of the web moves to API-driven architectures, the data behind the page becomes more valuable than the page itself. Unbrowse is built for that reality.