
94 Domains, 100% Win Rate: The Full Benchmark

We tested Unbrowse against Playwright on every major website category. Browser automation lost every time.

2026-04-02 · Lewis Tham · arXiv:2604.00694

The numbers at a glance

- 94 domains tested (live production websites)
- 3.6x mean speedup (cached vs. Playwright)
- 5.4x median speedup (half of domains were faster than this)
- 30x best single-domain speedup (peak observed)
- 100% win rate (Unbrowse faster on every domain)
- 90-96% cost reduction per repeated task

Why we ran this benchmark

Claims are easy. Data is not. When we published Internal APIs Are All You Need, we made a specific claim: that routing agent tasks through cached internal APIs is categorically faster, cheaper, and more reliable than browser automation. This page is the full data behind that claim.

We benchmarked 94 live production websites across every major category — e-commerce, news, social, finance, SaaS, travel, government, search engines, forums. Every domain was tested with real Playwright browser automation and with Unbrowse's cached API execution, on the same tasks, same inputs, same expected outputs.

The result: Unbrowse was faster on every single domain. Not most. Every one.

Methodology

Each domain was tested with a representative task: search, retrieve a product page, fetch a user profile, read an article. The browser path used Playwright with a standard Chromium instance — no special stealth plugins, no pre-warmed profiles. The cached path used Unbrowse with a warmed local skill cache (the realistic steady-state for any agent that has visited the domain before).

Cold-start costs (first-time discovery) are reported separately. We did not cherry-pick domains. The full list includes sites with aggressive bot detection, heavy server-rendering, and unusual architectures.

Latency was measured end-to-end: from task invocation to structured data returned. Cost was computed from LLM token usage (GPT-4-class pricing) plus compute time.
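The end-to-end measurement can be sketched as a simple wall-clock harness. The task callables here are hypothetical stand-ins for the two execution paths; this is an illustration of the methodology, not the benchmark's actual code.

```python
import time
from typing import Any, Callable

def measure_end_to_end(task: Callable[[], Any]) -> tuple[Any, float]:
    """Time a task from invocation to structured data returned, in ms."""
    start = time.perf_counter()
    result = task()  # either the browser path or the cached API path
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms

def speedup(browser_ms: float, cached_ms: float) -> float:
    """Per-domain speedup: browser latency divided by cached latency."""
    return browser_ms / cached_ms

# With the benchmark's reported averages:
# speedup(3404, 950) -> ~3.6x
```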

Response time distribution

18 of 94 domains responded in under 100ms with cached execution. The majority landed under 200ms — fast enough that the API call is invisible in any agent workflow.

| Response time | Domains | Share | Typical domains |
|---|---|---|---|
| Sub-100ms | 18 | 19% | Simple REST APIs, static JSON endpoints, search APIs with clean responses |
| 100-200ms | 34 | 36% | Most e-commerce product pages, news sites, social media feeds |
| 200-500ms | 28 | 30% | Dashboard-style SPAs, sites with complex auth flows |
| 500ms-1s | 10 | 11% | Heavy server-rendered pages, GraphQL aggregators |
| Over 1s | 4 | 4% | WAF-gated sites requiring a full cookie dance |

The 4 domains over 1 second were all behind aggressive WAFs (Cloudflare Turnstile, Akamai Bot Manager) that required a full browser cookie bootstrap before the cached path could work. Even these were still 1.5–2.1x faster than Playwright.

Bot detection impact

The biggest variable in speedup was bot detection. On the 61 domains with no protection, Unbrowse averaged a 6.8x speedup. On WAF-protected sites, the speedup dropped to 2.1x — still a clear win, but the cookie bootstrap phase adds latency.

| Protection level | Domains | Share | Avg. speedup | Notes |
|---|---|---|---|---|
| No bot detection | 61 | 65% | 6.8x | Direct API calls succeed immediately |
| Basic bot detection | 18 | 19% | 3.2x | User-Agent + cookie checks, bypassed with stored credentials |
| WAF-protected (Cloudflare, Akamai, etc.) | 15 | 16% | 2.1x | Requires browser cookie bootstrap, then cached |

Cost comparison

The cost difference is even more dramatic than the speed difference. Browser automation burns tokens on DOM parsing, screenshot analysis, and multi-step navigation. Cached API calls skip all of it.

| Metric | Browser (Playwright) | Cached (Unbrowse) | Ratio |
|---|---|---|---|
| Avg. cost per task | $0.53 | $0.005 | 106x |
| Avg. tokens per task | ~8,000 | ~200 | 40x |
| Avg. latency per task | 3,404ms | 950ms | 3.6x |
| Median latency | 2,800ms | 520ms | 5.4x |

At $0.005 per cached task vs. $0.53 per browser task, the 106x cost ratio means an agent running 1,000 tasks per day saves roughly $525/day by using cached routes. Over a month, that is $15,750 in compute savings for a single agent.
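The arithmetic behind that claim, as a quick sketch using the table's average per-task costs:

```python
# Savings from switching repeated tasks to cached routes, using the
# benchmark's reported averages ($0.53 browser vs. $0.005 cached).
BROWSER_COST = 0.53   # USD per task, Playwright path
CACHED_COST = 0.005   # USD per task, cached path

def daily_savings(tasks_per_day: int) -> float:
    """Dollars saved per day by routing tasks through the cached path."""
    return tasks_per_day * (BROWSER_COST - CACHED_COST)

# 1,000 tasks/day -> $525/day, i.e. $15,750 over a 30-day month
```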

Cold start: the honest cost

Unbrowse is not free the first time. When an agent encounters a domain it has never seen before, someone pays the cold-start cost: a real browser session to capture traffic, followed by automated endpoint extraction and LLM-based schema inference.

| Phase | Time | Notes |
|---|---|---|
| First browse + capture | 8-15s | One-time cost per domain |
| Endpoint extraction + schema inference | 2-4s | Automatic, runs on close |
| LLM augmentation (descriptions, params) | 1-3s | Semantic metadata generation |
| Total cold start (average) | 12.4s | Typically paid back in 3-5 reuses |

The 12.4-second average cold start is amortized across all future uses of that domain — not just by the discovering agent, but by every agent on the shared graph. With the 3.6x average speedup saving ~2.5 seconds per task, the cold start pays for itself in roughly five reuses at the mean saving, and in as few as three on domains with larger speedups. After that, every call is pure savings.
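As a sanity check on the amortization claim, the break-even point works out like this (a sketch; the 2.5s figure is the quoted mean per-task saving):

```python
import math

COLD_START_S = 12.4   # average one-time discovery cost per domain
MEAN_SAVING_S = 2.5   # ~seconds saved per cached task at the 3.6x mean speedup

def break_even_reuses(cold_start_s: float, saving_per_task_s: float) -> int:
    """How many cached reuses before the one-time cold start pays for itself."""
    return math.ceil(cold_start_s / saving_per_task_s)

# ceil(12.4 / 2.5) = 5 reuses at the mean saving; fewer on domains
# where the speedup (and so the per-task saving) is larger
```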

Performance by domain category

Not all websites are created equal. Search engines and news sites, whose core product is already a structured API, showed the highest speedups. Government sites and server-rendered portals showed the lowest — but still consistently beat browser automation.

| Category | Domains | Avg. speedup | Notes |
|---|---|---|---|
| E-commerce / Retail | 16 | 5.1x | Product search, pricing, inventory APIs are clean REST |
| News / Media | 14 | 6.2x | Content APIs are fast, minimal auth |
| Social platforms | 8 | 2.8x | Auth-heavy, some GraphQL complexity |
| Developer tools / SaaS | 12 | 4.5x | Often have well-structured internal APIs |
| Finance / Fintech | 9 | 3.1x | WAF-heavy, but APIs are clean once auth is solved |
| Travel / Hospitality | 8 | 5.8x | Search + booking APIs are highly structured |
| Government / Education | 7 | 2.4x | Older stacks, more server-rendered HTML |
| Search engines / Directories | 6 | 7.1x | Search APIs are the core product |
| Other (forums, wikis, misc.) | 14 | 3.9x | Mixed results depending on stack |

WebArena accuracy: not just faster, more correct

Speed and cost are useful, but they mean nothing if the agent gets worse answers. We ran the standard WebArena benchmark to test whether hybrid agents — using cached APIs when available, falling back to the browser when not — are also more accurate.

| Agent | Task accuracy | Notes |
|---|---|---|
| Browser-only agent | 14.0% | Baseline: pure Playwright automation |
| Unbrowse hybrid agent | 17.4% | +24% relative accuracy improvement |

The 24% relative improvement (17.4% vs. 14.0% task accuracy) comes from eliminating rendering-related failure modes: timeouts waiting for JavaScript, misidentified DOM elements, and stale page state after navigation. When the agent gets structured data directly from an API, there is less to go wrong.
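As a quick check, the 24% figure is a relative gain over the browser-only baseline, not a percentage-point difference:

```python
browser_acc = 14.0  # % task accuracy, browser-only agent
hybrid_acc = 17.4   # % task accuracy, Unbrowse hybrid agent

# 3.4 percentage points on a 14.0% base -> ~24% relative improvement
relative_gain = (hybrid_acc - browser_acc) / browser_acc * 100
```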

Where Unbrowse struggles

This is a benchmark post, not a press release. Here is where the approach falls short.

GraphQL POST endpoints

Sites that use GraphQL with POST requests and massive JSON bodies (X/Twitter's HomeTimeline is the canonical example) are hard to capture passively. The operation name is buried inside the request body, and the response schema varies by query. We are working on operationName extraction, but this is not solved today.
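The problem can be illustrated with a toy extractor. Everything here is a simplified stand-in: the body format is generic GraphQL-over-HTTP, not X/Twitter's actual payload, and this is not Unbrowse's extraction code.

```python
import json
from typing import Optional

def extract_operation_name(post_body: bytes) -> Optional[str]:
    """Pull the GraphQL operation name out of a captured POST body.

    Unlike REST, the operation's identity lives inside the JSON body,
    so URL-based passive capture cannot distinguish one query from another.
    """
    try:
        payload = json.loads(post_body)
    except (ValueError, UnicodeDecodeError):
        return None
    if not isinstance(payload, dict):
        return None
    # Explicit field first; fall back to parsing the query text itself.
    if isinstance(payload.get("operationName"), str):
        return payload["operationName"]
    query = payload.get("query", "").strip()
    for keyword in ("query", "mutation"):
        if query.startswith(keyword):
            rest = query[len(keyword):].strip()
            name = rest.split("(")[0].split("{")[0].strip()
            return name or None
    return None

# A simplified HomeTimeline-style capture:
body = json.dumps({"operationName": "HomeTimeline",
                   "variables": {"count": 20}}).encode()
```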

Heavy server-rendering

Government portals, older CMS-based sites, and some enterprise SaaS products render HTML on the server with no client-side API calls. There is nothing to intercept. On these domains, Unbrowse still works via HTML parsing, but the speedup is lower (2–3x) because the “API” is effectively the rendered page itself.

Aggressive WAFs with rotating challenges

Cloudflare Turnstile, Akamai Bot Manager, and PerimeterX can require fresh browser challenge solves on every session. Unbrowse caches the resulting cookies, but if the WAF rotates challenges faster than the cache TTL, the agent falls back to a browser session more often. The 2.1x speedup on WAF-protected sites reflects this reality.
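The fallback behavior described above amounts to a TTL check on the cached cookies. The structure below is illustrative, not Unbrowse's actual cache format:

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class CachedCookies:
    cookies: dict             # e.g. {"cf_clearance": "..."}
    obtained_at: float        # epoch seconds when the challenge was solved
    ttl_s: float              # how long the WAF honors these cookies

    def is_fresh(self, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        return (now - self.obtained_at) < self.ttl_s

def needs_browser_bootstrap(cache: Optional[CachedCookies]) -> bool:
    """Fall back to a full browser session when no fresh cookies exist.

    If the WAF rotates challenges faster than ttl_s, this returns True
    often, and the cached path degrades toward browser-level latency.
    """
    return cache is None or not cache.is_fresh()
```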

Cold start on long-tail domains

The 12.4-second cold start is only amortized if the domain gets reused. For truly one-off visits to obscure websites, the cold-start cost is strictly additive. The shared graph mitigates this for popular domains (someone else likely already discovered the routes), but the long tail will always have cold starts.

What the data means

Browser automation is a general-purpose fallback. It works everywhere, slowly. Cached API execution is a specialized fast path. It works on any domain that has been visited before, and the set of visited domains grows with every agent on the network.

The practical takeaway: for any agent that repeatedly visits the same set of websites — which describes most production agents — switching to cached API execution delivers an immediate and compounding improvement in speed, cost, and reliability.

The 100% win rate across 94 domains is not a cherry-picked result. It reflects a fundamental architectural advantage: skipping the rendering pipeline is always faster than going through it. The only variable is how much faster.

Read the full paper

The full methodology, per-domain results, statistical analysis, and architectural details are in the paper. The benchmark code and raw data are open source.

Install: curl -fsSL https://unbrowse.ai/install.sh | bash