Why We Wrote a Paper Against Browser Automation
Every browser automation agent does the same thing: spin up a headless browser, click through page elements, wait for renders, retry on failure, and burn tokens narrating what it sees. We spent the last year asking: what if most of that work is unnecessary?
Published 2026-04-05
The answer became our paper, "Internal APIs Are All You Need: Shadow APIs, Shared Discovery, and the Case Against Browser-First Agent Architectures" (arxiv.org/abs/2604.00694).
The problem: rediscovering the wheel on every run
Modern browser-first agents treat every website as a visual puzzle. They render the page, parse the DOM, decide where to click, wait for the result, and repeat. Each run starts from scratch, with no memory of prior visits.
This is wildly inefficient. The agent is spending tokens and wall-clock time doing what a human intern would do on their first day - figuring out how the site works. Except the agent does this every single time.
Worse, most of that "figuring out" leads to the same destination: an HTTP call to a backend API. The button click triggers a fetch. The form submission hits a REST endpoint. The page load pulls data from a JSON API. The browser is just a middleman.
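To make the middleman point concrete, here is a minimal sketch: given a network log captured during a browser session (the log entries below are hypothetical), the UI interactions collapse to a handful of backend JSON calls.

```python
# Hypothetical network log captured while an agent "used" a site in a browser.
network_log = [
    {"url": "https://shop.example/static/app.js", "method": "GET",
     "content_type": "application/javascript"},
    {"url": "https://shop.example/api/search", "method": "POST",
     "content_type": "application/json"},
    {"url": "https://shop.example/assets/logo.png", "method": "GET",
     "content_type": "image/png"},
    {"url": "https://shop.example/api/cart/items", "method": "POST",
     "content_type": "application/json"},
]

def backend_calls(log):
    """Keep only the JSON API traffic - the requests the UI wraps."""
    return [e for e in log if e["content_type"] == "application/json"]

for entry in backend_calls(network_log):
    print(entry["method"], entry["url"])
```

Everything else in the log - scripts, images, stylesheets - exists only to serve the human-facing rendering that an agent never needed.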
The insight: shadow APIs are already there
Almost every modern website is a thin client over an internal API layer. React, Next.js, Vue, Svelte - they all fetch data from backend endpoints that were never intended to be public, but are fully callable if you know the route, headers, and payload shape.
We call these shadow APIs: internal endpoints that are undocumented and unversioned for external consumers, yet perfectly stable and machine-callable. They are the real interface to the service. The browser UI is just one consumer of them.
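"Knowing the route, headers, and payload shape" can be captured in a small data structure. This is an illustrative sketch, not the paper's schema; the endpoint and field names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ShadowRoute:
    """One discovered internal endpoint: everything needed to call it directly."""
    domain: str
    path: str
    method: str
    headers: dict = field(default_factory=dict)
    payload_shape: dict = field(default_factory=dict)  # field name -> type name

# A hypothetical search endpoint discovered on an e-commerce site.
search_route = ShadowRoute(
    domain="shop.example",
    path="/api/search",
    method="POST",
    headers={"Content-Type": "application/json"},
    payload_shape={"query": "str", "limit": "int"},
)
```

Once a record like this exists, calling the service is a matter of filling in the payload - no rendering, no DOM, no clicking.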
In our study across 94 domains - spanning e-commerce, SaaS, media, finance, and government sites - we found exploitable internal API surfaces on every single one. Not most. All of them.
The evidence: 6.7x faster, 60% fewer tokens
We benchmarked direct API route execution against Playwright-based browser automation across those 94 domains on identical tasks: search, retrieve, submit, extract.
The results were not close:
- 6.7x faster median task completion vs. Playwright agents
- 60% fewer tokens consumed per task
- 100% task-level win rate - the API route matched or beat browser automation on every domain tested
The speed gap comes from eliminating rendering, DOM traversal, and visual reasoning entirely. The token savings come from replacing long chains of "I see a button labeled X, I will click it, now I see..." with a single structured API call.
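The "single structured API call" that replaces the click-and-narrate loop might look like the following. The endpoint and payload here are hypothetical stand-ins for a previously discovered route; the request is only constructed, not sent, to keep the sketch network-free.

```python
import json
import urllib.request

# Build the one request that replaces an entire browser session.
# Route, headers, and payload shape come from a prior discovery step.
payload = json.dumps({"query": "noise-cancelling headphones", "limit": 10}).encode()
req = urllib.request.Request(
    url="https://shop.example/api/search",  # hypothetical shadow endpoint
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would then complete the whole task
# in a single round trip.
```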
The solution: shared route memory
Speed and token savings are good, but the real contribution is architectural. If an agent discovers that a site's search endpoint accepts a JSON body with a query field and returns structured results, that knowledge should persist. Not just for the next run, but for every agent that visits that domain.
We call this shared route memory: a collectively built, continuously verified map of API surfaces across the web. Once one agent discovers a route, no agent ever needs to rediscover it. The marginal cost of visiting a known domain drops to near zero.
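A toy sketch of the idea, assuming a cache keyed by domain and task with a simple freshness check - the names and the verification policy are illustrative assumptions, not the paper's implementation:

```python
import time

class RouteMemory:
    """Cross-agent cache of discovered routes with continuous re-verification."""

    def __init__(self, max_age_seconds=86400):
        self._routes = {}  # (domain, task) -> {"route": ..., "verified_at": ...}
        self.max_age = max_age_seconds

    def record(self, domain, task, route):
        """Store a route discovered by any agent, stamped with a verification time."""
        self._routes[(domain, task)] = {"route": route, "verified_at": time.time()}

    def lookup(self, domain, task):
        """Return a known-fresh route, or None to signal that discovery is needed."""
        entry = self._routes.get((domain, task))
        if entry and time.time() - entry["verified_at"] < self.max_age:
            return entry["route"]
        return None

memory = RouteMemory()
memory.record("shop.example", "search", {"path": "/api/search", "method": "POST"})
# A later run - or a different agent entirely - skips rediscovery:
route = memory.lookup("shop.example", "search")
```

The lookup-before-discover pattern is what turns discovery from a per-run cost into a one-time cost amortized across every agent that ever visits the domain.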
This is the opposite of how browser automation works. Browser agents are stateless by design - every session is a blank slate. Shared route memory makes agents stateful across runs, across users, and across time.
What this means
Browser automation is not going away tomorrow. Some sites genuinely require browser interaction - CAPTCHAs, complex multi-step auth flows, heavily obfuscated SPAs. But those are the exception, not the rule.
For the vast majority of web tasks that agents perform today - data retrieval, form submission, search, content extraction - there is a faster, cheaper, more reliable path that skips the browser entirely.
The paper lays out the evidence. Unbrowse is the implementation.