Blog

Shadow API Discovery: A Step-by-Step Tutorial

Learn how to discover hidden shadow APIs behind any website. A practical step-by-step tutorial using Unbrowse to find, document, and call internal API endpoints.

Lewis Tham
April 3, 2026

Every website you use daily -- Amazon, Reddit, Twitter, YouTube, LinkedIn -- is a thin frontend layer over a set of internal APIs. When you search for products on Amazon, your browser is not magically generating results: it is calling GET /s?k=laptop&ref=nb_sb_noss and rendering the JSON response. When you scroll your Reddit feed, the client is hitting GET /api/v1/subreddit/hot.json.

These internal endpoints are called shadow APIs. They are not documented. They are not meant for external use. But they contain the exact same data the website shows you -- in clean, structured JSON format.

Shadow API discovery is the process of finding these endpoints, understanding their schemas, and building callable interfaces from them. This tutorial walks through the entire process using Unbrowse, step by step.

Prerequisites

  • Node.js 18+ installed
  • A terminal (macOS, Linux, or WSL on Windows)
  • 5 minutes of your time

Step 1: Install Unbrowse

npx unbrowse setup

This installs the Unbrowse CLI globally and sets up the Kuri browser runtime. Kuri is a 464KB Zig-native CDP (Chrome DevTools Protocol) broker that launches in ~3ms. The broker itself ships without a bundled browser; for browsing sessions it drives the Chrome you already have installed.

Verify the installation:

unbrowse --version

Step 2: Browse a Website

Let's discover the shadow APIs behind Hacker News. Start a browsing session:

unbrowse go https://news.ycombinator.com

This opens a browser window pointed at Hacker News. While you browse, Unbrowse is doing several things silently in the background:

  1. HAR recording: Every HTTP request the browser makes is captured via Chrome DevTools Protocol
  2. Fetch/XHR interception: A JavaScript interceptor (INTERCEPTOR_SCRIPT) runs on the page, catching any async requests that HAR might miss on SPAs
  3. Cookie extraction: Your existing browser cookies are injected into the session for authenticated access
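The interception idea in step 2 can be sketched as a small monkey-patch of the global fetch function. This is an illustrative sketch, not Unbrowse's actual INTERCEPTOR_SCRIPT; the `capturedRequests` log and `installInterceptor` name are assumptions for the example:

```typescript
// Illustrative sketch (not Unbrowse's actual INTERCEPTOR_SCRIPT):
// wrap the global fetch so every call is recorded before being
// forwarded to the real implementation.
type CapturedRequest = { method: string; url: string };

const capturedRequests: CapturedRequest[] = [];

function installInterceptor(): void {
  const originalFetch = (globalThis as any).fetch;
  (globalThis as any).fetch = (input: any, init?: any) => {
    // Normalize: fetch accepts either a URL string or a Request object.
    const url = typeof input === "string" ? input : input.url;
    capturedRequests.push({ method: init?.method ?? "GET", url });
    return originalFetch(input, init);
  };
}
```

Because the log entry is pushed before the real fetch is awaited, the capture works even for requests that are still in flight when the page is closed.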

Browse the site normally. Click on some stories. Visit the comments page. Check your profile if you are logged in. The more pages you visit, the more API endpoints Unbrowse discovers.

Step 3: Close the Session and Trigger Discovery

When you are done browsing, close the browser window or run:

unbrowse snap

On close, Unbrowse runs the full enrichment pipeline on all captured traffic:

  1. extractEndpoints: Identifies API endpoints from captured requests. Filters out static assets (images, CSS, JS), tracking pixels, and CDN requests. Extracts the URL template, HTTP method, query parameters, request body schema, and response schema.

  2. extractAuthHeaders: Identifies authentication patterns. Looks for Authorization headers, API keys, cookies, and custom auth headers. Stores the credential type and extraction pattern.

  3. storeCredential: Saves discovered credentials securely in the local credential vault. Each credential is associated with a domain and can be reused across sessions.

  4. mergeEndpoints: Merges newly discovered endpoints with any existing routes for this domain. Deduplicates endpoints, updates schemas if the new capture has more detail, and preserves manually added metadata.

  5. generateLocalDescription: Generates human-readable descriptions for each endpoint based on the URL pattern, parameters, and response schema. For example: "Fetches the top stories from Hacker News sorted by score."

  6. augmentEndpointsWithAgent: Uses an LLM to add semantic metadata. The model analyzes the endpoint's URL, parameters, and sample response to generate: a natural language description, parameter explanations, use case examples, and related endpoints.

  7. buildSkillOperationGraph: Builds a dependency graph between endpoints. Some endpoints require data from other endpoints (for example, getting a user's submissions requires the username from the user profile endpoint). The graph encodes these relationships.

  8. cachePublishedSkill: Stores the complete skill (all endpoints for the domain) in the local cache.

  9. queueBackgroundIndex: Publishes the skill to the shared Unbrowse marketplace. Other users can now discover and use these endpoints without browsing the site themselves.
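The first pipeline stage can be sketched as a pure function over captured traffic. This is a toy stand-in for extractEndpoints, not Unbrowse's implementation: the asset filter, `{id}` placeholder rule, and record shape are all assumptions:

```typescript
// Toy sketch of the extractEndpoints idea: filter captured traffic
// down to likely API calls, templatize URLs, and collect query params.
type Captured = { method: string; url: string };
type Endpoint = { method: string; template: string; queryParams: string[] };

// Drop obvious static assets by file extension.
const ASSET_RE = /\.(png|jpe?g|gif|svg|css|js|woff2?|ico)(\?|$)/i;

function extractEndpoints(captured: Captured[]): Endpoint[] {
  const seen = new Map<string, Endpoint>();
  for (const req of captured) {
    if (ASSET_RE.test(req.url)) continue;
    const u = new URL(req.url);
    // Replace purely numeric path segments with a placeholder.
    const template = u.pathname
      .split("/")
      .map((seg) => (/^\d+$/.test(seg) ? "{id}" : seg))
      .join("/");
    const key = `${req.method} ${template}`;
    const existing =
      seen.get(key) ?? { method: req.method, template, queryParams: [] };
    u.searchParams.forEach((_value, name) => {
      if (!existing.queryParams.includes(name)) existing.queryParams.push(name);
    });
    seen.set(key, existing);
  }
  return Array.from(seen.values());
}
```

Deduplicating on method plus template is what collapses hundreds of raw requests into the short endpoint list shown in the next step.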

Step 4: View Discovered APIs

After the enrichment pipeline completes, view the discovered skill:

unbrowse skill news.ycombinator.com

This shows every API endpoint discovered from your browsing session. For Hacker News, you might see:

Domain: news.ycombinator.com
Endpoints: 8 discovered

GET /news
  Description: Fetches the front page stories
  Parameters: p (page number, optional)
  Auth: None required

GET /item?id={id}
  Description: Fetches a specific story/comment by ID
  Parameters: id (item ID, required)
  Auth: None required

GET /user?id={username}
  Description: Fetches a user profile
  Parameters: id (username, required)
  Auth: None required

GET /newcomments
  Description: Fetches the latest comments across all stories
  Parameters: None
  Auth: None required

...

Each endpoint includes the URL template, HTTP method, parameters with types and descriptions, authentication requirements, and a sample response.
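Taken together, those fields suggest a record shape along these lines. The interface is an assumption for illustration, not Unbrowse's actual storage schema:

```typescript
// Assumed shape of a discovered-endpoint record; field names are
// illustrative, not Unbrowse's actual schema.
interface DiscoveredEndpoint {
  method: "GET" | "POST" | "PUT" | "DELETE";
  template: string; // e.g. "/item?id={id}"
  description: string;
  parameters: { name: string; required: boolean; description: string }[];
  auth: "none" | "cookie" | "bearer" | "api-key";
  sampleResponse?: unknown;
}

// The /item endpoint from the listing above, as such a record.
const itemEndpoint: DiscoveredEndpoint = {
  method: "GET",
  template: "/item?id={id}",
  description: "Fetches a specific story/comment by ID",
  parameters: [{ name: "id", required: true, description: "item ID" }],
  auth: "none",
};
```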

Step 5: Execute an API Call

Now that the routes are discovered, you can call them directly:

unbrowse resolve "top stories on hacker news"

Unbrowse matches your natural language query to the best endpoint (in this case, GET /news), calls it directly (no browser), and returns the structured data.
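The matching step can be illustrated with a toy scorer: rank each endpoint's description by token overlap with the query and pick the best. Unbrowse's real matcher is not specified here; this only shows the idea:

```typescript
// Toy natural-language resolver: score endpoint descriptions by
// token overlap with the query. A stand-in, not Unbrowse's matcher.
type Scored = { template: string; description: string };

function resolveQuery(query: string, endpoints: Scored[]): Scored | undefined {
  const qTokens = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
  let best: Scored | undefined;
  let bestScore = 0;
  for (const ep of endpoints) {
    const dTokens = ep.description.toLowerCase().split(/\W+/).filter(Boolean);
    // Count description tokens that also appear in the query.
    const score = dTokens.filter((t) => qTokens.has(t)).length;
    if (score > bestScore) {
      bestScore = score;
      best = ep;
    }
  }
  return best;
}
```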

For more specific queries:

unbrowse resolve "hacker news comments for item 12345"

This resolves to GET /item?id=12345 and returns the item with all its comments.

You can also execute endpoints directly if you know the exact route:

unbrowse execute news.ycombinator.com "GET /user?id=dang"

Step 6: Use via MCP in Your AI Agent

To make these routes available to AI agents, start the Unbrowse MCP server:

Add to your Claude Desktop config (claude_desktop_config.json):

{
  "mcpServers": {
    "unbrowse": {
      "command": "unbrowse",
      "args": ["mcp"]
    }
  }
}

Now Claude can use Unbrowse tools in conversation:

User: "What are the top stories on Hacker News right now?"

Claude: [calls unbrowse_resolve with query "top stories on hacker news"]

The response comes back in ~950ms with structured JSON -- no browser launch, no page rendering, no HTML parsing.

Advanced: Discovering APIs on Complex Sites

Hacker News is simple. Let's look at discovering APIs on more complex sites.

Authenticated Sites

For sites that require login (Reddit, Twitter, LinkedIn), Unbrowse automatically extracts cookies from your local browser's cookie database and injects them into the browsing session. This means:

  1. If you are logged into Reddit in your regular Chrome, Unbrowse can access your authenticated session
  2. Auth headers and cookies are captured and stored during the browsing session
  3. Future API calls use the stored credentials automatically

# Browse Reddit while logged in -- your cookies are injected automatically
unbrowse go https://reddit.com/r/programming

# After browsing, the authenticated API endpoints are available
unbrowse resolve "top posts in r/programming"
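Once cookies are captured, a direct API call only needs them replayed in a Cookie header. The sketch below shows that mechanic; the `StoredCredential` shape is an assumption, not Unbrowse's vault format:

```typescript
// Illustrative only: replay stored cookies as a Cookie header for
// direct (browserless) API calls. Credential shape is assumed.
type StoredCredential = { domain: string; cookies: Record<string, string> };

function buildHeaders(cred: StoredCredential): Record<string, string> {
  const cookieHeader = Object.entries(cred.cookies)
    .map(([name, value]) => `${name}=${value}`)
    .join("; ");
  return { Cookie: cookieHeader, Accept: "application/json" };
}
```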

Single Page Applications (SPAs)

Modern SPAs (React, Vue, Next.js) make all their API calls client-side via fetch or XHR. Unbrowse captures these through two mechanisms:

  1. HAR recording via Chrome DevTools Protocol captures most requests
  2. JavaScript interceptor catches async requests that HAR may miss (particularly on SPAs that use service workers or custom fetch wrappers)

Both sources are merged during the enrichment pipeline, ensuring comprehensive API coverage.
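The merge itself amounts to deduplicating the two request streams. A minimal sketch, assuming method plus URL as the identity key:

```typescript
// Sketch of merging the two capture sources (HAR + interceptor),
// deduplicating on method + URL. A stand-in for the real merge step.
type Req = { method: string; url: string };

function mergeCaptures(har: Req[], intercepted: Req[]): Req[] {
  const seen = new Set<string>();
  const merged: Req[] = [];
  for (const req of [...har, ...intercepted]) {
    const key = `${req.method} ${req.url}`;
    if (seen.has(key)) continue; // already captured by the other source
    seen.add(key);
    merged.push(req);
  }
  return merged;
}
```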

GraphQL Endpoints

Some sites (Twitter/X, GitHub, Facebook) use GraphQL instead of REST. Unbrowse handles GraphQL by:

  1. Capturing the POST requests to the GraphQL endpoint
  2. Extracting the operationName from each request
  3. Treating each unique operation as a separate endpoint
  4. Storing the query/mutation and variables schema

# After browsing Twitter
unbrowse skill twitter.com

# You'll see GraphQL operations like:
# POST /i/api/graphql/{hash}/HomeTimeline
# POST /i/api/graphql/{hash}/UserByScreenName
# POST /i/api/graphql/{hash}/TweetDetail
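The operation-splitting step can be sketched as pulling operationName out of each POST body; the body and return shapes here are assumptions for illustration:

```typescript
// Sketch of splitting GraphQL traffic into per-operation routes:
// each unique operationName in a POST body becomes its own endpoint.
type GqlPost = { url: string; body: string };

function graphqlOperations(posts: GqlPost[]): string[] {
  const ops = new Set<string>();
  for (const post of posts) {
    try {
      const parsed = JSON.parse(post.body);
      if (typeof parsed.operationName === "string") ops.add(parsed.operationName);
    } catch {
      // Ignore non-JSON bodies (e.g. multipart uploads).
    }
  }
  return Array.from(ops);
}
```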

How the Shared Marketplace Works

Every time you discover APIs on a new domain, the routes are published to the Unbrowse marketplace. Here is how the economics work:

  1. Discovery: You browse a site and Unbrowse indexes the API routes
  2. Publishing: Routes are published to the marketplace with your attribution
  3. Consumption: Other users or agents resolve queries against your discovered routes
  4. Earnings: You earn x402 micropayments when your routes are used

This creates a positive flywheel: more users means more domains indexed, which means higher resolve hit rates, which attracts more users. The marketplace currently has routes for 3,000+ domains.

Verifying Your Discovered Routes

After discovery, it is good practice to verify that the endpoints work:

# List all endpoints for a domain
unbrowse skill example.com

# Execute a specific endpoint to verify
unbrowse execute example.com "GET /api/products?q=test"

# Check if a natural language query resolves correctly
unbrowse resolve "products on example.com matching test"

If an endpoint returns an error, it might be because:

  • The authentication token has expired (re-browse to refresh)
  • The endpoint requires specific parameters you did not provide
  • The site has changed its API structure (re-browse to re-index)

Practical Examples

Example 1: E-commerce Price Monitoring

# Initial discovery
unbrowse go https://amazon.com
# Browse to a product, search for items, check prices
# Close the browser

# Now monitor prices without a browser
unbrowse resolve "price of MacBook Pro M4 on Amazon"
# Returns: structured JSON with price, availability, seller info

Example 2: Social Media Data Collection

# Initial discovery (logged in)
unbrowse go https://twitter.com
# Scroll your timeline, check profiles, read threads
# Close the browser

# Collect data programmatically
unbrowse resolve "latest tweets from @elonmusk"
# Returns: structured tweet data with text, engagement metrics, timestamps

Example 3: Research and Analysis

# Initial discovery
unbrowse go https://scholar.google.com
# Search for papers, browse results
# Close the browser

# Query the research API
unbrowse resolve "recent papers on large language models from Google Scholar"
# Returns: structured paper data with titles, authors, citations, abstracts

What Makes This Different From Scraping

Traditional scraping follows this path:

  1. Render the page in a browser
  2. Parse the HTML DOM
  3. Find elements using CSS selectors or XPath
  4. Extract text content
  5. Hope the site structure does not change

Shadow API discovery follows this path:

  1. Capture the API calls the browser makes
  2. Call the same APIs directly
  3. Get structured JSON responses
  4. The data is the same format the site itself uses

The difference: you are getting the data from the same source the website gets it from. No parsing, no selectors, no DOM traversal. If the website can show you the data, the API endpoint exists to serve that data -- and now you can call it directly.
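The two paths can be contrasted in miniature. Both snippets below are contrived examples (the markup and JSON are made up), but they show why the API path is more robust:

```typescript
// Contrast in miniature: scraping digs a value out of markup,
// while the shadow-API path reads the same value from structured JSON.
const html = `<tr class="athing"><td class="title"><a>Show HN: Unbrowse</a></td></tr>`;
const json = `{"title": "Show HN: Unbrowse", "score": 142}`;

// Scraping: brittle, tied to the markup structure.
const scraped = /<a>([^<]+)<\/a>/.exec(html)?.[1];

// Shadow API: the same field, already structured.
const viaApi = (JSON.parse(json) as { title: string }).title;
```

If the site redesigns its markup, the regex (or CSS selector) breaks; the JSON field usually does not.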

Next Steps

  • Explore the marketplace: Run unbrowse search "your interest" to find pre-indexed domains
  • Set up MCP: Configure Unbrowse as an MCP server for your AI agent
  • Discover more domains: Browse your most-used sites to build your local route cache
  • Earn: Your discovered routes earn micropayments when other users consume them

Shadow API discovery is not a hack or a workaround. It is recognizing that modern websites are API-driven by design. Unbrowse simply makes those APIs accessible to everyone.