Shadow API Discovery: A Step-by-Step Tutorial
Learn how to discover hidden shadow APIs behind any website. A practical step-by-step tutorial using Unbrowse to find, document, and call internal API endpoints.
Every website you use daily -- Amazon, Reddit, Twitter, YouTube, LinkedIn -- is a thin frontend layer over a set of internal APIs. When you search for products on Amazon, your browser is not magically generating results: it is calling GET /s?k=laptop&ref=nb_sb_noss and rendering the JSON response. When you scroll your Reddit feed, the client is hitting GET /api/v1/subreddit/hot.json.
These internal endpoints are called shadow APIs. They are not documented. They are not meant for external use. But they contain the exact same data the website shows you -- in clean, structured JSON format.
Shadow API discovery is the process of finding these endpoints, understanding their schemas, and building callable interfaces from them. This tutorial walks through the entire process using Unbrowse, step by step.
Prerequisites
- Node.js 18+ installed
- A terminal (macOS, Linux, or WSL on Windows)
- 5 minutes of your time
Step 1: Install Unbrowse
npx unbrowse setup
This installs the Unbrowse CLI globally and sets up the Kuri browser runtime. Kuri is a 464KB Zig-native CDP (Chrome DevTools Protocol) broker that launches in ~3ms -- no full Chrome installation required for the runtime, though it does use your installed Chrome for browsing sessions.
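For context on what a CDP broker negotiates: a Chrome launched with --remote-debugging-port exposes an HTTP endpoint, /json/version, whose response carries the DevTools websocket URL that all subsequent protocol traffic flows over. A minimal sketch of that first handshake step (this is standard CDP behavior, not Kuri's actual internals):

```python
import json

def devtools_ws_url(version_response: str) -> str:
    """Extract the DevTools websocket URL from Chrome's /json/version response."""
    return json.loads(version_response)["webSocketDebuggerUrl"]

# A canned response, shaped like what Chrome returns when launched with
# --remote-debugging-port=9222 (fields abbreviated):
canned = ('{"Browser": "Chrome/126.0.0.0", '
          '"webSocketDebuggerUrl": "ws://localhost:9222/devtools/browser/abc123"}')
print(devtools_ws_url(canned))  # ws://localhost:9222/devtools/browser/abc123
```

Everything Unbrowse records -- network events, page lifecycle, cookies -- travels over that websocket as JSON-RPC messages.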
Verify the installation:
unbrowse --version
Step 2: Browse a Website
Let's discover the shadow APIs behind Hacker News. Start a browsing session:
unbrowse go https://news.ycombinator.com
This opens a browser window pointed at Hacker News. While you browse, Unbrowse is doing several things silently in the background:
- HAR recording: Every HTTP request the browser makes is captured via Chrome DevTools Protocol
- Fetch/XHR interception: A JavaScript interceptor (INTERCEPTOR_SCRIPT) runs on the page, catching any async requests that HAR recording might miss on SPAs
- Cookie extraction: Your existing browser cookies are injected into the session for authenticated access
Browse the site normally. Click on some stories. Visit the comments page. Check your profile if you are logged in. The more pages you visit, the more API endpoints Unbrowse discovers.
Step 3: Close the Session and Trigger Discovery
When you are done browsing, close the browser window or run:
unbrowse snap
On close, Unbrowse runs the full enrichment pipeline on all captured traffic:
- extractEndpoints: Identifies API endpoints from captured requests. Filters out static assets (images, CSS, JS), tracking pixels, and CDN requests. Extracts the URL template, HTTP method, query parameters, request body schema, and response schema.
- extractAuthHeaders: Identifies authentication patterns. Looks for Authorization headers, API keys, cookies, and custom auth headers. Stores the credential type and extraction pattern.
- storeCredential: Saves discovered credentials securely in the local credential vault. Each credential is associated with a domain and can be reused across sessions.
- mergeEndpoints: Merges newly discovered endpoints with any existing routes for this domain. Deduplicates endpoints, updates schemas if the new capture has more detail, and preserves manually added metadata.
- generateLocalDescription: Generates a human-readable description for each endpoint based on its URL pattern, parameters, and response schema. For example: "Fetches the top stories from Hacker News sorted by score."
- augmentEndpointsWithAgent: Uses an LLM to add semantic metadata. The model analyzes the endpoint's URL, parameters, and sample response to generate a natural-language description, parameter explanations, use-case examples, and related endpoints.
- buildSkillOperationGraph: Builds a dependency graph between endpoints. Some endpoints require data from others (for example, fetching a user's submissions requires the username from the user profile endpoint). The graph encodes these relationships.
- cachePublishedSkill: Stores the complete skill (all endpoints for the domain) in the local cache.
- queueBackgroundIndex: Publishes the skill to the shared Unbrowse marketplace, so other users can discover and use these endpoints without browsing the site themselves.
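The extractEndpoints step can be sketched roughly like this -- a simplified illustration, not the tool's actual implementation: drop static assets from the HAR entries, collapse numeric IDs into URL templates, and deduplicate by method and path.

```python
import re
from urllib.parse import urlsplit, parse_qsl

STATIC_EXT = (".png", ".jpg", ".svg", ".css", ".js", ".woff2", ".ico")

def extract_endpoints(har_entries):
    """Reduce raw HAR entries to deduplicated API endpoint templates."""
    seen = {}
    for entry in har_entries:
        method = entry["request"]["method"]
        url = urlsplit(entry["request"]["url"])
        if url.path.endswith(STATIC_EXT):
            continue  # skip images, stylesheets, scripts
        # Collapse numeric path segments into a placeholder so
        # /item/123 and /item/456 become one template.
        template = re.sub(r"/\d+(?=/|$)", "/{id}", url.path)
        params = sorted(k for k, _ in parse_qsl(url.query))
        seen[(method, template)] = params
    return [{"method": m, "path": p, "params": q} for (m, p), q in sorted(seen.items())]

entries = [
    {"request": {"method": "GET", "url": "https://news.ycombinator.com/item?id=12345"}},
    {"request": {"method": "GET", "url": "https://news.ycombinator.com/item?id=99"}},
    {"request": {"method": "GET", "url": "https://news.ycombinator.com/hn.css"}},
]
print(extract_endpoints(entries))
# [{'method': 'GET', 'path': '/item', 'params': ['id']}]
```

Two captures of the same route with different IDs collapse into a single endpoint, which is what makes the later schema-merging step tractable.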
Step 4: View Discovered APIs
After the enrichment pipeline completes, view the discovered skill:
unbrowse skill news.ycombinator.com
This shows every API endpoint discovered from your browsing session. For Hacker News, you might see:
Domain: news.ycombinator.com
Endpoints: 8 discovered
GET /news
Description: Fetches the front page stories
Parameters: p (page number, optional)
Auth: None required
GET /item?id={id}
Description: Fetches a specific story/comment by ID
Parameters: id (item ID, required)
Auth: None required
GET /user?id={username}
Description: Fetches a user profile
Parameters: id (username, required)
Auth: None required
GET /newcomments
Description: Fetches the latest comments across all stories
Parameters: None
Auth: None required
...
Each endpoint includes the URL template, HTTP method, parameters with types and descriptions, authentication requirements, and a sample response.
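Conceptually, each stored record looks something like this -- the field names are illustrative, not Unbrowse's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Endpoint:
    method: str                 # HTTP verb, e.g. "GET"
    template: str               # URL template, e.g. "/item?id={id}"
    description: str            # generated human-readable summary
    params: dict = field(default_factory=dict)  # name -> (type, required?)
    auth: str = "none"          # "none", "cookie", "bearer", ...
    sample_response: dict = field(default_factory=dict)

item = Endpoint(
    method="GET",
    template="/item?id={id}",
    description="Fetches a specific story/comment by ID",
    params={"id": ("string", True)},
)
print(item.method, item.template, item.auth)  # GET /item?id={id} none
```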
Step 5: Execute an API Call
Now that the routes are discovered, you can call them directly:
unbrowse resolve "top stories on hacker news"
Unbrowse matches your natural language query to the best endpoint (in this case, GET /news), calls it directly (no browser), and returns the structured data.
For more specific queries:
unbrowse resolve "hacker news comments for item 12345"
This resolves to GET /item?id=12345 and returns the item with all its comments.
You can also execute endpoints directly if you know the exact route:
unbrowse execute news.ycombinator.com "GET /user?id=dang"
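Under the hood, resolution is a matching problem: score each endpoint's description against the query and call the winner. Here is a toy stand-in -- Unbrowse's actual matching is LLM-assisted and far richer than word overlap:

```python
def resolve(query: str, endpoints: dict[str, str]) -> str:
    """Pick the endpoint whose description shares the most words with the query."""
    q = set(query.lower().split())
    def overlap(route: str) -> int:
        return len(q & set(endpoints[route].lower().split()))
    return max(endpoints, key=overlap)

endpoints = {
    "GET /news": "Fetches the front page stories",
    "GET /item?id={id}": "Fetches a specific story/comment by ID",
    "GET /user?id={username}": "Fetches a user profile",
}
print(resolve("front page stories on hacker news", endpoints))
# GET /news
```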
Step 6: Use via MCP in Your AI Agent
To make these routes available to AI agents, run Unbrowse as an MCP server. Add this to your Claude Desktop config (claude_desktop_config.json):
{
"mcpServers": {
"unbrowse": {
"command": "unbrowse",
"args": ["mcp"]
}
}
}
Now Claude can use Unbrowse tools in conversation:
User: "What are the top stories on Hacker News right now?"
Claude: [calls unbrowse_resolve with query "top stories on hacker news"]
The response comes back in ~950ms with structured JSON -- no browser launch, no page rendering, no HTML parsing.
Advanced: Discovering APIs on Complex Sites
Hacker News is simple. Let's look at discovering APIs on more complex sites.
Authenticated Sites
For sites that require login (Reddit, Twitter, LinkedIn), Unbrowse automatically extracts cookies from your local browser's cookie database and injects them into the browsing session. This means:
- If you are logged into Reddit in your regular Chrome, Unbrowse can access your authenticated session
- Auth headers and cookies are captured and stored during the browsing session
- Future API calls use the stored credentials automatically
# Browse Reddit while logged in -- your cookies are injected automatically
unbrowse go https://reddit.com/r/programming
# After browsing, the authenticated API endpoints are available
unbrowse resolve "top posts in r/programming"
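Once captured, replaying an authenticated session is mostly a matter of sending the stored cookies back as a request header. A sketch of that serialization (the cookie names here are hypothetical):

```python
def cookie_header(cookies: dict[str, str]) -> str:
    """Serialize stored cookies into a Cookie request header --
    the same shape a browser sends for an authenticated session."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())

# Hypothetical session cookies captured during browsing:
stored = {"reddit_session": "abc123", "token_v2": "eyJhbGciOi"}
print(cookie_header(stored))
# reddit_session=abc123; token_v2=eyJhbGciOi
```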
Single Page Applications (SPAs)
Modern SPAs (React, Vue, Next.js) make all their API calls client-side via fetch or XHR. Unbrowse captures these through two mechanisms:
- HAR recording via Chrome DevTools Protocol captures most requests
- JavaScript interceptor catches async requests that HAR may miss (particularly on SPAs that use service workers or custom fetch wrappers)
Both sources are merged during the enrichment pipeline, ensuring comprehensive API coverage.
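A minimal sketch of that merge, assuming each capture source is a list of request records keyed by method and URL (the tie-break rule here is an illustrative choice, not the tool's documented behavior):

```python
def merge_captures(har, interceptor):
    """Union of the two capture sources, deduplicated by (method, url).
    HAR entries overwrite interceptor entries on conflict, since in this
    sketch we assume they carry fuller timing/header data."""
    merged = {}
    for source in (interceptor, har):  # HAR applied second, so it wins
        for req in source:
            merged[(req["method"], req["url"])] = req
    return list(merged.values())

har = [{"method": "GET", "url": "/api/feed", "via": "har"}]
interceptor = [
    {"method": "GET", "url": "/api/feed", "via": "interceptor"},
    {"method": "POST", "url": "/api/vote", "via": "interceptor"},
]
result = merge_captures(har, interceptor)
print(len(result))  # 2 -- /api/feed deduplicated, /api/vote kept
```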
GraphQL Endpoints
Some sites (Twitter/X, GitHub, Facebook) use GraphQL instead of REST. Unbrowse handles GraphQL by:
- Capturing the POST requests to the GraphQL endpoint
- Extracting the operationName from each request
- Treating each unique operation as a separate endpoint
- Storing the query/mutation and variables schema
# After browsing Twitter
unbrowse skill twitter.com
# You'll see GraphQL operations like:
# POST /i/api/graphql/{hash}/HomeTimeline
# POST /i/api/graphql/{hash}/UserByScreenName
# POST /i/api/graphql/{hash}/TweetDetail
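The operation-splitting step can be sketched like this. GraphQL clients really do send operationName, query, and variables in the POST body; the grouping logic below is a simplified illustration:

```python
import json

def graphql_operations(captured_posts):
    """Split captured GraphQL POSTs into one logical endpoint per operationName."""
    ops = {}
    for post in captured_posts:
        body = json.loads(post["body"])
        name = body.get("operationName", "Anonymous")
        ops.setdefault(name, {
            "query": body["query"],
            "variable_names": sorted(body.get("variables", {})),
        })
    return ops

captured = [
    {"body": json.dumps({
        "operationName": "UserByScreenName",
        "query": "query UserByScreenName($screen_name: String!) "
                 "{ user(screen_name: $screen_name) { id } }",
        "variables": {"screen_name": "dang"},
    })},
]
ops = graphql_operations(captured)
print(list(ops))                                  # ['UserByScreenName']
print(ops["UserByScreenName"]["variable_names"])  # ['screen_name']
```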
How the Shared Marketplace Works
Every time you discover APIs on a new domain, the routes are published to the Unbrowse marketplace. Here is how the economics work:
- Discovery: You browse a site and Unbrowse indexes the API routes
- Publishing: Routes are published to the marketplace with your attribution
- Consumption: Other users or agents resolve queries against your discovered routes
- Earnings: You earn x402 micropayments when your routes are used
This creates a flywheel: more users means more domains indexed, which means higher resolve hit rates, which attracts more users. The marketplace currently has routes for 3,000+ domains.
Verifying Your Discovered Routes
After discovery, it is good practice to verify that the endpoints work:
# List all endpoints for a domain
unbrowse skill example.com
# Execute a specific endpoint to verify
unbrowse execute example.com "GET /api/products?q=test"
# Check if a natural language query resolves correctly
unbrowse resolve "products on example.com matching test"
If an endpoint returns an error, it might be because:
- The authentication token has expired (re-browse to refresh)
- The endpoint requires specific parameters you did not provide
- The site has changed its API structure (re-browse to re-index)
Practical Examples
Example 1: E-commerce Price Monitoring
# Initial discovery
unbrowse go https://amazon.com
# Browse to a product, search for items, check prices
# Close the browser
# Now monitor prices without a browser
unbrowse resolve "price of MacBook Pro M4 on Amazon"
# Returns: structured JSON with price, availability, seller info
Example 2: Social Media Data Collection
# Initial discovery (logged in)
unbrowse go https://twitter.com
# Scroll your timeline, check profiles, read threads
# Close the browser
# Collect data programmatically
unbrowse resolve "latest tweets from @elonmusk"
# Returns: structured tweet data with text, engagement metrics, timestamps
Example 3: Research and Analysis
# Initial discovery
unbrowse go https://scholar.google.com
# Search for papers, browse results
# Close the browser
# Query the research API
unbrowse resolve "recent papers on large language models from Google Scholar"
# Returns: structured paper data with titles, authors, citations, abstracts
What Makes This Different From Scraping
Traditional scraping follows this path:
- Render the page in a browser
- Parse the HTML DOM
- Find elements using CSS selectors or XPath
- Extract text content
- Hope the site structure does not change
Shadow API discovery follows this path:
- Capture the API calls the browser makes
- Call the same APIs directly
- Get structured JSON responses
- The data is the same format the site itself uses
The difference: you are getting the data from the same source the website gets it from. No parsing, no selectors, no DOM traversal. If the website can show you the data, the API endpoint exists to serve that data -- and now you can call it directly.
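To make the contrast concrete, here are both paths against the same canned story -- the markup and payload are illustrative, not real Hacker News responses:

```python
import json
from html.parser import HTMLParser

# The same story, as the page renders it and as the API returns it.
page_html = '<tr class="athing"><td class="title"><a>Show HN: Unbrowse</a></td></tr>'
api_json = '{"title": "Show HN: Unbrowse", "score": 128}'

# Scraping path: walk the DOM and hope the markup never changes.
class TitleScraper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_title, self.title = False, None
    def handle_starttag(self, tag, attrs):
        if tag == "td" and ("class", "title") in attrs:
            self.in_title = True
    def handle_data(self, data):
        if self.in_title and self.title is None:
            self.title = data

scraper = TitleScraper()
scraper.feed(page_html)

# Shadow API path: one json.loads, structure included for free.
story = json.loads(api_json)

print(scraper.title)   # Show HN: Unbrowse
print(story["score"])  # 128 -- fields like score never even appear in the HTML
```

Both paths recover the title, but only the API response carries typed fields the page never renders, and a markup redesign breaks the scraper while leaving the endpoint untouched.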
Next Steps
- Explore the marketplace: Run unbrowse search "your interest" to find pre-indexed domains
- Set up MCP: Configure Unbrowse as an MCP server for your AI agent
- Discover more domains: Browse your most-used sites to build your local route cache
- Earn: Your discovered routes earn micropayments when other users consume them
Shadow API discovery is not a hack or a workaround. It is recognizing that modern websites are API-driven by design. Unbrowse simply makes those APIs accessible to everyone.