The 50 Most Valuable Domains to Mine (And Why)
A practical guide to the highest-value websites for AI agent route discovery
Not all domains are equal for mining. A single Shopify storefront yields routes that apply to millions of stores. One Google News session replaces mining dozens of individual publishers. And a hard-to-mine site like LinkedIn produces routes worth 10x more than an easy site, because fewer contributors can capture them.
We benchmarked 94 domains in our paper Internal APIs Are All You Need (arXiv: 2604.00694). 61 out of 94 had no bot detection at all. Even the WAF-protected sites yielded a 2.1x speedup over headless browsers. This is the practical guide to the 50 highest-value targets.
How to read this guide
Each domain entry includes four fields:
- Why it’s valuable — what agents need from this site and why the routes earn.
- Difficulty — Easy means no bot detection and no auth required. Medium means auth is needed or there is light bot detection. Hard means aggressive bot detection, complex auth, or both.
- Expected routes — how many distinct API endpoints you will typically discover in a thorough mining session.
- Mining tip — specific advice for maximizing route yield on this domain.
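The Difficulty rubric above can be read as a small decision rule. The following is a minimal sketch under stated assumptions: `SiteProfile`, its field names, and the "none" / "light" / "aggressive" levels are hypothetical labels introduced here for illustration, not part of any tool this guide describes.

```python
from dataclasses import dataclass


@dataclass
class SiteProfile:
    """Hypothetical summary of what a miner observes on a site."""
    bot_detection: str          # assumed levels: "none", "light", "aggressive"
    needs_auth: bool = False
    complex_auth: bool = False  # e.g. multi-step login flows


def rate_difficulty(site: SiteProfile) -> str:
    """Apply the Easy / Medium / Hard rubric from the Difficulty field."""
    # Hard: aggressive bot detection, complex auth, or both.
    if site.bot_detection == "aggressive" or site.complex_auth:
        return "Hard"
    # Medium: auth is needed or there is light bot detection.
    if site.needs_auth or site.bot_detection == "light":
        return "Medium"
    # Easy: no bot detection and no auth required.
    return "Easy"


if __name__ == "__main__":
    print(rate_difficulty(SiteProfile(bot_detection="none")))
    print(rate_difficulty(SiteProfile(bot_detection="light")))
    print(rate_difficulty(SiteProfile(bot_detection="aggressive", needs_auth=True)))
```

Checking the Hard conditions first means a site with both light detection and complex auth is still rated Hard, which matches the "or both" wording of the rubric.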
Domains are grouped into 10 categories. Within each category, they are ordered by a combination of agent query volume and route value. Start with the category that matches the agents you build or serve.
Category 1
Search Engines
Every agent searches. Search is the most common first step in any agent workflow — finding information, verifying facts, discovering resources. These domains generate the highest resolve volume in the network because search is the universal entry point.
google.com
Difficulty: Hard. Expected routes: ~15-25.
The default search for every agent. Autocomplete, search results, Knowledge Graph panels, and People Also Ask boxes all run through internal JSON endpoints. Agents that can call these directly skip the entire rendered SERP.
bing.com
Difficulty: Medium. Expected routes: ~12-18.
Microsoft's search powers Copilot and many enterprise agents. The internal API returns structured search results, instant answers, and entity cards. Lower bot detection than Google.
duckduckgo.com
Difficulty: Easy. Expected routes: ~8-12.
The privacy-first search engine has minimal bot detection and clean JSON APIs. The instant answer API is already semi-public. Agents that need search without tracking constraints route here.
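As an illustration of the semi-public instant answer API mentioned above, here is a minimal sketch that builds a request URL and trims a response down to a few fields. It assumes the commonly documented `api.duckduckgo.com` endpoint with `q`, `format`, and `no_html` parameters and the `Heading`, `AbstractText`, and `AbstractURL` response keys. The helper names are hypothetical, and no network call is made.

```python
from urllib.parse import urlencode

# Assumed endpoint for the Instant Answer API.
DDG_API = "https://api.duckduckgo.com/"


def instant_answer_url(query: str) -> str:
    """Build an Instant Answer request URL that asks for JSON output."""
    params = {"q": query, "format": "json", "no_html": 1}
    return DDG_API + "?" + urlencode(params)


def summarize(payload: dict) -> dict:
    """Keep only the fields an agent typically needs from a response."""
    return {
        "heading": payload.get("Heading", ""),
        "abstract": payload.get("AbstractText", ""),
        "source_url": payload.get("AbstractURL", ""),
    }


if __name__ == "__main__":
    print(instant_answer_url("python asyncio"))
    # Made-up payload standing in for a real response.
    sample = {"Heading": "Example", "AbstractText": "Sample text."}
    print(summarize(sample))
```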
search.brave.com
Difficulty: Easy. Expected routes: ~6-10.
Brave Search has its own independent index (not a Bing wrapper). The internal endpoints return structured results with a summarizer. Growing agent adoption because of its clean API surface.
you.com
Difficulty: Easy. Expected routes: ~8-14.
AI-native search engine built for agents. The internal APIs power their AI chat, code search, and web search. Clean endpoints, minimal detection. The closest thing to a search API without paying for one.
Category 2
Code Platforms
Developer agents live on these platforms. Code search, repository browsing, package lookup, and dependency analysis are core workflows for any coding agent. These domains have high route density because developer tools expose many granular API endpoints.
github.com
Difficulty: Medium. Expected routes: ~30-50.
The center of developer gravity. Repository search, file browsing, issue tracking, PR reviews, and Actions status are all backed by internal APIs that return richer data than the public REST API. Agent volume here is enormous.
gitlab.com
Difficulty: Medium. Expected routes: ~20-35.
The self-hosted alternative powers millions of enterprise repos. Internal APIs cover merge requests, CI pipelines, and container registries. Many enterprise agents route here because their orgs use GitLab.
npmjs.com
Difficulty: Easy. Expected routes: ~10-15.
Every JavaScript agent needs package metadata. The internal search, version history, and dependency tree endpoints return structured data that is richer than the public registry API.
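For comparison, the public registry API that the internal endpoints are measured against can be addressed as sketched below. This assumes the public registry at `registry.npmjs.org` and its `dist-tags` metadata key. The helper names are hypothetical, and the sketch shows only the public baseline, not the internal endpoints.

```python
from urllib.parse import quote

# Public npm registry (the baseline, not the site's internal endpoints).
NPM_REGISTRY = "https://registry.npmjs.org"


def package_metadata_url(name: str) -> str:
    """Return the registry URL for a package's metadata document."""
    # Scoped names such as "@babel/core" contain a slash, which is
    # percent-encoded here so the name stays a single path segment.
    return f"{NPM_REGISTRY}/{quote(name, safe='@')}"


def latest_version(metadata: dict) -> str:
    """Read the version that the "latest" dist-tag points to."""
    return metadata["dist-tags"]["latest"]


if __name__ == "__main__":
    print(package_metadata_url("express"))
    print(package_metadata_url("@babel/core"))
    # Made-up metadata standing in for a real registry response.
    print(latest_version({"dist-tags": {"latest": "1.2.3"}}))
```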
pypi.org
Difficulty: Easy. Expected routes: ~8-12.
Python's package index. Agents building Python projects need package search, version compatibility, and dependency resolution. The internal endpoints power the search and package detail pages.
stackoverflow.com
Difficulty: Easy. Expected routes: ~15-25.
The canonical Q&A site for developers. Internal APIs serve question search, answer ranking, code snippets, and related questions. Coding agents query Stack Overflow more than any other reference site.
Category 4
E-commerce
Shopping agents, price comparison tools, and product research agents all need e-commerce data. These domains have high route density because every product page, search result, and filter combination hits separate API endpoints. The commercial value per route is among the highest in the index.
amazon.com
Difficulty: Hard. Expected routes: ~25-40.
The world's largest product catalog. Search, product details, pricing, reviews, and recommendations all use internal APIs. Any agent that does product research or price comparison needs Amazon routes.
ebay.com
Difficulty: Medium. Expected routes: ~15-20.
Auction and fixed-price listings. The internal search, item detail, and bidding status APIs return structured data that the public Finding API does not fully cover. Agents doing price research need both Amazon and eBay routes.
shopify.com (storefronts)
Difficulty: Easy. Expected routes: ~8-12.
Millions of stores run on Shopify. Every Shopify storefront uses the same internal API pattern (/products.json, /collections.json, /cart.js). Mine one Shopify store and the routes apply to all of them.
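The shared storefront pattern can be sketched as a simple expansion: the three paths named above are applied to any store origin. The store URLs below are placeholders and `storefront_routes` is a hypothetical helper. Nothing here contacts a real store.

```python
# Paths taken from the entry above; they are the same on every storefront.
SHOPIFY_PATHS = ("/products.json", "/collections.json", "/cart.js")


def storefront_routes(store_url: str) -> list[str]:
    """Expand the shared Shopify paths against one storefront origin."""
    base = store_url.rstrip("/")
    return [base + path for path in SHOPIFY_PATHS]


if __name__ == "__main__":
    # Placeholder origins; any Shopify storefront would be handled the same way.
    for store in ["https://shop-a.example", "https://shop-b.example/"]:
        for route in storefront_routes(store):
            print(route)
```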
walmart.com
Difficulty: Medium. Expected routes: ~15-25.
Second-largest US retailer. Product search, inventory checking, store availability, and pricing all use internal APIs. Agents doing price comparison across retailers need Walmart routes alongside Amazon.
etsy.com
Difficulty: Easy. Expected routes: ~10-18.
Handmade and vintage marketplace. Search, listing details, shop profiles, and reviews use clean internal APIs. Agents helping with gift finding, custom orders, or niche product discovery route here.
Category 5
Productivity
Workflow agents that manage tasks, documents, and team communication are among the fastest-growing agent categories. These domains require authentication, which makes their routes more valuable — authenticated routes are harder to discover and serve fewer contributors, meaning higher earnings per route.
notion.so
Difficulty: Medium. Expected routes: ~20-30.
The default workspace for startups and knowledge workers. Page content, database queries, search, and block manipulation all use internal APIs that are richer than the public Notion API. Agents managing knowledge bases route here constantly.
linear.app
Difficulty: Medium. Expected routes: ~12-20.
Issue tracking for engineering teams. Issue search, project boards, cycle views, and team workloads use clean GraphQL endpoints. Agents managing sprints and tickets need Linear routes.
docs.google.com
Difficulty: Hard. Expected routes: ~20-35.
Google Docs, Sheets, and Slides power collaborative work. The internal APIs handle real-time editing, comment threads, revision history, and sharing permissions. Agents that read or modify documents need these routes.
slack.com
Difficulty: Hard. Expected routes: ~15-25.
Team messaging. Channel history, search, user presence, and thread replies use internal APIs. Agents that monitor channels, summarize conversations, or post updates need Slack routes.
airtable.com
Difficulty: Medium. Expected routes: ~12-18.
Flexible database for teams. Table views, record CRUD, form submissions, and automations use internal APIs. Agents managing structured data (CRM, inventory, project tracking) route here.
Category 6
Travel & Booking
Planning agents are a high-growth category. Travel sites have extremely high commercial value per route because every search involves pricing, availability, and booking — data that agents need in structured form. These sites also have heavy bot detection, which means fewer contributors and higher earnings.
airbnb.com
Difficulty: Medium. Expected routes: ~15-25.
Accommodation search, listing details, pricing calendars, and availability checks all use internal APIs. Agents planning trips need Airbnb routes for price comparison and availability checking.
google.com/travel
Difficulty: Hard. Expected routes: ~12-20.
Google Flights and Hotels aggregate pricing across all carriers and booking platforms. The internal APIs return the same comparison data that Google shows in the UI: structured fares, hotel rates, and availability.
booking.com
Difficulty: Medium. Expected routes: ~15-22.
The world's largest hotel booking platform. Hotel search, room availability, pricing, and review summaries use internal APIs. Agents comparing accommodation options need both Airbnb and Booking.com routes.
kayak.com
Difficulty: Medium. Expected routes: ~10-18.
Meta-search for flights, hotels, and car rentals. The internal APIs aggregate results from dozens of providers and return normalized comparison data. Agents doing multi-provider price comparison route here.
tripadvisor.com
Difficulty: Easy. Expected routes: ~12-20.
Reviews, ratings, and recommendations for restaurants, hotels, and attractions. The internal APIs return structured review data, aggregate ratings, and ranked recommendations. Agents doing travel research route here for review intelligence.
Category 7
Finance
Financial data is among the most commercially valuable on the web. Market data, payment processing, and banking APIs command premium pricing in the route graph because the agents that use them are doing high-value work — trading, accounting, and financial planning.
stripe.com/dashboard
Difficulty: Hard. Expected routes: ~20-35.
Payment processing for internet businesses. The dashboard's internal APIs expose payment history, subscription management, customer data, and analytics. Agents managing billing and revenue operations need these routes.
finance.yahoo.com
Difficulty: Easy. Expected routes: ~15-25.
The most widely used free financial data source. Stock quotes, historical prices, financial statements, and analyst estimates all use internal APIs that return structured market data. Every financial agent routes here.
coinmarketcap.com
Difficulty: Easy. Expected routes: ~12-18.
The reference site for cryptocurrency market data. Token prices, market caps, volume, and exchange data use internal APIs. Crypto trading agents and portfolio trackers route here for real-time data.
tradingview.com
Difficulty: Medium. Expected routes: ~15-22.
Charting and technical analysis. The internal APIs serve real-time price data, indicator calculations, and screener results. Agents doing technical analysis or market screening route here for data that is not available through free market data APIs.
wise.com
Difficulty: Medium. Expected routes: ~8-14.
International money transfers. Exchange rate quotes, transfer fee calculations, and corridor availability use internal APIs. Agents helping with cross-border payments and forex comparison need these routes.
Category 8
Infrastructure
DevOps agents manage deployments, monitor services, and configure infrastructure. These platforms are mostly dashboard-based, meaning their internal APIs are rich and well-structured. Authenticated routes from infrastructure platforms are high-value because they enable agents to manage production systems.
console.aws.amazon.com
Difficulty: Hard. Expected routes: ~30-50+.
AWS powers a third of the cloud. The console's internal APIs cover service status, resource management, billing, and CloudWatch metrics. DevOps agents managing AWS infrastructure need these routes.
vercel.com
Difficulty: Medium. Expected routes: ~15-22.
The deployment platform for frontend projects. Build status, deployment logs, analytics, and domain management use clean internal APIs. Agents managing Vercel deployments route here for status checks and configuration.
dash.cloudflare.com
Difficulty: Medium. Expected routes: ~20-30.
CDN, DNS, and security. The dashboard APIs manage DNS records, WAF rules, analytics, and Workers deployment. Agents managing web infrastructure route here.
app.netlify.com
Difficulty: Easy. Expected routes: ~10-15.
Alternative deployment platform. Build logs, deploy previews, form submissions, and serverless function logs use internal APIs. Agents managing Netlify projects need these routes alongside Vercel routes.
app.datadog.com
Difficulty: Hard. Expected routes: ~20-30.
Monitoring and observability. The dashboard APIs serve metric queries, log searches, alert status, and APM traces. Agents doing incident response and performance monitoring route here.
Category 9
Knowledge
Research agents need structured knowledge. These domains are typically the easiest to mine — knowledge platforms tend to have minimal bot detection and return well-structured data. They are high-volume routes because every research task starts with a knowledge lookup.
en.wikipedia.org
Difficulty: Easy. Expected routes: ~10-15.
The world's encyclopedia. Article content, search, category listings, and citation data use the MediaWiki API. Every research agent queries Wikipedia. The internal endpoints return structured content that is cleaner than parsing HTML.
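To make the MediaWiki API reference concrete, here is a minimal sketch of a full-text search request and of pulling titles out of the JSON response. It uses the standard `action=query` with `list=search` parameters. The helper names are hypothetical and the sample payload is made up, so no network call is made.

```python
from urllib.parse import urlencode

WIKI_API = "https://en.wikipedia.org/w/api.php"


def search_url(term: str, limit: int = 5) -> str:
    """Build a MediaWiki full-text search request that returns JSON."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": term,
        "srlimit": limit,
        "format": "json",
    }
    return WIKI_API + "?" + urlencode(params)


def result_titles(payload: dict) -> list[str]:
    """Extract article titles from a search response."""
    hits = payload.get("query", {}).get("search", [])
    return [hit["title"] for hit in hits]


if __name__ == "__main__":
    print(search_url("Alan Turing"))
    # Made-up payload shaped like a real search response.
    sample = {"query": {"search": [{"title": "Alan Turing"}, {"title": "Turing machine"}]}}
    print(result_titles(sample))
```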
arxiv.org
Difficulty: Easy. Expected routes: ~6-10.
The pre-print server for scientific papers. Search, paper metadata, abstract retrieval, and PDF links use internal APIs. Research agents working with academic papers route here. The API returns structured metadata including author lists, categories, and citation counts.
developer.mozilla.org
Difficulty: Easy. Expected routes: ~8-12.
MDN Web Docs, the canonical reference for web standards. The internal search and document APIs return structured technical documentation. Coding agents look up MDN constantly for HTML, CSS, and JavaScript reference.
docs.python.org
Difficulty: Easy. Expected routes: ~5-8.
Python's official documentation. The internal search and page content endpoints return structured reference material. Python coding agents query this for standard library documentation, function signatures, and usage examples.
devdocs.io
Difficulty: Easy. Expected routes: ~8-12.
Aggregated documentation for 100+ languages and frameworks. The internal API serves documentation content, search results, and framework-specific reference. Agents that work across multiple languages use DevDocs as a single lookup point.
Category 10
News
Monitoring agents track news for market signals, competitive intelligence, and trend detection. News domains have moderate route counts but high query volume — agents monitoring multiple topics make repeated calls throughout the day. Fresh news routes earn more because the demand is time-sensitive.
techcrunch.com
Difficulty: Easy. Expected routes: ~8-12.
The default news source for tech and startups. Article feeds, search, and category pages use internal APIs. Agents monitoring startup launches, funding rounds, and tech trends route here.
bloomberg.com
Difficulty: Hard. Expected routes: ~12-18.
Financial news and market analysis. The internal APIs serve article content, market data widgets, and economic indicators. Agents doing financial research and market monitoring route here for premium analysis.
reuters.com
Difficulty: Medium. Expected routes: ~10-15.
Wire service powering global news. Article feeds, topic search, and market data use internal APIs. Agents monitoring global events and market-moving news route here for speed, since Reuters publishes faster than most outlets.
news.google.com
Difficulty: Hard. Expected routes: ~10-15.
Aggregated news from all sources. The internal APIs return article clusters, topic feeds, and personalized recommendations across publishers. Agents that need multi-source news monitoring route here instead of mining individual publishers.
apnews.com
Difficulty: Easy. Expected routes: ~8-12.
Associated Press, the wire service that feeds every major outlet. Clean internal APIs for article feeds, topic search, and breaking news. Lower bot detection than Bloomberg or Reuters, making it accessible for early contributors.
Get started in 60 seconds
Every domain you mine earns when agents use the routes. The first person to index a domain earns a 2x reward multiplier for 30 days. Many of the domains on this list have not been indexed yet. The earlier you mine them, the more you earn.
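The effect of the first-indexer multiplier can be illustrated with a small calculation. This is a sketch under loud assumptions: the guide does not state how payouts are computed, so the per-resolve payout, the daily resolve count, and the idea that the 2x multiplier applies to each resolve during the first 30 days after indexing are all hypothetical. Amounts are in arbitrary integer micro-units to avoid floating-point rounding.

```python
def projected_payout(daily_resolves: int, payout_per_resolve: int,
                     days: int, first_indexer: bool,
                     bonus_days: int = 30, bonus_multiplier: int = 2) -> int:
    """Sum hypothetical payouts over a period, applying the early bonus."""
    total = 0
    for day in range(days):
        # The bonus window is assumed to start on the day of indexing.
        in_bonus = first_indexer and day < bonus_days
        multiplier = bonus_multiplier if in_bonus else 1
        total += daily_resolves * payout_per_resolve * multiplier
    return total


if __name__ == "__main__":
    # Made-up inputs: 1,000 resolves per day at 10 micro-units each, over 60 days.
    print(projected_payout(1000, 10, 60, first_indexer=True))
    print(projected_payout(1000, 10, 60, first_indexer=False))
```

Under these assumptions the first indexer earns 900,000 micro-units over 60 days against 600,000 for a later contributor, so the bonus is worth one extra month of baseline earnings.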
For the full economics of mining, including earning projections and the x402 micropayment flow, read Proof of Indexing: The Economics of Mining the Agentic Web.
Mining strategy
Start with Easy domains. Your first mining sessions should target Easy-rated domains like Hacker News, DuckDuckGo, npm, and Wikipedia. These sites have no bot detection, require no authentication, and produce clean routes in a single browse session. Build your route portfolio with reliable wins before tackling harder targets.
Hard domains pay more. Sites with aggressive bot detection (Google, LinkedIn, X, Bloomberg) have fewer contributors competing for their routes. If you can successfully mine them using cookie injection from your real browser sessions, your routes earn disproportionately more because supply is constrained while demand is high.
Think in categories, not individual sites. A Shopify storefront route works on millions of stores. A WordPress REST API route works on 40% of the web. When you mine a platform, you are mining every site built on that platform. Prioritize platforms over individual websites.
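The platform-level idea can be sketched for WordPress: one route template is reused across any site that exposes the standard REST API. This assumes the core `/wp-json/wp/v2/posts` collection with its `per_page` and `search` parameters. `wp_posts_url` is a hypothetical helper, the site URLs are placeholders, and individual sites may disable or restrict the REST API.

```python
from urllib.parse import urlencode

# Core WordPress REST API route for posts.
WP_POSTS_PATH = "/wp-json/wp/v2/posts"


def wp_posts_url(site: str, search: str = "", per_page: int = 5) -> str:
    """Apply the shared WordPress posts route to one site origin."""
    params = {"per_page": per_page}
    if search:
        params["search"] = search
    return site.rstrip("/") + WP_POSTS_PATH + "?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder sites; the same template applies to each.
    for site in ["https://blog.example/", "https://news.example"]:
        print(wp_posts_url(site, search="api"))
```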
Authenticated routes are premium routes. Routes that require login (Notion, Slack, Stripe, AWS) are inherently more valuable because they enable agents to do things that unauthenticated routes cannot. They also have fewer contributors. If you already use these services, mining them while you work is free money.
Keep routes fresh. Websites change their internal APIs. A route that worked last month might return errors today. The route graph automatically deprioritizes stale routes. Re-mine your highest-value domains periodically to keep your routes at the top of the quality ranking.
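One way a contributor might decide what to re-mine is to track when each route was last verified and flag the ones past an age threshold. This is a sketch only: the 30-day threshold, the route names, and the `routes_to_remine` helper are assumptions made for illustration, and the guide does not describe how the route graph itself measures staleness.

```python
from datetime import datetime, timedelta, timezone


def routes_to_remine(last_verified: dict[str, datetime],
                     now: datetime, max_age_days: int = 30) -> list[str]:
    """Return routes not verified within max_age_days, oldest first."""
    cutoff = now - timedelta(days=max_age_days)
    stale = [route for route, seen in last_verified.items() if seen < cutoff]
    return sorted(stale, key=lambda route: last_verified[route])


if __name__ == "__main__":
    now = datetime(2025, 6, 1, tzinfo=timezone.utc)
    # Made-up verification log keyed by hypothetical route names.
    log = {
        "shop.example/products.json": datetime(2025, 5, 25, tzinfo=timezone.utc),
        "blog.example/wp-json/wp/v2/posts": datetime(2025, 3, 1, tzinfo=timezone.utc),
        "wiki.example/w/api.php": datetime(2025, 4, 15, tzinfo=timezone.utc),
    }
    for route in routes_to_remine(log, now):
        print(route)
```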
Category 3
Social & Content
Research agents, monitoring agents, and content analysis agents all need social platform data. These sites have the highest route value because social data is hard to get through official APIs (rate limits, cost, approval processes) but flows freely through internal endpoints.
reddit.com
Difficulty: Medium. Expected routes: ~20-35.
The front page of the internet for agents. Subreddit search, post feeds, comment threads, and user profiles all use internal JSON APIs. Reddit killed its free public API, so internal routes are now the primary access path for agents.
news.ycombinator.com
Difficulty: Easy. Expected routes: ~8-12.
Hacker News is the signal source for tech trends, startup launches, and developer sentiment. The Algolia-powered search API is already accessible, but the internal HN endpoints for stories, comments, and user karma return data the Algolia API misses.
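For reference, the already-accessible Algolia search API can be addressed as in the sketch below. The second helper targets the official public Firebase item API, which is included only as a familiar comparison point and is not claimed to be the internal endpoints this entry refers to. The function names are hypothetical, and no request is sent.

```python
from urllib.parse import urlencode

# Public Algolia-powered HN search endpoint.
ALGOLIA_SEARCH = "https://hn.algolia.com/api/v1/search"
# Official public Firebase API for single items (stories, comments).
FIREBASE_ITEM = "https://hacker-news.firebaseio.com/v0/item/{item_id}.json"


def story_search_url(query: str) -> str:
    """Build a search URL restricted to stories."""
    return ALGOLIA_SEARCH + "?" + urlencode({"query": query, "tags": "story"})


def item_url(item_id: int) -> str:
    """Build the URL for one item by its numeric ID."""
    return FIREBASE_ITEM.format(item_id=item_id)


if __name__ == "__main__":
    print(story_search_url("rust async"))
    print(item_url(8863))
```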
x.com
Difficulty: Hard. Expected routes: ~25-40.
Twitter/X is the real-time pulse of the internet. Timeline, search, trends, user profiles, and Spaces all use internal GraphQL endpoints. Since the API became paid ($100/month minimum), internal routes are the only free path for agents.
linkedin.com
Difficulty: Hard. Expected routes: ~20-30.
Professional network data: job listings, company profiles, people search, and industry insights. LinkedIn's official API is extremely restricted. Internal routes are the only way agents can access the full dataset.
youtube.com
Difficulty: Medium. Expected routes: ~15-25.
Video search, channel data, transcript extraction, and recommendation feeds all use the InnerTube API (youtubei/v1). Agents need YouTube for research, content analysis, and tutorial discovery.