How to Access Reddit Data Without Scraping
Reddit killed free API access with $100/month minimum pricing. Learn how to discover Reddit's internal shadow APIs from browsing traffic using Unbrowse — no scraping, no paid API keys.
Reddit's API pricing change in 2023 killed an entire ecosystem of tools overnight. What used to be free now costs $100/month minimum — and the free tier is so restrictive it's unusable for anything beyond hobby projects. If you're building an AI agent, a research tool, or any application that needs Reddit data at scale, you're stuck choosing between expensive API access, fragile HTML scraping, or giving up entirely.
There's another option that rarely gets discussed: the internal APIs Reddit's own frontend uses.
The Problem with Reddit Data Access in 2026
Reddit's official API (v1) was effectively free until June 2023. Then Reddit announced pricing that would cost third-party apps millions per year. Apollo, RIF, and dozens of other apps shut down.
The current state:
- Official API: $100/month minimum (Enterprise tier). Free tier limited to 100 requests/minute with heavy restrictions on commercial use.
- HTML scraping: Reddit aggressively blocks scrapers. Cloudflare protection, fingerprinting, and rate limiting make traditional scraping unreliable. Old Reddit is increasingly degraded.
- Third-party SERP proxies: $50-200/month for Reddit-specific data, with no guarantee of freshness or completeness.
- Pushshift/Arctic Shift: Historical archives only. No real-time data. Pushshift access was restricted in 2023.
For AI agents that need to search Reddit, read threads, or monitor subreddits, none of these options work well.
Shadow APIs: The Alternative
Every time you visit reddit.com, your browser makes dozens of API calls behind the scenes. These internal endpoints return clean JSON data — the same data that renders the page you see. Reddit's new frontend (sh.reddit.com and the React-based new.reddit.com) is an SPA that fetches everything via internal API calls.
These shadow APIs:
- Return structured JSON, not HTML
- Are authenticated via your existing Reddit session cookies
- Have no published rate limits (they're designed for real user browsing)
- Cover every feature: search, feeds, comments, user profiles, moderation queues
Unbrowse captures these shadow APIs automatically from real browsing sessions.
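To make the session-cookie point concrete, here is a minimal sketch of what such a request looks like when assembled by hand. It only builds the request and never sends it. The cookie name, the `Accept` header, and the helper name `build_session_request` are assumptions for illustration; the real cookie names come from your own browser session.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://reddit.com"

def build_session_request(path, params, cookies):
    """Build (but do not send) a GET request that carries session cookies."""
    query = urlencode(params)
    url = f"{BASE_URL}{path}?{query}" if query else f"{BASE_URL}{path}"
    # Cookies travel in a single header as "name=value" pairs joined by "; "
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return Request(
        url,
        headers={"Cookie": cookie_header, "Accept": "application/json"},  # Accept is an assumption
        method="GET",
    )

req = build_session_request(
    "/svc/shreddit/search",
    {"q": "ai agents", "type": "link"},
    {"session_cookie": "PLACEHOLDER"},  # hypothetical cookie name and value
)
print(req.full_url)
print(req.get_header("Cookie"))
```

Nothing here is specific to Unbrowse. It is the same kind of request a browser tab sends, which is why the session cookies are the only credential involved.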
What Unbrowse Discovers on Reddit
When you browse Reddit through Unbrowse, it intercepts and indexes the internal API calls your browser makes. Here are the key endpoints it discovers:
- Subreddit Feed (GET /svc/shreddit/feeds/sd-home-feed?...): returns paginated post listings with titles, scores, comment counts, awards, and preview media. Clean JSON with cursor-based pagination.
- Search (GET /svc/shreddit/search?q=...&type=link): structured search results across posts, comments, communities, and users. Supports sort, time range, and subreddit scoping.
- Post Detail + Comments (GET /svc/shreddit/t3_{id}/comments): full comment tree with nested replies, scores, author flair, and awards. Returns the same data the comment section renders.
- User Profile (GET /svc/shreddit/user/{username}/overview): post and comment history, karma breakdown, account age.
- Subreddit Sidebar/About (GET /svc/shreddit/r/{subreddit}/about): community metadata, rules, subscriber count, description, moderators.
These endpoints return significantly more data than the official API's free tier allows, and they update in real time.
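The endpoint patterns listed above are URL templates, so filling them in is mostly string substitution. The sketch below shows one way to do that. The paths are copied from the list, while the dictionary keys and the `build_path` helper are illustrative names and not part of Unbrowse.

```python
from urllib.parse import quote, urlencode

# Path templates copied from the endpoint list; the keys are illustrative names
ENDPOINT_TEMPLATES = {
    "home_feed": "/svc/shreddit/feeds/sd-home-feed",
    "search": "/svc/shreddit/search",
    "comments": "/svc/shreddit/t3_{id}/comments",
    "user_overview": "/svc/shreddit/user/{username}/overview",
    "subreddit_about": "/svc/shreddit/r/{subreddit}/about",
}

def build_path(name, path_params=None, query=None):
    """Fill a template's placeholders and append an optional query string."""
    template = ENDPOINT_TEMPLATES[name]
    # Percent-encode every path value so it cannot add or remove path segments
    safe = {key: quote(str(value), safe="") for key, value in (path_params or {}).items()}
    path = template.format(**safe)  # raises KeyError if a placeholder is missing
    if query:
        path += "?" + urlencode(query)
    return path

print(build_path("comments", {"id": "abc123"}))
print(build_path("search", query={"q": "rust", "type": "link"}))
```

Encoding path values with `safe=""` is a deliberate choice: a subreddit or username containing a slash should never be able to change which route gets called.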
How It Works
npm install -g unbrowse
unbrowse resolve "search for AI agent frameworks" --url https://reddit.com
Here's what happens under the hood:
- Browse: Unbrowse opens Reddit in a real browser (Kuri) with your existing session cookies injected automatically from your local Chrome or Firefox profile.
- Capture: As the page loads and you interact, Unbrowse's HAR recorder and fetch/XHR interceptor capture every API call the frontend makes.
- Index: Captured endpoints go through the enrichment pipeline — URL template extraction, auth header detection, parameter inference, and LLM-powered semantic labeling.
- Cache: Indexed routes are stored locally. Future requests for the same intent hit the cached API directly — no browser needed.
- Execute: On subsequent calls, Unbrowse calls Reddit's internal APIs directly with your session auth, returning structured JSON in under 100ms.
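The "URL template extraction" part of the Index step can be illustrated with a simple heuristic: compare several captured URLs for the same route and turn the path segments that differ into placeholders. This is a sketch of the general idea only, not Unbrowse's actual enrichment pipeline, and the sample usernames and the `param3`-style placeholder names are made up.

```python
from urllib.parse import parse_qsl, urlsplit

def infer_template(urls):
    """Infer one path template from captured URLs that hit the same route."""
    paths = [urlsplit(url).path.strip("/").split("/") for url in urls]
    if len({len(path) for path in paths}) != 1:
        raise ValueError("URLs have different path depths; not the same route")
    segments = []
    for position, values in enumerate(zip(*paths)):
        if len(set(values)) == 1:
            segments.append(values[0])  # identical in every sample: keep literally
        else:
            segments.append(f"{{param{position}}}")  # varies: replace with a placeholder
    # Collect every query parameter name seen across the samples
    params = sorted({key for url in urls for key, _ in parse_qsl(urlsplit(url).query)})
    return "/" + "/".join(segments), params

template, query_params = infer_template([
    "https://reddit.com/svc/shreddit/user/alice/overview?sort=new",
    "https://reddit.com/svc/shreddit/user/bob/overview?sort=top&t=week",
])
print(template, query_params)
```

A real pipeline would need more than this, for example naming the placeholder something meaningful like `{username}`, which is where the semantic labeling step would come in.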
The first request takes a few seconds (browser has to load the page). Every request after that is a direct API call.
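The slow-first, fast-after behavior is a classic cache-aside pattern. The sketch below keys routes by site and a normalized intent string, which is a simplification: it only matches when the intent text is the same after trimming and lowercasing. The class name `RouteCache` and the `discover` callback are hypothetical stand-ins for the browser session.

```python
class RouteCache:
    """Cache-aside lookup: run slow discovery once per (site, intent), then reuse."""

    def __init__(self, discover):
        self._discover = discover  # slow path, e.g. a full browser session
        self._routes = {}
        self.discoveries = 0  # how many times the slow path actually ran

    def resolve(self, site, intent):
        key = (site, intent.strip().lower())
        if key not in self._routes:
            self._routes[key] = self._discover(site, intent)
            self.discoveries += 1
        return self._routes[key]

def fake_discover(site, intent):
    # Stand-in for browsing the site and capturing its API calls
    return "/svc/shreddit/search"

cache = RouteCache(fake_discover)
first = cache.resolve("reddit.com", "Search posts")
second = cache.resolve("reddit.com", "  search posts ")
print(first, second, cache.discoveries)
```

Only the first call pays for discovery. The second resolves from the in-memory dictionary, which mirrors why later requests skip the browser entirely.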
Performance
| Metric | Browser Automation (Playwright) | Official Reddit API | Unbrowse (cached) |
|---|---|---|---|
| Speed | ~3,400ms | ~200ms | <100ms |
| Tokens consumed | ~8,000 | ~500 | ~200 |
| Cost per action | $0.53 | $0.003 + API fee | $0.005 |
| Auth required | Session cookies | OAuth app + API key | Your browser cookies |
| Rate limit | Browser resources | 100 req/min (free) | Browser session limits |
| Monthly cost | Compute only | $100+ | Free (open source) |
Unbrowse is 3.6x faster than Playwright across 94 tested domains, and on Reddit specifically, cached API calls complete in under 100ms compared to 3-4 seconds for full page loads.
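To see what the table implies at volume, the arithmetic can be worked through directly. The figures below are copied from the table. The official API column is left out of the cost calculation because its "API fee" component is not specified, and the cached latency is treated as its stated upper bound of 100ms.

```python
from decimal import Decimal

# Figures copied from the comparison table
LATENCY_MS = {"playwright": 3400, "official_api": 200, "unbrowse_cached": 100}
COST_PER_ACTION = {
    "playwright": Decimal("0.53"),
    "unbrowse_cached": Decimal("0.005"),
}

def latency_ratio(slow, fast):
    """How many times longer the slow method takes per action."""
    return LATENCY_MS[slow] / LATENCY_MS[fast]

def total_cost(method, actions):
    """Total dollar cost for a number of actions (Decimal avoids float rounding)."""
    return COST_PER_ACTION[method] * actions

ratio = latency_ratio("playwright", "unbrowse_cached")
savings = total_cost("playwright", 1000) - total_cost("unbrowse_cached", 1000)
print(f"latency ratio: at least {ratio:.0f}x, savings per 1,000 actions: ${savings:.2f}")
```

Because the cached figure is an upper bound, the latency ratio is a lower bound. At 1,000 actions, the per-action costs in the table work out to $530 for Playwright versus $5 for cached calls.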
When to Use This Approach
AI agent research: Your agent needs to search Reddit for user opinions, product reviews, or discussions about a topic. Instead of scraping HTML or paying for API access, Unbrowse gives the agent direct API access to Reddit's search and feed endpoints.
Subreddit monitoring: Track new posts in specific subreddits for market research, brand monitoring, or content curation. The feed endpoints support pagination and sorting, making it easy to poll for new content.
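A polling loop over a cursor-paginated feed might look like the sketch below. The response shape (`posts`, `id`, `next_cursor`) is a hypothetical one chosen for illustration, and `fetch_page` stands in for whatever actually calls the feed endpoint. The early exit assumes the feed is sorted newest first.

```python
def collect_new_posts(fetch_page, seen_ids, max_pages=5):
    """Walk a cursor-paginated feed and return posts not seen on earlier polls."""
    new_posts = []
    cursor = None  # None requests the first page
    for _ in range(max_pages):
        page = fetch_page(cursor)
        fresh = [post for post in page["posts"] if post["id"] not in seen_ids]
        new_posts.extend(fresh)
        seen_ids.update(post["id"] for post in fresh)
        cursor = page.get("next_cursor")
        # Stop at the end of the feed, or once a page holds nothing new
        if cursor is None or not fresh:
            break
    return new_posts

# Fake two-page feed standing in for real responses
PAGES = {
    None: {"posts": [{"id": "t3_c"}, {"id": "t3_b"}], "next_cursor": "p2"},
    "p2": {"posts": [{"id": "t3_a"}], "next_cursor": None},
}

def fake_fetch(cursor):
    return PAGES[cursor]

seen = {"t3_a"}  # already processed on a previous poll
first_poll = collect_new_posts(fake_fetch, seen)
second_poll = collect_new_posts(fake_fetch, seen)
print([post["id"] for post in first_poll], second_poll)
```

Keeping `seen_ids` outside the function lets the caller persist it between runs, and `max_pages` caps how far back a single poll will walk.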
Sentiment analysis pipelines: Pull comment trees for specific posts or search results to feed into NLP pipelines. The structured JSON from shadow APIs is already parsed — no HTML cleanup needed.
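Most NLP pipelines want flat rows rather than a nested tree, so a small flattening step is usually the first thing to write. The sketch below assumes a hypothetical comment shape with `id`, `body`, and an optional `replies` list; the real field names would come from the captured response.

```python
def flatten_comments(comments, depth=0):
    """Yield one flat record per comment, depth-first, preserving nesting depth."""
    for comment in comments:
        yield {"id": comment["id"], "depth": depth, "body": comment["body"]}
        # Recurse into replies; a missing "replies" key means a leaf comment
        yield from flatten_comments(comment.get("replies", []), depth + 1)

# Hypothetical comment tree for illustration
thread = [
    {"id": "c1", "body": "Great tool", "replies": [
        {"id": "c2", "body": "Agreed", "replies": [
            {"id": "c3", "body": "Same here"},
        ]},
    ]},
    {"id": "c4", "body": "Not convinced"},
]

rows = list(flatten_comments(thread))
for row in rows:
    print("  " * row["depth"] + row["body"])
```

Carrying `depth` along keeps enough structure to weight top-level comments differently from deep replies if the analysis calls for it.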
Competitive intelligence: Monitor what users say about your product or competitors across relevant subreddits. Combine search endpoints with comment tree endpoints for deep thread analysis.
Getting Started
# Install Unbrowse globally
npm install -g unbrowse
# Run initial setup (configures Kuri browser, detects local Chrome cookies)
unbrowse setup
# Discover Reddit's internal APIs by browsing
unbrowse resolve "search for machine learning projects" --url https://reddit.com
# Future calls use cached API routes directly
unbrowse resolve "search for best Python frameworks" --url https://reddit.com
Unbrowse is open source and published on arXiv. It works as an MCP server for Claude, as a LangChain tool, or as a standalone CLI.
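If you want to drive the standalone CLI from a script, a thin subprocess wrapper is enough. This sketch uses only the `resolve` command and `--url` flag shown above. The output format is not documented here, so the wrapper returns raw stdout rather than assuming JSON, and the helper names are illustrative.

```python
import shutil
import subprocess

def build_resolve_command(intent, url):
    """Argument list for the same `unbrowse resolve` call shown above."""
    return ["unbrowse", "resolve", intent, "--url", url]

def run_resolve(intent, url, timeout=60):
    """Run the CLI and return its raw stdout; the output format is not assumed."""
    if shutil.which("unbrowse") is None:
        raise RuntimeError("unbrowse CLI not found on PATH; install it first")
    result = subprocess.run(
        build_resolve_command(intent, url),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout

command = build_resolve_command("search for machine learning projects", "https://reddit.com")
print(command)
```

Passing the arguments as a list rather than a shell string means the intent text needs no quoting or escaping, even when it contains spaces or punctuation.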
FAQ
Is this legal? Unbrowse uses your authenticated browser session and respects robots.txt. It accesses the same APIs your browser does when you visit Reddit normally. You're not circumventing any access controls — you're using your own session.
How is this different from scraping? Scraping parses HTML. Unbrowse discovers and calls the actual APIs, the same ones Reddit's own frontend uses. The data comes back as structured JSON, not messy HTML that breaks when Reddit changes its layout.
Will this break when Reddit updates their site? Shadow APIs are more stable than HTML layouts. Reddit's frontend team builds against these APIs — breaking them would break reddit.com itself. When endpoints do change, Unbrowse re-discovers them automatically on the next browse session.
What about Reddit's terms of service? Unbrowse accesses Reddit as a normal authenticated user through a real browser. It doesn't bypass authentication, doesn't use fake accounts, and doesn't exceed what a normal browsing session would generate. The API calls are identical to what your browser makes when you visit reddit.com.
Can I use this with my AI agent? Yes. Unbrowse works as an MCP server that any Claude, GPT, or LangChain agent can call. Your agent says "search Reddit for X" and gets structured JSON back — no browser automation code needed.