How to Access HackerNews Data in Real-Time

HN's Firebase API is slow and limited. Learn how to discover Hacker News's internal endpoints for stories, comments, and search using Unbrowse — real-time data without polling delays.

Lewis Tham
April 3, 2026

Hacker News is one of the most important sources of signal for the tech industry. New launches get traction here first. Developer sentiment shows up in comments before anywhere else. Hiring trends, technology shifts, and startup funding news all surface on HN hours or days before mainstream coverage.

But accessing HN data programmatically is surprisingly painful. The official API is slow, limited, and designed for a different era.

The Problem with HN Data Access

Hacker News has a public API, but it has significant limitations:

  • Official Firebase API: HN's official API (hacker-news.firebaseio.com) provides individual items by ID. To get the front page, you fetch the top story IDs, then make a separate request for each story. That's 31 HTTP requests for one page of 30 stories: one for the ID list, then one per story. Each request takes 100-300ms. Run one after another, the requests take 5-10 seconds to load a full front page with metadata.
  • No search endpoint: The official API has no search. You can fetch items by ID or get lists of IDs (top, new, best, ask, show, job stories). Finding stories about a specific topic requires fetching thousands of items and filtering client-side.
  • Algolia HN Search: hn.algolia.com provides search, but it's a separate service with its own rate limits (10,000 requests/hour) and indexing delays. New stories can take minutes to appear in search results.
  • No comment threading: The API returns individual items. Building a comment tree requires recursive fetching — a story with 200 comments means 200+ API calls.
  • BigQuery dataset: Google maintains an HN dataset on BigQuery, but it's updated daily (not real-time) and requires a GCP account.

For AI agents that need to monitor HN, search for discussions, or analyze comment sentiment, the official API's one-item-at-a-time design makes it impractical.
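
To make the request count concrete, here is a minimal Python sketch of the fetch pattern the official API forces. The `/v0/topstories.json` and `/v0/item/{id}.json` paths are the Firebase API's public endpoints; the helper names (`http_get_json`, `fetch_front_page`) and the injectable `get_json` parameter are illustrative choices made here, not part of any library.

```python
import json
from typing import Any, Callable, List, Tuple
from urllib.request import urlopen

BASE = "https://hacker-news.firebaseio.com/v0"


def http_get_json(url: str) -> Any:
    """Fetch a URL and decode its JSON body (one network round trip)."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def fetch_front_page(
    get_json: Callable[[str], Any] = http_get_json, limit: int = 30
) -> Tuple[List[Any], int]:
    """Return (stories, requests_made) for the first `limit` top stories."""
    # Request 1: the ranked list of story IDs (IDs only, no metadata).
    ids = get_json(f"{BASE}/topstories.json")[:limit]
    requests_made = 1

    stories = []
    for story_id in ids:
        # One extra request per story to get title, score, author, and so on.
        stories.append(get_json(f"{BASE}/item/{story_id}.json"))
        requests_made += 1
    return stories, requests_made
```

Calling `fetch_front_page()` with the defaults performs 31 round trips. Passing a stub as `get_json` makes it possible to count requests without touching the network.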

Shadow APIs: The Alternative

Every time you visit news.ycombinator.com, the server renders the page and sends your browser finished HTML. But HN's companion sites and search interfaces make internal API calls that return structured data. More importantly, services like hn.algolia.com (the default HN search) and HN's own newer endpoints return rich JSON that the official Firebase API doesn't expose.

Unbrowse captures these endpoints automatically from real browsing sessions, indexing both the HN site itself and the search/aggregation services the HN ecosystem uses.

What Unbrowse Discovers on Hacker News

Browsing the HN ecosystem through Unbrowse reveals faster, richer endpoints:

  • Front Page (batched): GET /news?p=1 is rendered server-side, but Unbrowse extracts the structured data from the page in a single request: all 30 stories with titles, scores, comment counts, authors, timestamps, and URLs. One request instead of 31.
  • Search (Algolia): GET /api/v1/search?query={term}&tags=story via hn.algolia.com — full-text search across all HN stories and comments with relevance ranking, date filtering, and point thresholds. Returns structured JSON with story metadata, highlights, and comment counts.
  • Comment Trees: GET /api/v1/items/{id} via hn.algolia.com returns the full comment tree for a story in a single request: all comments with nesting, authors, timestamps, and scores where the API exposes them. One call replaces hundreds of Firebase API calls.
  • User Activity: GET /api/v1/search?tags=author_{username} — all submissions and comments by a user with full metadata. Useful for analyzing posting patterns and topic interests.
  • Trending by Topic: GET /api/v1/search?query={topic}&numericFilters=points>50&hitsPerPage=50 — high-signal stories on any topic, filtered by minimum score. The numeric filters let you separate signal from noise.
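
The Algolia-backed endpoints above are plain GET URLs, so they can be assembled with the standard library. The sketch below is one way to build them; the function names and keyword arguments are hypothetical, while the paths and query parameters (`query`, `tags`, `numericFilters`, `hitsPerPage`) are the ones listed in the bullets.

```python
from typing import Optional
from urllib.parse import urlencode

ALGOLIA = "https://hn.algolia.com/api/v1"


def search_url(
    query: str = "",
    tags: str = "story",
    min_points: Optional[int] = None,
    hits_per_page: Optional[int] = None,
) -> str:
    """Build a search URL; urlencode escapes spaces and the '>' in the filter."""
    params = {}
    if query:
        params["query"] = query
    if tags:
        params["tags"] = tags
    if min_points is not None:
        params["numericFilters"] = f"points>{min_points}"
    if hits_per_page is not None:
        params["hitsPerPage"] = hits_per_page
    return f"{ALGOLIA}/search?{urlencode(params)}"


def item_url(item_id: int) -> str:
    """URL for a story together with its nested comment tree."""
    return f"{ALGOLIA}/items/{item_id}"


def author_url(username: str) -> str:
    """URL for all submissions and comments by one user."""
    return search_url(tags=f"author_{username}")
```

For example, `search_url("ai agents", min_points=50, hits_per_page=50)` produces the "Trending by Topic" request with the filter percent-encoded as `points%3E50`.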

How It Works

npm install -g unbrowse
unbrowse resolve "search for AI agent frameworks" --url https://news.ycombinator.com

The process:

  1. Browse: Unbrowse opens HN in Kuri. As you navigate (front page, search, story pages), all data-fetching endpoints are captured.
  2. Capture: Both the HN site endpoints and Algolia search API calls are intercepted. The fetch interceptor catches the search/API calls that power HN's search interface.
  3. Index: Endpoints are mapped to intents: front page retrieval, search, comment tree loading, user profiles. URL templates and parameter patterns are extracted.
  4. Cache: Routes are stored with their patterns. The Algolia search endpoints don't require authentication for public data.
  5. Execute: Future requests call the indexed endpoints directly. Front page data in one call. Full comment trees in one call. Search results with filtering in one call.
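
As a rough mental model of the Index, Cache, and Execute steps, the sketch below maps intents to URL templates and fills in parameters at call time. This is not Unbrowse's actual data model or code: the `Route` class, the `ROUTES` table, and `resolve` are hypothetical names invented for illustration, and only the URL patterns come from the endpoints listed earlier.

```python
from dataclasses import dataclass
from typing import Dict, Tuple
from urllib.parse import quote


@dataclass(frozen=True)
class Route:
    """A cached endpoint: HTTP method plus a URL template with named slots."""

    method: str
    url_template: str

    def render(self, **params: object) -> str:
        # Percent-encode every value so free-text queries are URL-safe.
        encoded = {key: quote(str(value), safe="") for key, value in params.items()}
        return self.url_template.format(**encoded)


# Hypothetical intent index; a real cache would be built from captured traffic.
ROUTES: Dict[str, Route] = {
    "front_page": Route("GET", "https://news.ycombinator.com/news?p={page}"),
    "search": Route("GET", "https://hn.algolia.com/api/v1/search?query={term}&tags=story"),
    "comment_tree": Route("GET", "https://hn.algolia.com/api/v1/items/{id}"),
}


def resolve(intent: str, **params: object) -> Tuple[str, str]:
    """Look up an intent and return the (method, url) to execute directly."""
    route = ROUTES[intent]
    return route.method, route.render(**params)
```

With this table, `resolve("search", term="AI agent frameworks")` yields a single GET against the search endpoint, which is the point of step 5: once a route is indexed, no browsing session is needed to call it again.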

Performance

| Metric | HN Firebase API | Algolia HN Search | Unbrowse (cached) |
|---|---|---|---|
| Front page load | ~5,000ms (31 calls) | N/A | <60ms (1 call) |
| Search | Not available | ~150ms | <50ms |
| Comment tree (200 comments) | ~30,000ms (200+ calls) | ~200ms | <80ms |
| Tokens consumed | ~3,000 | ~500 | ~150 |
| Rate limit | None published | 10,000/hour | Cached |

The difference is dramatic for comment trees. The official API requires one request per comment — a popular story with 500 comments means 500 sequential API calls taking 1-2 minutes. Unbrowse fetches the entire tree in a single cached call under 100ms.
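
The 1-2 minute figure follows from the per-request latency quoted earlier (100-300ms) multiplied by the number of sequential calls. A small worked calculation, assuming strictly sequential requests and that latency range (both are simplifying assumptions; the function name is made up here):

```python
from typing import Tuple


def sequential_latency_seconds(
    calls: int, per_call_ms: Tuple[int, int] = (100, 300)
) -> Tuple[float, float]:
    """Best-case and worst-case total time when calls run one after another."""
    low_ms, high_ms = per_call_ms
    return calls * low_ms / 1000, calls * high_ms / 1000


# 500 comments, one request each: between 50 and 150 seconds.
print(sequential_latency_seconds(500))
# 200 comments: between 20 and 60 seconds, bracketing the ~30,000ms in the table.
print(sequential_latency_seconds(200))
```

Issuing the requests concurrently would shorten the wall-clock time, but the number of HTTP calls stays the same.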

When to Use This Approach

Launch monitoring: Your AI agent monitors HN for mentions of your product, competitors, or technology stack. Instead of polling the Firebase API (slow) or waiting for Algolia indexing (delayed), the agent gets near-real-time search access through Unbrowse.

Sentiment analysis: Pull complete comment trees for trending stories to analyze developer sentiment about new technologies, frameworks, or industry events. One API call gets the full discussion thread.
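
For sentiment work, the nested tree usually needs flattening into a list of comments first. The sketch below walks a response shaped like the `/api/v1/items/{id}` payload, assuming each node carries `author`, `text`, and a `children` list; verify those field names against a live response before relying on them. `walk_comments` is a hypothetical helper name.

```python
from typing import Any, Dict, Iterator, Tuple


def walk_comments(item: Dict[str, Any], depth: int = 0) -> Iterator[Tuple[int, str, str]]:
    """Yield (depth, author, text) for every comment under `item`, depth-first."""
    for child in item.get("children", []):
        text = child.get("text")
        if text:  # Deleted or empty comments have no text; skip them.
            yield depth, child.get("author") or "[deleted]", text
        # Replies to skipped comments are still visited.
        yield from walk_comments(child, depth + 1)


# Tiny hand-written sample in the assumed shape (not real HN data).
sample = {
    "title": "Example story",
    "children": [
        {"author": "alice", "text": "First!", "children": [
            {"author": "bob", "text": "A reply", "children": []},
        ]},
        {"author": None, "text": None, "children": []},
        {"author": "carol", "text": "Second thread", "children": []},
    ],
}

for depth, author, text in walk_comments(sample):
    print("  " * depth + f"{author}: {text}")
```

The comment text in real responses is typically HTML, so a sentiment pipeline would strip tags before scoring.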

Content curation: Build automated newsletters or feeds that surface the highest-quality HN discussions on specific topics. The search endpoint with score filtering separates signal from noise.

Competitive intelligence: Monitor what the developer community says about products and technologies in your space. Search + comment tree access gives you full discussion context, not just headlines.

Getting Started

# Install Unbrowse globally
npm install -g unbrowse

# Run initial setup
unbrowse setup

# Search HN discussions
unbrowse resolve "search for MCP server discussions" --url https://news.ycombinator.com

# Get full comment threads
unbrowse resolve "get comments for top story" --url https://news.ycombinator.com

Unbrowse is open source and published on arXiv. It works as an MCP server for AI agents.

FAQ

Is this legal? Hacker News data is publicly accessible. The official API is public. Algolia's HN search API is public. Unbrowse accesses the same endpoints your browser uses when you visit news.ycombinator.com.

How is this different from scraping? Scraping parses HN's HTML, which is simple but still needs a parser. Unbrowse captures and calls the API endpoints, both HN's server-rendered data extraction and the Algolia search API, and returns structured JSON responses.

Is HN data real-time? The front page and story endpoints are real-time (they serve what's currently on HN). Search via Algolia has an indexing delay: new stories usually appear in under a minute but can take a few minutes. Unbrowse's cached routes call these endpoints directly for the freshest data available.

Can I get historical HN data? Algolia's HN search index goes back to 2006. Unbrowse caches the search endpoint, which supports date range filtering. You can search historical stories and comments by combining query terms with date filters.
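
A date-range query against the search endpoint might look like the sketch below. It assumes the index exposes a numeric `created_at_i` field holding the Unix timestamp, filtered through the same `numericFilters` parameter used for points; `historical_search_url` is a name made up for this example.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def historical_search_url(query: str, start: datetime, end: datetime) -> str:
    """Search stories created strictly between `start` and `end`."""
    # Comma-separated numeric filters are combined with AND.
    filters = (
        f"created_at_i>{int(start.timestamp())},"
        f"created_at_i<{int(end.timestamp())}"
    )
    params = {"query": query, "tags": "story", "numericFilters": filters}
    return "https://hn.algolia.com/api/v1/search?" + urlencode(params)


# Stories mentioning "rust" during 2010 (timezone-aware datetimes avoid
# local-time surprises when converting to timestamps).
url = historical_search_url(
    "rust",
    datetime(2010, 1, 1, tzinfo=timezone.utc),
    datetime(2011, 1, 1, tzinfo=timezone.utc),
)
print(url)
```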

How do I handle pagination? Both the front page and search endpoints support pagination. Unbrowse captures the pagination patterns (the p parameter for the front page, hitsPerPage and page for search) and exposes them in the cached route.
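
Outside Unbrowse, paging through search results by hand could be sketched as follows. The loop assumes the response body contains a `hits` list and an `nbPages` count and that `page` is zero-indexed; `iter_hits` and the injectable `get_json` callable are illustrative names, and `max_pages` is a safety cap added for this example.

```python
from typing import Any, Callable, Dict, Iterator
from urllib.parse import urlencode

SEARCH = "https://hn.algolia.com/api/v1/search"


def iter_hits(
    get_json: Callable[[str], Dict[str, Any]],
    query: str,
    hits_per_page: int = 50,
    max_pages: int = 20,
) -> Iterator[Dict[str, Any]]:
    """Yield hits page by page until the API reports no further pages."""
    page = 0
    while page < max_pages:
        params = {"query": query, "page": page, "hitsPerPage": hits_per_page}
        body = get_json(f"{SEARCH}?{urlencode(params)}")
        yield from body.get("hits", [])
        page += 1
        # Stop once the next page index is past the last page reported.
        if page >= body.get("nbPages", 0):
            break
```

`get_json` can be any function that takes a URL and returns the decoded JSON body, which keeps the paging logic separate from the HTTP client and from rate-limit handling.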