How it works
News Pulse maps what Americans are actually paying attention to — not what editors decide to cover, and not just what's trending on one platform. Here's exactly how.
What we track
Every hour, News Pulse fetches live data from nine sources. Each source measures a different kind of attention:
- Google News — Algorithmic ranking of what news outlets are publishing and people are clicking. Position 1–20 in the top stories feed.
- Google Trends — Search volume for topics relative to peak interest. Reflects what people are actively looking up right now.
- YouTube — Top trending videos in the News & Politics category, U.S. region. Scored by views gained in the last 24 hours — not total lifetime views — so viral-then-stale videos drop out naturally.
- Wikipedia — Top-viewed articles from yesterday. When news breaks, people look it up immediately. Wikipedia traffic is one of the fastest signals that a story has genuine public interest.
- Polymarket — Prediction market volume on news-related events. Money bet in the last 24 hours reflects how much uncertainty and interest a story is generating.
- NPR News — AP Wire and NPR editorial coverage. Represents what professional journalists have determined is worth covering.
- NewsAPI — Coverage from hundreds of tracked news outlets — breadth of editorial coverage across the press.
- The Guardian — Guardian editorial coverage, weighted for independent international perspective.
- TV (CNN / Fox / MSNBC) — Broadcast news coverage. Contributes to editorial signal and reach data. TV headlines are too generic to use as topic labels, so TV never provides the headline — just the coverage signal.
Reddit, X/Twitter, and Facebook are not available: Reddit restricted programmatic access in 2023, the X API starts at $100+/month, and Facebook's public data API was shut down in 2018. These are the biggest blind spots.
How stories are scored
Ranking uses three layers applied in sequence.
Layer 1 — Engagement score
Each engagement source (Google Trends, YouTube, Google News, Wikipedia, Polymarket) scores a topic 0–100 based on signal strength within that source. These scores are summed, not averaged, so a topic that is strong in two sources scores far higher than one that is strong in only one. A weak second source barely moves the needle.
Google Trends, Wikipedia, and YouTube all use the same log scale: 100K views = ~52, 1M views = ~76, 10M = 100. Because the scale is fixed rather than relative to the day's biggest story, signal strength is comparable across sources regardless of what else is trending that day.
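As a rough illustration, the three reference points above fit a straight line in log10 of the count: 24 points per tenfold increase, offset by 68. The sketch below assumes that fitted line plus a clamp to 0–100. The function names are hypothetical, and the real scoring code may differ.

```python
import math


def log_scale_score(views: float) -> float:
    """Map a raw count to 0-100 on a log scale fitted to the published points.

    Assumption: score = 24 * log10(views) - 68, which reproduces
    100K -> 52, 1M -> 76, 10M -> 100. Values are clamped to [0, 100].
    """
    if views <= 0:
        return 0.0
    raw = 24 * math.log10(views) - 68
    return max(0.0, min(100.0, raw))


def engagement_score(source_scores: dict) -> float:
    """Layer 1: per-source scores are summed, not averaged."""
    return sum(source_scores.values())
```

For example, a topic scoring 76 on YouTube and 52 on Wikipedia would get a Layer 1 score of 128 under this sketch.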
Layer 2 — Editorial multiplier
When NPR, NewsAPI, Guardian, or TV are also covering the topic, a multiplier is applied on top of the engagement score — but only in proportion to how strong the engagement already is.
- Strong engagement + editorial coverage → up to +50% boost. The story is everywhere.
- Weak engagement + editorial coverage → minimal boost (~5%). Out there, but not resonating.
- Editorial only, no engagement → no boost, routed to "Editorial only" section.
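One way to read these rules is as a boost that grows with engagement. The sketch below is an assumption, not the actual formula: it interpolates linearly between the ~5% and +50% figures, and strong_at (the engagement score treated as "strong") is a made-up parameter.

```python
def editorial_multiplier(engagement: float, has_editorial: bool,
                         strong_at: float = 200.0) -> float:
    """Layer 2 sketch: scale the editorial boost by engagement strength.

    Assumptions: linear interpolation from +5% (weak) to +50% (strong),
    and a hypothetical `strong_at` threshold for "strong" engagement.
    """
    if not has_editorial or engagement <= 0:
        # No editorial coverage, or editorial-only: no boost is applied.
        return 1.0
    strength = min(engagement / strong_at, 1.0)
    return 1.0 + 0.05 + 0.45 * strength
```

Editorial-only topics return 1.0 here. Routing them to the "Editorial only" section would happen elsewhere.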
Layer 3 — Sustained presence
Topics that appear across multiple hourly fetches get a +5% boost per fetch hour, capped at +60% (12 fetches). This rewards stories with staying power over one-off spikes without letting old stories crowd out breaking news permanently.
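The cap works out as a simple minimum. A minimal sketch, assuming every hourly fetch in which the topic appeared counts toward the boost (the function name is hypothetical, and whether the first fetch counts is not stated above):

```python
def sustained_multiplier(fetch_count: int) -> float:
    """Layer 3 sketch: +5% per hourly fetch, capped at +60% (12 fetches)."""
    boost_percent = min(5 * max(fetch_count, 0), 60)
    return 1 + boost_percent / 100
```

A topic seen in 3 fetches gets a 1.15x multiplier; anything seen in 12 or more fetches stays at 1.6x.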
The editorial ratio bar
Every topic card shows a blue/orange bar. Blue is the share of coverage coming from editorial sources (journalists); orange is from engagement sources (public behavior).
- 1.0 (all blue) — only editorial sources are covering it
- 0.0 (all orange) — only public engagement signals, no press coverage
- Mixed — both layers active simultaneously
The same bar appears per angle when you expand a card — so you can see that the conflict angle might be 90% editorial while the human interest angle is 90% engagement.
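The ratio behind the bar can be sketched as editorial share over total. This version counts contributing sources, which is an assumption: the real bar may weight sources by score instead. The source keys are hypothetical labels for the sources listed earlier.

```python
from typing import Optional

EDITORIAL = {"npr", "newsapi", "guardian", "tv"}
ENGAGEMENT = {"google_news", "google_trends", "youtube", "wikipedia", "polymarket"}


def editorial_ratio(sources: set) -> Optional[float]:
    """Return the editorial share (1.0 = all blue, 0.0 = all orange).

    Assumption: each contributing source counts once. Returns None
    when no known source covers the topic.
    """
    editorial = len(sources & EDITORIAL)
    engagement = len(sources & ENGAGEMENT)
    total = editorial + engagement
    if total == 0:
        return None
    return editorial / total
```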
What ⚡ and 📺 mean
- ⚡ Trending in search/social — not on TV or news
Strong public signal with no editorial coverage yet. The public noticed something journalism hasn't caught up to. These are worth watching — they often break into mainstream coverage within hours.
- 📺 In the news — not in our engagement signals
Editors are covering it but the public isn't searching or watching. Often policy stories, foreign affairs, or institutional news. These never appear in the main feed — they're surfaced separately so you can see what journalism is prioritizing that the public isn't amplifying.
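The two markers are the one-sided cases of the same split. A minimal sketch, assuming a topic is summarized by two booleans. The function and label names are hypothetical, and the threshold for a "strong" public signal is left out.

```python
from typing import Optional


def coverage_flag(has_engagement: bool, has_editorial: bool) -> Optional[str]:
    """Return which marker a topic would get, or None for mixed coverage."""
    if has_engagement and not has_editorial:
        return "trending_only"   # shown with the lightning marker
    if has_editorial and not has_engagement:
        return "editorial_only"  # shown with the TV marker, outside the main feed
    return None
```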
How topics are grouped
Headlines are clustered automatically using named entity recognition (NER) and semantic embeddings. The goal is to group different sources covering the same story, not just stories that mention the same word.
- Each headline is run through spaCy NER to extract named entities (people, places, organizations, events).
- Headlines sharing a specific primary entity are grouped into a cluster.
- Headlines whose entities are too common to distinguish stories (e.g. "Trump," "U.S.") are clustered by semantic similarity using sentence embeddings.
- A co-reference pass merges clusters that reference each other's entities — so "U.S. strikes Iran" and "Iran responds to U.S." correctly land in one cluster.
- Claude Haiku names each cluster with a 1–4 word anchor label ("Iran Strikes," "SpaceX IPO," "Knicks").
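The entity-grouping and merge steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the production pipeline: entities are passed in already extracted (no spaCy call), the COMMON stoplist and the "first specific entity is primary" rule are hypothetical, and headlines with only common entities are set aside under the key None rather than clustered by embeddings.

```python
from collections import defaultdict

COMMON = {"Trump", "U.S."}  # hypothetical stoplist of too-common entities


def cluster_by_entity(headlines):
    """Group (text, entities) pairs by their first non-common entity."""
    clusters = defaultdict(list)
    for text, entities in headlines:
        specific = [e for e in entities if e not in COMMON]
        key = specific[0] if specific else None  # None: left for embeddings
        clusters[key].append((text, entities))
    return dict(clusters)


def merge_mutual(clusters):
    """Merge clusters whose primary entities appear in each other's headlines."""
    parent = {k: k for k in clusters if k is not None}

    def find(k):
        # Union-find lookup with path halving.
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    mentioned = {k: {e for _, ents in clusters[k] for e in ents} for k in parent}
    keys = sorted(parent)
    for i, a in enumerate(keys):
        for b in keys[i + 1:]:
            if b in mentioned[a] and a in mentioned[b]:
                parent[find(b)] = find(a)

    merged = defaultdict(list)
    for key, items in clusters.items():
        root = find(key) if key is not None else None
        merged[root].extend(text for text, _ in items)
    return dict(merged)


# Illustrative input with made-up headlines and entities.
sample = [
    ("Acme sues Globex", ["Acme", "Globex"]),
    ("Globex responds to Acme", ["Globex", "Acme"]),
    ("Trump speaks", ["Trump"]),
]
merged = merge_mutual(cluster_by_entity(sample))
```

Here the two Acme/Globex headlines start in separate clusters and end up merged under "Acme", while the Trump-only headline stays under None for the similarity step.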
How often it updates
The feed refreshes every hour automatically. The timestamp in the header shows when data was last fetched. The feed is cached for up to 1 hour. If you use the Refresh button within 10 minutes of loading, it will show "Just updated."
Archive data is retained for 2 years. Yesterday's feed and the full archive are accessible via the footer.