A SITE THAT KEEPS SCORE
AI Hype Tracker.
Longitudinal claim tracking, a transparent weekly hype index, and a daily digest from a deliberately balanced source diet.
TRACKED CLAIM · MOST ACTIVE
DIGEST · SUNDAY, MAY 17
read full →
5 items today. 3 tracked claims updated.
RECENT ESSAYS · 35 IN TOTAL
all essays →
Hype vs Reality
GPT-5 release: capability deltas vs the narrative
Measured comparison of what shipped against the pre-release framing — and why the "phase transition" rhetoric mostly didn't survive contact with the benchmarks.
Model Comparisons
Reasoning models — o1 → o3 → DeepSeek R1 → Claude Opus 4.x thinking
What's actually new in the reasoning-model wave, where the capability ceilings sit, and which benchmarks are starting to get gamed.
Hype vs Reality
Agentic coding: Cursor, Devin, Claude Code, Replit Agent — adoption data vs marketing decks
Where the published adoption metrics actually land for each agentic coding product, and what gets quietly conflated when vendors say "AI software engineer."
Industry & Investment
The DeepSeek pressure: have inference prices actually collapsed?
Three months after the price-war narrative crystallized, what's happened to enterprise inference economics — and what the frontier labs' price-card revisions actually reveal.
Technical Deep Dives
SWE-bench is broken: how coding evals get gamed and what replaces them
How the canonical agentic-coding benchmark is being optimized against, the Anthropic eval-paper findings, and what a credible coding eval looks like from 2026 onward.
Hype vs Reality
AI productivity papers: Goldman, MIT, BCG — what they actually show and don't
The three most-cited 2024-2026 papers on AI's contribution to productivity, the methodological caveats their summaries skip, and what would constitute durable productivity evidence.
COMPANIES · 10
Anthropic · Cohere · Cursor · DeepSeek · Google DeepMind · Meta AI (FAIR) · Mistral AI · OpenAI · Perplexity · xAI
MODELS · 15
Claude 3.5 Sonnet · Claude Opus 4.x · Command R+ · DeepSeek R1 · DeepSeek V3 · Gemini 2.x · Gemma 2 · GPT-4 · GPT-5 · Grok 3 · Llama 3 · Llama 4 · Mistral Large · OpenAI o1 · OpenAI o3