MODEL

OpenAI o3

*Successor in the o-series, pitched toward harder STEM and competition-style evaluations.*

Shipped by OpenAI · class: reasoning-llm

Pre-release framing highlighted exceptional scores on select technical benchmarks, the kind of claim that reliably triggers both excitement and methodology disputes over training-data leakage and compute fairness. OpenAI paired the numbers with cautious messaging about cost and latency.

Post-release observation suggests the same structural pattern as o1: unmistakable peaks on curated tasks with variable transfer to messy professional work. The economically relevant question is not whether o3 wins contests, but when additional inference spend buys reliability—and when it buys overthinking.
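
One rough way to make the spend-versus-reliability question concrete is to compare expected cost per correct answer across effort settings. The sketch below is illustrative only: the function names and the cost and success-rate figures are hypothetical placeholders, not measurements of o3, and it assumes failed attempts are detectable and retried independently.

```python
def cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend to get one correct answer, assuming independent retries."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


def extra_spend_pays_off(low: tuple, high: tuple) -> bool:
    """True when the higher-effort setting is cheaper per correct answer."""
    return cost_per_success(*high) < cost_per_success(*low)


# Hypothetical (cost per attempt in dollars, success rate) pairs.
# These are placeholders for illustration, not o3 measurements.
low_effort = (0.10, 0.60)
high_effort = (0.40, 0.90)

print(extra_spend_pays_off(low_effort, high_effort))
```

Under these made-up numbers the higher setting costs more per correct answer, which is the "overthinking" case. On a task where the low setting rarely succeeds, the comparison can flip. Where failures are silent or expensive, raw success rate may matter more than this ratio.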

Tracked claims about this model