← Back to the live scorecard
Analysis · June 2026

Situational Awareness: A Two-Year Scorecard

Published June 10, 2026 · Updated as verdicts change
Summary: Two years after Situational Awareness predicted AGI by 2027, we graded its major predictions against mid-2026 evidence: 3 on track, 1 clearly wrong, 2 open, 2 too early to grade. The clearest miss — open-source models fading — has underappreciated implications for the essay's geopolitical core. We also pre-register what evidence would flip each verdict.
Scoreboard: 3 predictions on track, 1 wrong, 2 open, 2 pending — each prediction listed with its verdict

Why another retrospective

Two excellent evaluations already exist: Nathan Delisle's one-year quantitative audit (June 2025), which checked the OOM math driver-by-driver, and Jamie Harris's two-year evaluation on the EA Forum (March 2026), which covered the full breadth of claims.

Both are point-in-time snapshots. This analysis adds three things: (1) a verdict-level scorecard at the exact two-year mark (the essay was published June 2024), (2) explicit, pre-registered criteria for what would change each verdict, and (3) calibration context — where Aschenbrenner's timeline sits relative to other public forecasters.

Grading framework

A note on charity: where the essay's claim is ambiguous, we grade the reading a careful 2024 reader would have taken away, not the most defensible retreat available in hindsight. This choice does real work in the one "Wrong" verdict, flagged below.

The scorecard (June 2026)

PredictionEvidence as of mid-2026Verdict
Models outpace college graduates by 2025/26~83% on knowledge-work benchmarks (GPT-5.4, GDPval); ~80% SWE-Bench Pro (Claude Fable 5); agents in productionOn track
Effective compute scales ~0.5 OOM/yr ×2Delisle's audit: pace "roughly supported," launches scattered ±0.5 OOM around trendOn track
Capex acceleration toward trillion-dollar scaleInvestment running ahead of projections — his most clearly vindicated claimExceeded
Open source fades; proprietary US moatDeepSeek V4 / Qwen 3.7 Max at ~3–6 months behind frontier, fraction of the price, real architectural innovationWrong
AGI by 2027 (AI-researcher level)Agentic coding strong; autonomous end-to-end research undemonstrated; 18 months leftOpen
US govt launches AGI project by 27/28National-security involvement growing; no formal Project; deadline not elapsedOpen
Intelligence explosion 2027–29Pending
Superintelligence, 2030sPending

Two claims we're deliberately not grading as wins despite favorable headlines: AI revenue (tracking meaningfully behind his ~$100B-run-rate sketch — Harris's March 2026 evaluation put the most generous figure at ~$60B, while spring-2026 lab reporting has OpenAI at ~$25B+ and Anthropic at ~$19–20B annualized; behind on any accounting) and the "shocking qualitative leap" framing (benchmarks were met, but the felt discontinuity hasn't landed as described). Both cut against grade inflation in his favor.

The one clear miss, and why it matters more than it looks

The open-source verdict is the one we expect the most pushback on, so here is the reasoning in full.

The essay's geopolitical architecture rests on a specific premise: algorithmic secrets and model weights are the crown jewels; whoever locks them down converts compute concentration into durable national advantage. "Lock Down the Labs" is a third of the essay's policy program.

What happened instead: capable AI diffused. DeepSeek shipped genuine architectural advances (Epoch characterizes MLA and fine-grained MoE as real innovations, not just distillation), open-weight models compressed the frontier gap to months, and pricing collapsed. Distillation allegations are credible and muddy the attribution — but they strengthen rather than rescue the original claim, because a world where frontier capability can be cheaply distilled is precisely a world where the proprietary moat doesn't hold.

The downstream implication: if intelligence is cheap and diffuse, compute concentration buys less geopolitical advantage than the essay's model assumes, and the arms-race logic that motivates The Project weakens.

The steelman against this grade: Aschenbrenner's claim could be read narrowly — that the very frontier stays proprietary, which remains true. We grade against the broader reading because the essay's policy conclusions depend on the broader version. If you think that's miscalibrated, that's exactly the argument we want to have.

Calibration context: his 2027 vs everyone else

ForecasterAGI timelineNote
Elon Musk (xAI)By end of 2026Most aggressive public claim
Leopold Aschenbrenner2027 "strikingly plausible"The subject of this scorecard
Demis Hassabis (DeepMind)~50% by 2030Cautious lab leader
Samotsvety~28% by 2030Strongest forecasting track record
Metaculus community25% by 2029 · 50% by 2033Feb 2026, ~2,000 forecasters
Andrej Karpathy~A decade outArchitecture skeptic
AI researcher survey (n=2,778)50% by 2040Academic median

Definitions vary enough that this is approximate. The meta-observation: expert medians have compressed from roughly 2060 to roughly 2033 in six years. Aschenbrenner sits well inside the aggressive tail — but the whole distribution has been moving toward him.

Pre-registered: what would flip each verdict

Known weaknesses of this exercise

Verdict-level grading compresses real uncertainty; reasonable people can draw the On-track/Open line differently. The evidence base is public reporting and the two prior evaluations, not lab-internal data. And there's an unavoidable selection effect: the predictions easiest to grade are the most concrete ones, which biases any scorecard toward quantitative claims and away from the strategic ones that may matter more.

The live scorecard updates as models ship and verdicts change.

View the live scorecard →

Disclosure on process: research and drafting were substantially assisted by Claude, including the evidence-gathering behind each verdict. The framing, source review, and responsibility for the grades as published rest with the site's maintainer — if one is wrong, it gets changed, with a changelog.