Frontier AI models are more powerful than ever, but new research suggests some of the hype around autonomous AI may be getting ahead of reality.
General Reasoning, an AI research firm, released KellyBench this week, a long-horizon test that places AI agents inside a simulated English Premier League betting market and asks them to grow a bankroll over a full season.
The results were not flattering.
Every model lost money. Claude did best, finishing down just 11%, but that was still a loss. Grok 4.20 fared worst, burning through nearly 90% of its bankroll. xAI, Elon Musk’s company behind Grok, has experienced heavy leadership turnover and scaling challenges in its attempt to catch up with the leading models.
The firm rated each model on a 44-point sophistication rubric developed with quantitative betting experts.
No model scored higher than a third of available points. “Models struggle to behave coherently over long time horizons,” the researchers wrote, “often failing to act upon their analysis or failing to adapt as the world changes.”
That gap between hype and reality is already moving markets. Nearly 80,000 tech workers were laid off in the first quarter of 2026 alone, with almost half of those cuts attributed to AI.
Companies from Amazon (NASDAQ:AMZN) to Block (NYSE:XYZ) to Meta (NASDAQ:META) have cited AI efficiency as justification for headcount reductions. KellyBench suggests those claims may be running ahead of what the technology can actually deliver.
The Citrini scenario holds that AI agents will rapidly displace white-collar workers, triggering a credit and deflationary spiral. KellyBench may give proponents of that thesis pause. If frontier models can’t yet beat a football betting market, the timeline for the kind of autonomous financial decision-making the scenario requires may be longer than many assume.
On Kalshi, traders currently price the Citrini scenario at around 23% in a market that has attracted over $25 million in volume.
A Polymarket contract on whether the AI bubble bursts by December 31, 2026, currently sits at 20%, with $2.5 million traded. If model progress plateaus, that figure may start to look underpriced.
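To make "underpriced" concrete, here is a minimal Python sketch of the expected value of a binary contract. It assumes a YES share pays $1 if the event occurs and $0 otherwise, and it ignores fees and spreads. Only the 20% price comes from the market quoted above; the 30% personal estimate and the function name are hypothetical and purely illustrative.

```python
def expected_profit_per_share(price: float, believed_probability: float) -> float:
    """Expected profit from buying one YES share of a binary contract.

    Assumes the share pays $1 if the event happens and $0 otherwise,
    with no fees or spread (a simplification).
    """
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    if not 0.0 <= believed_probability <= 1.0:
        raise ValueError("believed_probability must be between 0 and 1")
    # Gain (1 - price) with probability p, lose price with probability (1 - p).
    return believed_probability * (1.0 - price) - (1.0 - believed_probability) * price


market_price = 0.20          # contract price quoted in the article
hypothetical_belief = 0.30   # illustrative assumption, not a forecast

edge = expected_profit_per_share(market_price, hypothetical_belief)
print(f"Expected profit per $1 share: ${edge:.2f}")
```

Under these assumptions the expected profit reduces to the trader's own probability minus the price, so the contract looks underpriced only to someone whose estimate is above 20%.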
The tickers most exposed to a Citrini-style AI bubble scenario include Nvidia (NASDAQ:NVDA), ASML Holding (NASDAQ:ASML), and Broadcom (NASDAQ:AVGO), all named in Polymarket’s resolution criteria for an AI industry downturn.
KellyBench won’t move those stocks today, but as a data point on the limits of current AI capability, it nudges the odds away from the Citrini scenario of rapid AI disruption and toward a slower-burn alternative.