New AI Scaling Method Discovered - But Is It Real?

A Potential New AI Scaling Law Under Scrutiny
Chatter on social media suggests researchers may have uncovered a new AI “scaling law,” though experts are skeptical.
Understanding AI Scaling Laws
AI scaling laws describe, informally, how AI model performance improves with larger training datasets and more computing power. Until recently, expanding “pre-training” (training ever-larger models on ever-larger datasets) was the primary focus for most leading AI labs.
While pre-training remains vital, two supplementary scaling laws, post-training scaling and test-time scaling, have emerged to complement it. Post-training scaling involves refining a model’s behavior after its initial training, while test-time scaling applies additional computing during inference to power a form of “reasoning,” as seen in models like DeepSeek’s R1.
The Proposed “Inference-Time Search” Law
Researchers at Google and UC Berkeley recently published a paper describing what some are calling a fourth scaling law: “inference-time search.”
The technique has a model generate many candidate answers to a query in parallel, then select the best one from the set. The researchers suggest the method can lift the performance of a model like Google’s Gemini 1.5 Pro past OpenAI’s o1-preview reasoning model on science and math benchmarks.
Claims and Observations
Eric Zhao, a Google doctoral fellow and co-author of the study, explained on X (formerly Twitter) that “by randomly sampling 200 responses and self-verifying, Gemini 1.5—an early 2024 model—beats o1-preview and approaches o1.” He further noted that “self-verification naturally becomes easier at scale,” challenging the expectation that picking out a correct solution would grow harder as the pool of candidates grows.
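To make the shape of the technique concrete, here is a minimal, illustrative sketch of the sample-then-self-verify loop. `StubModel`, `generate_answer`, and `self_verify_score` are hypothetical placeholders, not the paper’s or Gemini’s actual API:

```python
import random

class StubModel:
    """Toy stand-in for a language model; a real system would call a model API."""

    def generate_answer(self, question: str) -> str:
        # Pretend to sample a candidate answer; correct more often than not.
        return random.choice(["42", "41", "43", "42"])

    def self_verify_score(self, question: str, answer: str) -> float:
        # Pretend self-assessment: a noisy confidence score, loosely biased
        # toward the correct answer, so verification is imperfect but useful.
        return random.random() + (0.5 if answer == "42" else 0.0)

def inference_time_search(model: StubModel, question: str, n_samples: int = 200) -> str:
    # 1) Sample many independent candidate answers to the same query.
    candidates = [model.generate_answer(question) for _ in range(n_samples)]
    # 2) Have the model score each of its own candidates (self-verification;
    #    no ground-truth checker is consulted).
    scored = [(model.self_verify_score(question, c), c) for c in candidates]
    # 3) Return the candidate the model itself rates highest.
    return max(scored)[1]

print(inference_time_search(StubModel(), "What is 6 * 7?"))
```

In the paper’s setup, the same production model handles both sampling and verification; the stub above only mimics that structure.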
Expert Reactions and Concerns
Despite these claims, several experts remain unconvinced, suggesting that inference-time search may not be broadly applicable.
Matthew Guzdial, an AI researcher and assistant professor at the University of Alberta, told TechCrunch that the approach works best when there is a clear “evaluation function,” meaning the correct answer can be identified programmatically. Many queries, he noted, have no such clear-cut test.
“If we can’t write code to define what we want, we can’t use [inference-time] search,” Guzdial explained. “For general language interaction, this isn’t feasible. It’s not a practical solution for most problems.”
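Guzdial’s distinction is easy to illustrate: when an answer is mechanically checkable, a verifier is a few lines of code, but nothing comparable exists for an open-ended request. Here is a minimal sketch with a hypothetical `check_candidate` verifier for a toy equation:

```python
def check_candidate(candidate: str) -> bool:
    """Ground-truth verifier: accept a proposed root of x**2 - 5x + 6 = 0
    by substituting it back into the equation."""
    try:
        x = float(candidate)
    except ValueError:
        return False
    return abs(x * x - 5 * x + 6) < 1e-9

# With a verifier in hand, search reduces to filtering candidates.
candidates = ["1", "2", "4", "not sure", "3"]
print([c for c in candidates if check_candidate(c)])  # -> ['2', '3']
```

No analogous function can be written for a request like “write me a moving poem,” which is the gap Guzdial is pointing to.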
Counterarguments and Nuances
Zhao responded to Guzdial’s assessment, clarifying that the paper specifically addresses scenarios where an “evaluation function” or “ground-truth verifier” is unavailable.
“We’re studying when evaluation requires the [model] to verify itself,” Zhao stated. “Our main point is that the difference between this situation and one with ground-truth verifiers can diminish with increased scale.”
The Nature of AI “Reasoning”
Mike Cook, a research fellow at King’s College London specializing in AI, aligned with Guzdial’s viewpoint, emphasizing the distinction between AI “reasoning” and human cognitive processes.
“Inference-time search doesn’t enhance the model’s reasoning ability,” Cook said. “It’s a workaround for the limitations of a technology prone to confidently incorrect outputs. If a model errs 5% of the time, checking 200 attempts should make those errors more apparent.”
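Cook’s arithmetic checks out under a simplifying assumption: if each of 200 attempts errs independently with probability 5% (real samples from one model are correlated, so this is optimistic), about 10 will be wrong, and the chance that wrong answers outnumber right ones, taking a simple majority vote as the selection rule, is vanishingly small:

```python
from math import comb

n, p_err = 200, 0.05  # 200 attempts, 5% independent error rate (assumption)

# Expected number of wrong answers among the attempts.
print(f"expected wrong answers: {n * p_err:.0f}")  # -> 10

# Probability that wrong answers form a majority (a simple vote would fail).
p_majority_wrong = sum(comb(n, k) * p_err**k * (1 - p_err)**(n - k)
                       for k in range(n // 2 + 1, n + 1))
print(f"P(majority wrong): {p_majority_wrong:.1e}")  # effectively zero
```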
Implications for the AI Industry
The potential limitations of inference-time search may be unwelcome news for an AI industry actively seeking cost-effective ways to improve model “reasoning.” Today’s reasoning models can rack up substantial computing costs, sometimes thousands of dollars, on a single math problem.
Consequently, the pursuit of novel scaling techniques is expected to continue.
Update
Updated 3/20 5:12 a.m. Pacific: Comments were added from study co-author Eric Zhao, who disagreed with an assessment made by an independent researcher critiquing the work.