Are Bad Incentives to Blame for AI Hallucinations?

OpenAI Research on Hallucinations in Large Language Models
A recent research paper from OpenAI investigates the persistent problem of hallucinations in large language models (LLMs) such as GPT-5 and conversational AI like ChatGPT. The study explores the underlying causes of these inaccuracies and potential strategies for mitigating them.
Defining Hallucinations
OpenAI characterizes hallucinations as the generation of statements that, while appearing plausible, are demonstrably false. Despite ongoing advancements in LLM technology, the organization acknowledges that these instances of fabricated information “remain a fundamental challenge” and are unlikely to be entirely eradicated.
The inherent difficulty of eliminating hallucinations was illustrated with a simple experiment. When asked for the title of the doctoral dissertation of Adam Tauman Kalai, a co-author of the paper, a popular chatbot provided three distinct, yet incorrect, responses.
The Root of the Problem: Predictive Training
The same pattern appeared when the chatbot was asked for Kalai’s birthdate: it offered three different dates, all of them inaccurate. The researchers posit that these errors stem from the pretraining methodology employed in LLMs.
Pretraining optimizes the model to predict the next word accurately, but no “true” or “false” labels are attached to the training text. The model sees only positive examples of fluent language and must approximate the overall distribution of that data.
Certain linguistic elements, like spelling and punctuation, exhibit predictable patterns that are refined with increased model scale. However, factual details with low occurrence rates – such as a person’s birthday – cannot be reliably inferred from patterns alone, leading to the generation of false information.
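To make that mechanism concrete, here is a minimal toy sketch (not from the paper; the corpus, names, and dates are invented) of a frequency-based next-token predictor. It reliably reproduces high-frequency patterns, but for a fact it has rarely or never seen it has no evidence to draw on, which is how a fluent but false continuation gets produced.

```python
from collections import Counter, defaultdict

# Toy corpus: formatting patterns repeat many times,
# but each person's birthday appears only once (or not at all).
corpus = [
    "alice was born on March 3 .",
    "bob was born on July 19 .",
    "carol was born on May 7 .",
    "dave was born on January 21 .",
]

# Count next-token frequencies conditioned on the previous token (a bigram model).
bigrams = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        bigrams[prev][nxt] += 1

def predict_next(token):
    """Return the most frequent continuation observed after `token`."""
    return bigrams[token].most_common(1)[0][0]

# High-frequency pattern: every "born" in the corpus is followed by "on",
# so that continuation is learned reliably.
print(predict_next("born"))            # -> "on"

# Low-frequency fact: "erin" never appears, so there is nothing to condition on.
# A real language model in this position falls back on the general pattern
# "<name> was born on <month> <day>" and emits a plausible-looking but
# unsupported date: a hallucination.
print(bigrams.get("erin", Counter()))  # -> Counter()  (no evidence at all)
```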
Re-evaluating Evaluation Metrics
The paper’s proposed solution doesn’t center on altering the initial pretraining phase. Instead, it emphasizes a change in how large language models are evaluated.
Current evaluation methods, the researchers argue, don’t directly cause hallucinations, but they inadvertently “set the wrong incentives” for model behavior.
The Incentive to Guess
The researchers draw a parallel to multiple-choice exams where random guessing can yield a correct answer, while abstaining from answering guarantees a zero score. This creates a situation where attempting a guess, even without certainty, is the more advantageous strategy.
Similarly, LLMs, when evaluated solely on accuracy, are encouraged to provide an answer – even if speculative – rather than admitting uncertainty. This is because a correct guess contributes to a higher score, while acknowledging a lack of knowledge results in no credit.
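The arithmetic behind that incentive is simple. The following sketch (illustrative numbers, not taken from the paper) compares the expected score of guessing versus abstaining under a grader that awards one point for a correct answer and nothing otherwise.

```python
# Accuracy-only grading: 1 point for a correct answer, 0 for a wrong answer or "I don't know".
def expected_score_guess(p_correct):
    """Expected points from answering, given probability p_correct of being right."""
    return 1.0 * p_correct + 0.0 * (1.0 - p_correct)

def expected_score_abstain():
    """Saying 'I don't know' always scores zero under accuracy-only grading."""
    return 0.0

# Even a wild guess (say, 1 chance in 365 of naming the right birthday) beats abstaining.
p = 1 / 365
print(expected_score_guess(p))   # ~0.0027
print(expected_score_abstain())  # 0.0
```

Under this grading scheme, guessing is never worse than abstaining and is usually better, so a model trained against it learns to always produce an answer.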
A New Approach to Scoring
To address this, OpenAI proposes an evaluation system that mirrors tests like the SAT, incorporating negative scoring for incorrect answers or partial credit for expressing uncertainty. This would incentivize models to prioritize accuracy and acknowledge limitations.
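As a rough sketch of how such a rule changes the incentive (the penalty value below is an illustrative assumption, not a number from the paper), penalizing wrong answers makes guessing pay off only when the model’s confidence clears a break-even threshold; below that, abstaining is the higher-scoring move.

```python
# Scoring with negative marking: +1 for correct, -penalty for wrong, 0 for abstaining.
# (The specific penalty is an illustrative choice, not OpenAI's.)
def expected_score(p_correct, penalty=1.0, abstain=False):
    if abstain:
        return 0.0
    return 1.0 * p_correct - penalty * (1.0 - p_correct)

# Break-even confidence: answering only beats abstaining when
# p_correct > penalty / (1 + penalty); with penalty=1.0 that threshold is 0.5.
for p in (0.2, 0.5, 0.8):
    guess = expected_score(p, penalty=1.0)
    best = "guess" if guess > 0 else "abstain"
    print(f"confidence={p:.1f}: expected score if guessing={guess:+.2f} -> better to {best}")
```

With accuracy-only grading the same comparison always favors guessing; adding a penalty (or partial credit for abstaining) is what makes “I don’t know” a rational output for the model.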
The researchers stress that simply adding a few new, uncertainty-aware tests is insufficient. A comprehensive overhaul of widely used, accuracy-based evaluations is required to discourage the practice of “blind guessing.”
Ultimately, the researchers conclude that as long as existing scoring systems continue to reward fortuitous guesses, models will persist in learning to speculate rather than accurately represent knowledge.