Why IQ Tests Are Flawed for Evaluating AI

AI's Rapid Advancement and the Question of "Intelligence"
OpenAI CEO Sam Altman recently said he has observed a swift increase in the "IQ" of artificial intelligence systems over the past few years.
Altman clarified that this assessment, while indicative of a trend, isn’t based on strict scientific measurement. He described it as a subjective impression – a “vibe or spiritual answer” – suggesting roughly a one standard deviation increase in IQ annually.
The Use of IQ as a Benchmark
Altman is not alone in utilizing the concept of IQ, traditionally a measure of human intelligence, as a comparative metric for AI development. Social media platforms have seen AI enthusiasts administer IQ tests to models and subsequently publish the results.
However, a significant number of experts contend that IQ is an inadequate and potentially misleading indicator of an AI model’s true capabilities.
Sandra Wachter, a researcher focused on technology and regulation at Oxford University, explained to TechCrunch that applying human-centric measurements to AI is fundamentally flawed. She likened it to an unproductive comparison between dissimilar entities.
Limitations of IQ Testing
While Altman equated IQ with overall intelligence, it’s crucial to recognize that IQ tests provide relative, not absolute, assessments of specific cognitive skills.
There is general agreement that IQ tests effectively evaluate logic and abstract reasoning. However, they fail to measure practical intelligence – the ability to apply knowledge to real-world situations – and offer only a limited snapshot of cognitive ability.
Wachter emphasized that IQ tests are designed to measure human capabilities and are inherently based on assumptions about what constitutes human intelligence, making them inappropriate for AI. A car may outpace a human and a submarine may dive deeper, but neither capability means the machine has surpassed human intelligence, which is far more multifaceted.
Historical Context and Potential Biases
The origins of IQ testing are linked to the discredited theory of eugenics, which advocated improving the human race through selective breeding. Performing well on an IQ test requires a robust working memory and familiarity with Western cultural norms.
This inherent reliance on specific knowledge and cultural context introduces the potential for bias, leading one psychologist to characterize IQ tests as “ideologically corruptible mechanical models” of intelligence.
Gaming the System
Os Keyes, a doctoral candidate at the University of Washington specializing in ethical AI, argues that a model’s high score on an IQ test reveals more about the test’s shortcomings than the model’s actual intelligence.
Keyes pointed out that these tests are relatively easy to manipulate given virtually unlimited memory and processing time. IQ tests represent a limited method for gauging cognition, sentience, and intelligence – a point recognized even before the advent of digital computers.
AI's Unfair Advantages
AI systems likely possess an inherent advantage on IQ tests due to their vast memory capacity and access to internalized knowledge. Furthermore, many models are trained on publicly available web data, which frequently includes examples of IQ test questions.
Mike Cook, a research fellow at King’s College London specializing in AI, noted that consistently practicing IQ tests – essentially what AI models do – is a reliable way to improve scores. Unlike humans, AI can process information with perfect clarity and without signal loss.
The Inappropriateness of Human-Centric Evaluation
Cook added that IQ tests were originally designed for humans, as a means of evaluating general problem-solving skills. They are therefore unsuitable for evaluating a technology that tackles problems in fundamentally different ways.
“A crow might be able to use a tool to recover a treat from a box, but that doesn’t mean it can enroll at Harvard,” Cook stated. Human brains grapple with numerous distractions and limitations when solving problems, unlike AI, which receives substantial assistance.
The Need for New AI Evaluation Methods
Heidy Khlaaf, chief AI scientist at the AI Now Institute, highlighted the necessity for developing more appropriate AI evaluation methods.
Khlaaf explained that throughout the history of computing, direct comparisons between computing abilities and human abilities have been uncommon, since systems have long surpassed humans at specific tasks. The recent trend of measuring AI performance against human abilities is a contested development, particularly given the constantly shifting benchmarks used to assess AI systems.
Key Takeaways
- The use of IQ as a metric for AI is debated among experts.
- IQ tests are designed for humans and may not accurately reflect AI capabilities.
- New methods for evaluating AI are needed to better understand its progress.