
AI Benchmark: New Record Set - Can AI Compete?

January 23, 2025

New Benchmark Challenges Frontier AI Systems

The Center for AI Safety (CAIS), a nonprofit organization, and Scale AI, a provider of data labeling and AI development solutions, have unveiled a rigorous new assessment for cutting-edge AI systems.

Introducing Humanity’s Last Exam

The benchmark, titled Humanity’s Last Exam, comprises a large collection of crowdsourced questions spanning a wide range of disciplines, including mathematics, the humanities, and the natural sciences.

To raise the difficulty of the evaluation, the questions come in diverse formats, including types that incorporate visual elements such as diagrams and images.

Preliminary Results Indicate Significant Challenges

Initial testing revealed a notable performance gap: no publicly available flagship AI system scored above 10% on Humanity’s Last Exam.

This suggests that current models still face substantial hurdles in demonstrating comprehensive understanding and reasoning capabilities.
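As an illustration of what a figure like "below 10%" means in practice, the sketch below computes a simple exact-match accuracy over a set of model answers. The article does not describe the benchmark's actual grading protocol, so this is a generic scoring example rather than the official method.

```python
def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that exactly match the reference answer
    after basic normalization (case and surrounding whitespace)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must be the same length")
    correct = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return correct / len(references) if references else 0.0


if __name__ == "__main__":
    # Hypothetical model outputs vs. gold answers, for illustration only.
    preds = ["42", "photosynthesis", "an incorrect answer"]
    golds = ["42", "Photosynthesis", "mitochondria"]
    print(f"Accuracy: {exact_match_accuracy(preds, golds):.1%}")  # 66.7%
```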

Open Access for Research and Development

CAIS and Scale AI intend to make the benchmark openly available to the broader research community.

This accessibility will enable researchers to conduct more in-depth analyses of the benchmark’s nuances and to assess the performance of emerging AI models.
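As a rough sketch of how that might look, a researcher could load the released benchmark and run a model over it along the lines below. The Hugging Face dataset identifier ("cais/hle") and the field names ("question", "answer") are assumptions for illustration; the official release may use a different location and schema.

```python
from datasets import load_dataset


def evaluate(answer_fn, dataset_id: str = "cais/hle", split: str = "test") -> float:
    """Run `answer_fn` (any callable mapping a question string to an answer
    string) over the benchmark split and return exact-match accuracy."""
    ds = load_dataset(dataset_id, split=split)
    correct = 0
    for example in ds:
        prediction = answer_fn(example["question"])
        if prediction.strip().lower() == example["answer"].strip().lower():
            correct += 1
    return correct / len(ds)


# Example usage with a placeholder model:
# score = evaluate(lambda q: my_model.generate(q))
# print(f"Score: {score:.1%}")
```

Note that exact-match scoring is only one possible grading approach; multimodal or free-form questions would likely require a more elaborate evaluation pipeline.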

The goal is to encourage further study of variations of the exam and to drive advances in AI capabilities.

Tags: AI benchmark, artificial intelligence, AI performance, machine learning, AI competition, new AI standard