François Chollet Launches Nonprofit for AGI Benchmarks

François Chollet, a former Google engineer and prominent AI researcher, is co-founding a new nonprofit, the ARC Prize Foundation. Its primary goal is to develop benchmarks that probe AI systems for “human-level” intelligence.
Leadership and Funding
Greg Kamradt, previously an engineering director at Salesforce and the founder of the AI product studio Leverage, will lead the ARC Prize Foundation as its president and a board member. Fundraising efforts for the foundation are scheduled to commence later this January.
According to Chollet, the foundation is meant to serve as “a useful north star toward artificial general intelligence.” He says it aims to inspire progress by highlighting the gap that remains between AI systems and basic human capabilities.
Expanding on the ARC-AGI Test
The ARC Prize Foundation will build on ARC-AGI, a test Chollet created to evaluate whether an AI system can efficiently acquire new skills outside the data it was trained on.
- The test utilizes puzzle-like problems.
- AI systems must generate the correct “answer” grid using various colored squares.
- The problems are specifically designed to challenge AI to adapt to unfamiliar scenarios.
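To make the grid format concrete, here is a minimal Python sketch of how one such puzzle might be represented and scored. It assumes the layout used by the publicly released ARC dataset as commonly described (each task has "train" and "test" lists of input/output pairs, and each grid is a list of rows of integer color codes). The toy task, both solver functions, and the `score` helper are hypothetical and written only for illustration; they are not part of any official ARC tooling.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]  # each cell holds an integer color code

# Hypothetical toy task (invented for this example): mirror each grid left to right.
toy_task: Dict[str, List[Dict[str, Grid]]] = {
    "train": [
        {"input": [[1, 0, 0], [2, 0, 0]], "output": [[0, 0, 1], [0, 0, 2]]},
        {"input": [[0, 3], [4, 0]], "output": [[3, 0], [0, 4]]},
    ],
    "test": [
        {"input": [[5, 0, 0, 0], [0, 6, 0, 0]],
         "output": [[0, 0, 0, 5], [0, 0, 6, 0]]},
    ],
}


def mirror_solver(grid: Grid) -> Grid:
    """Candidate rule: reverse every row of the grid."""
    return [list(reversed(row)) for row in grid]


def identity_solver(grid: Grid) -> Grid:
    """Baseline that returns a copy of the input unchanged."""
    return [row[:] for row in grid]


def fits_examples(task: Dict[str, List[Dict[str, Grid]]],
                  solver: Callable[[Grid], Grid]) -> bool:
    """Check whether a candidate rule reproduces every demonstration pair."""
    return all(solver(pair["input"]) == pair["output"] for pair in task["train"])


def score(task: Dict[str, List[Dict[str, Grid]]],
          solver: Callable[[Grid], Grid]) -> float:
    """Fraction of test grids reproduced exactly; partial matches earn nothing."""
    tests = task["test"]
    solved = sum(1 for pair in tests if solver(pair["input"]) == pair["output"])
    return solved / len(tests)


if __name__ == "__main__":
    for name, solver in [("mirror", mirror_solver), ("identity", identity_solver)]:
        print(name, fits_examples(toy_task, solver), score(toy_task, solver))
```

The sketch uses all-or-nothing exact matching as a simplifying assumption: an answer grid counts only if every cell is right. The hard part of the real benchmark, inferring the rule from a handful of demonstration pairs, is exactly what this toy example leaves out by hard-coding the rule.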
ARC-AGI: A History of Evaluation
Introduced in 2019, ARC-AGI (short for “Abstraction and Reasoning Corpus for Artificial General Intelligence”) has become a closely watched measure in AI evaluation. Although many AI systems excel at complex tasks such as solving Math Olympiad problems, their performance on ARC-AGI lagged until recently.
Before that, the best-performing AI could solve just under a third of the tasks in the ARC-AGI benchmark.
Focus on Bridging the Capability Gap
Chollet says the goal is not to measure AI risk with exceptionally difficult, superhuman questions. Instead, future versions of the ARC-AGI benchmark will concentrate on the gap between AI and human capabilities, with the aim of shrinking that gap toward zero.
Recent Competition and Limitations
Last June, Chollet and Mike Knoop, co-founder of Zapier, launched a competition to build an AI that could beat ARC-AGI. OpenAI’s unreleased o3 model was the first to achieve a qualifying score, but doing so required substantial computational resources.
Chollet acknowledges that ARC-AGI isn’t without its shortcomings, noting that some models can achieve high scores through brute-force methods. He also maintains that o3 does not yet demonstrate human-level intelligence.
Future Benchmarks and Challenges for o3
Chollet suggests that the next ARC-AGI benchmark will likely pose a significant challenge for o3, potentially cutting its score to below 30% even with high computing power. He argues that AGI will have arrived once it becomes impossible to create tasks that are easy for humans but hard for AI.
Plans for Second-Generation Benchmark
Knoop indicates that a second-generation ARC-AGI benchmark will be launched in the first quarter of the year, accompanied by a new competition. The nonprofit will also begin work on designing the third edition of ARC-AGI.
Addressing Criticism and Defining AGI
The ARC Prize Foundation will need to address criticisms regarding Chollet’s previous promotion of ARC-AGI as a benchmark for AGI. The definition of AGI itself is currently a subject of debate, with some, including an OpenAI staff member, claiming it has “already” been achieved if defined as AI exceeding human performance in most tasks.
Potential Partnerships
In December, OpenAI CEO Sam Altman expressed interest in partnering with the ARC-AGI team to develop future benchmarks. However, Chollet gave no update on potential partnerships in the recent announcement.
Building an AGI Ecosystem
The ARC Prize Foundation plans to establish an “academic network” to promote AGI progress and evaluations. Furthermore, it aims to create “a coalition of frontier AI lab partnerships” to collaborate on industry-standard AGI benchmarks.