DeepMind's New AI Excels in Math and Science

May 14, 2025

DeepMind's AlphaEvolve: A New Approach to AI Problem Solving

DeepMind, Google’s artificial intelligence research laboratory, has announced AlphaEvolve, a new AI system designed to solve problems whose solutions can be graded automatically.

Early tests suggest that AlphaEvolve could optimize parts of the infrastructure Google uses to train its AI models. DeepMind is building a user interface for the system and plans an early access program for selected academic researchers, potentially ahead of a wider release.

Addressing the Issue of AI Hallucinations

A common issue with many AI models is their tendency to “hallucinate”, that is, to confidently generate incorrect or fabricated information. This is inherent in their probabilistic nature. Some newer models, such as OpenAI’s o3, actually hallucinate more often than their predecessors, which shows how stubborn the problem is.

AlphaEvolve addresses this with an automated evaluation system. Models generate candidate answers and critique them, and the system automatically scores each answer for accuracy, keeping a ranked pool of the best solutions.
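The generate, evaluate, and rank cycle can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not DeepMind's implementation: random perturbation stands in for the models that propose answers, and the names `evolve` and `toy_score`, along with every parameter, are hypothetical.

```python
import random
from typing import Callable, List, Tuple


def evolve(
    seed: float,
    score: Callable[[float], float],
    rounds: int = 50,
    pool_size: int = 5,
    children_per_round: int = 10,
    rng_seed: int = 0,
) -> List[Tuple[float, float]]:
    """Return a ranked pool of (score, candidate) pairs, best first."""
    rng = random.Random(rng_seed)
    pool = [(score(seed), seed)]
    for _ in range(rounds):
        # Generate: propose variations of candidates already in the pool.
        # (Assumption: random noise stands in for a model proposing answers.)
        children = [
            rng.choice(pool)[1] + rng.gauss(0.0, 0.5)
            for _ in range(children_per_round)
        ]
        # Evaluate: score every new candidate automatically.
        pool.extend((score(c), c) for c in children)
        # Rank: keep only the best few, highest score first.
        pool.sort(key=lambda pair: pair[0], reverse=True)
        pool = pool[:pool_size]
    return pool


def toy_score(x: float) -> float:
    # Hypothetical objective: higher is better, with the maximum at x = 3.
    return -(x - 3.0) ** 2


ranked = evolve(seed=0.0, score=toy_score)
best_score, best_candidate = ranked[0]
print(f"best candidate: {best_candidate:.3f} (score {best_score:.4f})")
```

Because every candidate is scored by the same automatic function, a wrong answer cannot stay in the pool just because it was produced confidently. It is simply outranked.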

Building on Previous Research

Automated evaluation is not an entirely new idea. Researchers, including an earlier DeepMind team, have explored similar techniques in mathematical fields. DeepMind asserts, however, that AlphaEvolve’s use of advanced models, specifically Gemini models, makes it significantly more capable than those earlier systems.

Users interact with AlphaEvolve by presenting a problem, optionally including supporting information such as instructions, equations, code, and relevant literature. Crucially, they must also provide a method for automatic assessment of the system’s responses, typically in the form of a defined formula.
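As a rough illustration of what such an automatic assessment might look like, the Python sketch below scores candidate answers to a made-up geometry problem: place points in the unit square so that the closest pair is as far apart as possible. The problem, the `evaluate` function, and the sample inputs are all hypothetical and are not taken from DeepMind's materials.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, float]


def evaluate(points: Sequence[Point]) -> float:
    """Score a candidate layout: the smallest pairwise distance (higher is better).

    Invalid candidates (fewer than two points, or any point outside the
    unit square) receive negative infinity so they always rank last.
    """
    if len(points) < 2:
        return float("-inf")
    if any(not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0) for x, y in points):
        return float("-inf")
    return min(math.dist(p, q) for p, q in combinations(points, 2))


corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
clustered = [(0.5, 0.5), (0.5, 0.6), (0.6, 0.5), (0.6, 0.6)]

print(evaluate(corners))    # 1.0
print(evaluate(clustered))  # much smaller: the points are bunched together
```

The point is that the score is computed entirely by a formula, with no human judgment involved, which is what lets a system compare many candidate answers on its own.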

Limitations and Applicable Problem Types

Due to its reliance on self-evaluation, AlphaEvolve is limited to problems where such evaluation is possible. This restricts its application to areas like computer science and system optimization. Furthermore, the system can only express solutions as algorithms, making it unsuitable for problems requiring non-numerical answers.

Performance Benchmarks

To assess AlphaEvolve’s performance, DeepMind tasked it with a collection of approximately 50 mathematical problems covering areas such as geometry and combinatorics. According to DeepMind, the system “rediscovered” the best-known solutions in 75% of cases and found improved solutions in 20% of cases.

The system was also tested on real-world applications, including improving the efficiency of Google’s data centers and accelerating AI model training. DeepMind reports that AlphaEvolve generated an algorithm that continuously recovers, on average, 0.7% of Google’s global computing resources. It also proposed an optimization that reduced the overall time required to train Gemini models by 1%.

Incremental Improvements, Not Breakthroughs

It’s important to note that AlphaEvolve isn’t delivering revolutionary discoveries. For example, it identified an improvement to Google’s TPU AI accelerator chip design that had already been noted by other tools.

Like many AI labs, however, DeepMind argues that its system can save time and free experts to concentrate on more complex and critical tasks.
