AI Benchmarking Group Under Fire for OpenAI Funding Disclosure Delay

January 19, 2025
Concerns Raised Over OpenAI Funding of AI Math Benchmark

Recent revelations regarding funding for an AI benchmark development organization have sparked debate within the artificial intelligence community. Specifically, it has come to light that Epoch AI, responsible for creating math benchmarks, did not initially disclose financial support received from OpenAI.

FrontierMath and OpenAI’s o3 AI

Epoch AI, a nonprofit organization largely supported by Open Philanthropy, announced on December 20th that OpenAI had provided funding for the creation of FrontierMath. This benchmark, featuring complex mathematical problems, was designed to assess the capabilities of artificial intelligence systems. It was notably utilized by OpenAI to demonstrate the performance of its forthcoming AI model, o3.

Lack of Transparency Allegations

A contractor involved with Epoch AI, using the pseudonym “Meemi” on the LessWrong forum, stated that a significant number of those contributing to the FrontierMath benchmark were unaware of OpenAI’s financial involvement until the public announcement.

“The level of communication surrounding this matter has been lacking in transparency,” Meemi explained. “In our opinion, Epoch AI had a responsibility to reveal OpenAI’s funding, and contractors should have been fully informed about the potential use of their work for capability demonstrations before agreeing to participate in the benchmark’s development.”

Potential Impact on Benchmark Objectivity

Commenters on social media have expressed worries that this lack of openness could damage FrontierMath’s standing as an impartial benchmark. Beyond funding the project, OpenAI also had prior access to many of the problems and their solutions – information that wasn’t disclosed before December 20th, coinciding with the o3 announcement.

Exclusive Access Concerns

Carina Hong, a mathematics PhD student at Stanford University, voiced concerns on X (formerly Twitter) that OpenAI possesses exclusive access to FrontierMath due to its agreement with Epoch AI. She indicated that this situation is causing dissatisfaction among some contributors.

“I spoke with six mathematicians who made substantial contributions to the FrontierMath benchmark, and they were unaware that OpenAI would have exclusive access to it, excluding others,” Hong reported. “The majority indicated they might not have participated had they known.”

Epoch AI Acknowledges Oversight

In response to Meemi’s post, Tamay Besiroglu, associate director and co-founder of Epoch AI, affirmed that the integrity of FrontierMath remains intact. However, he conceded that the organization “made a mistake” by not being more forthcoming with information.

“We were bound by restrictions preventing disclosure of the partnership until the o3 launch, and in retrospect, we should have negotiated for the ability to inform benchmark contributors sooner,” Besiroglu stated. “Our mathematicians deserved to know who might have access to their work. Despite contractual limitations, transparency with our contributors should have been a non-negotiable condition of our agreement with OpenAI.”

Agreements Regarding Data Usage

Besiroglu clarified that while OpenAI does have access to FrontierMath, there is a “verbal agreement” in place preventing them from utilizing the benchmark’s problem set for AI training purposes. He also mentioned that Epoch AI maintains a “separate holdout set” to provide an independent means of verifying FrontierMath benchmark results.

“OpenAI has consistently supported our decision to maintain a separate, unseen holdout set,” Besiroglu added.

Independent Verification Pending

However, Elliot Glazer, lead mathematician at Epoch AI, noted on Reddit that the organization has not yet been able to independently confirm OpenAI’s FrontierMath o3 results.

“My personal assessment is that the reported score is legitimate and that OpenAI has no incentive to misrepresent internal benchmarking performance,” Glazer said. “However, we cannot provide a definitive endorsement until our independent evaluation is completed.”

Challenges in AI Benchmarking

This situation highlights the difficulties inherent in developing reliable benchmarks for evaluating AI and securing the necessary funding for such projects without creating potential conflicts of interest.


#AI benchmarking #OpenAI #funding disclosure #AI ethics #AI transparency #criticism