Recently, the nonprofit organization Epoch AI, which develops AI math benchmarks, has come under fire for not disclosing its funding from OpenAI in a timely manner. On December 20, the organization announced that OpenAI had funded a project called FrontierMath, which aims to test AI's mathematical abilities. OpenAI also used this benchmark to showcase its upcoming flagship AI product, o3.

A contractor for Epoch AI, posting on the forum LessWrong under the username "Meemi," stated that many contributors to the FrontierMath project were unaware of OpenAI's funding before it was publicly disclosed. Meemi wrote that communication on the matter had lacked transparency, that Epoch AI should have disclosed OpenAI's funding in advance, and that contractors should have been told their work might be used for capability assessment before deciding whether to participate in the benchmark's development.

On social media, some users expressed concern that this secrecy could damage FrontierMath's reputation as an objective benchmark. Beyond funding FrontierMath, OpenAI also had visibility into many of the benchmark's problems and solutions, a fact Epoch AI did not disclose before December 20.

Carina Hong, a PhD student in mathematics at Stanford University, pointed out on social media that OpenAI gained priority access to FrontierMath through its collaboration with Epoch AI, which left some contributors dissatisfied. "Six mathematicians who made significant contributions to the FrontierMath benchmark confirmed that they were unaware OpenAI would have exclusive access to the benchmark, preventing others from accessing it." Hong added that most of those contributors indicated they might not have participated in the project had they known about this arrangement in advance.

Tamay Besiroglu, Deputy Director of Epoch AI, acknowledged that the organization had not been transparent enough, but said he believes the integrity of FrontierMath has not been compromised. He admitted that Epoch AI made a communication error by failing to inform contributors in advance about OpenAI's involvement.

Besiroglu stated that while OpenAI has access to FrontierMath, there is a "verbal agreement" that OpenAI will not use the benchmark's problem set to train its AI. Epoch AI also retains a "separate holdout set" to ensure the independent verification of FrontierMath benchmark results.

Elliot Glazer, Chief Mathematician at Epoch AI, said on Reddit that Epoch AI has not yet independently verified OpenAI's o3 results on FrontierMath. He believes OpenAI's scores are credible, but they cannot be confirmed until an independent evaluation is completed.

Key Points:

💡 Epoch AI has faced criticism for not disclosing OpenAI's funding in a timely manner, leading to dissatisfaction among some contributors.  

🔍 The integrity of the FrontierMath benchmark has been questioned because OpenAI funded the project and had access to many of its problems and solutions.

🔒 Epoch AI acknowledges communication errors but says a verbal agreement bars OpenAI from training on the problem set, and that a separate holdout set allows independent verification of results.