Noam Brown, OpenAI's research lead for AI reasoning, recently stated at NVIDIA's GTC conference that certain forms of "reasoning" AI models could have emerged 20 years earlier if researchers had "known the right methods and algorithms." He pointed out several reasons why this research direction was overlooked.

Brown recalled his time at Carnegie Mellon University researching game-playing AI, including Pluribus, which defeated top human poker professionals. The AI he helped create was unusual, he explained, in that it "reasoned" its way through problems rather than relying solely on brute-force computation. Brown suggested that the deliberate way humans think through challenging situations could be highly beneficial for AI as well.

Brown is also one of the architects of OpenAI's o1 model, which employs a technique commonly known as test-time compute: the model "thinks" before responding to a query, spending additional computation at inference time to work through the problem. Such reasoning models are generally more accurate and reliable than traditional models, especially in domains like mathematics and science.
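To make the idea concrete, here is a minimal Python sketch of one simple, widely used form of test-time compute: sampling several candidate answers and returning the majority vote (often called self-consistency). The `generate` function is a hypothetical placeholder for a model call; OpenAI has not published o1's internal mechanism, so this illustrates the general idea of spending extra compute at inference time rather than o1's actual method.

```python
from collections import Counter

def generate(prompt: str, temperature: float = 0.8) -> str:
    """Hypothetical placeholder for one sampled model completion.

    Back this with any LLM API or local model. The temperature
    parameter signals that samples should be diverse; without
    diversity, voting over identical answers is pointless.
    """
    raise NotImplementedError("plug in a model call here")

def answer_with_test_time_compute(prompt: str, n_samples: int = 16) -> str:
    """Spend extra compute at inference time: sample several candidate
    answers and return the most common one (majority-vote
    self-consistency), one simple form of test-time compute."""
    votes = Counter(generate(prompt) for _ in range(n_samples))
    answer, _count = votes.most_common(1)[0]
    return answer
```

Raising `n_samples` trades more inference-time compute for (typically) higher accuracy on verifiable tasks such as math problems, which is the core trade-off test-time compute exploits.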

When asked whether academic institutions, with their generally limited computational resources, could realistically run experiments at OpenAI's scale, Brown acknowledged that this has become harder as the computational demands of recent models escalate. He noted, however, that academia can still play a vital role by exploring areas with lower computational requirements, such as model architecture design.

Brown also highlighted opportunities for collaboration between leading labs and academia. Leading labs, he said, read academic publications and carefully evaluate whether a paper makes a compelling case that the research would be highly effective if scaled up. If it does, they investigate further.

Brown singled out AI benchmarking as a field where academia can exert significant influence. He criticized the current state of AI benchmarks as "terrible," noting that they often test esoteric knowledge and that scores correlate poorly with performance on tasks most people actually care about, which has led to widespread confusion about model capabilities and progress. Improving AI benchmarks, Brown believes, does not require vast computational resources.

It's worth noting that Brown's initial comments referred to his pre-OpenAI work on game-playing AI such as Pluribus, not to reasoning models like o1.

Key Takeaways:

  • 🤔 Noam Brown of OpenAI believes that "reasoning" AI could have been developed 20 years earlier if the right methods had been discovered sooner, suggesting a past oversight in research direction.
  • 🤝 Brown emphasizes opportunities for collaboration between academia and leading AI labs, with academia playing a crucial role in areas with lower computational demands, such as model architecture design and AI benchmarking.
  • 📈 Reasoning models built on test-time compute are generally more accurate and reliable than traditional models, particularly in mathematics and science.