At the University of Pennsylvania, mathematician Professor Robert Ghrist engaged in an intriguing "intellectual showdown" with an AI model named GPT-o1-mini. While trying to derive a more complex generalization of the bottleneck duality theorem, he went through repeated cycles of optimism and frustration.
Ghrist had previously tried several well-known AI models, including GPT-4, Claude-3.5, and Gemini-1.5-Pro. Although these models could propose hypotheses and offer supporting evidence, they often "crashed" on subtle errors, which Ghrist found disheartening. Ultimately, he turned to OpenAI's GPT-o1-mini model and achieved a breakthrough. The model not only analyzed a flawed proof and identified the error, but also generated a "new and ingenious correct proof" in just 43 seconds, one that was even more elegant than the human version.
GPT-o1-mini excels at logical tasks and uses chain-of-thought reasoning. Although it outperforms traditional language models on logic and planning benchmarks, it can still make errors. Ghrist summed up the experience: "The result is right on the border of whether large language models (LLM) can prove." He explained that identifying the model's failure modes was key to this experiment.
Despite the success, Ghrist admitted that using AI was not necessarily faster than doing all the work himself. He did say, however, that the final paper turned out better thanks to these models. The paper also includes an appendix detailing the AI model's role in the result.
However, the story did not end there. Shortly after the paper was published, another mathematician, Sridhar Ramesh, pointed out on social media that the result could be proved easily using a theorem by Birkhoff, which came as a surprise to Ghrist. He humorously acknowledged: "Humans win..." The collaboration with AI yielded results, but it also made him realize that human insight is sometimes the most effective solution.