Recently, Max Woolf, a senior data scientist at BuzzFeed, ran an experiment exploring what happens when you repeatedly ask an AI to improve its code. He gave the Claude 3.5 Sonnet language model a classic programming challenge: write Python code that, given one million random integers, finds the difference between the smallest and largest numbers whose digits sum to 30.
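For context, a minimal baseline for that task might look like the sketch below. This is not the code Claude produced, just a straightforward rendering of the problem statement; the 1 to 100,000 value range matches Woolf's published write-up.

```python
import random

def digit_sum(n: int) -> int:
    """Sum of the decimal digits of n."""
    return sum(int(d) for d in str(n))

# One million random integers between 1 and 100,000.
numbers = [random.randint(1, 100_000) for _ in range(1_000_000)]

# Naive baseline: filter the numbers whose digits sum to 30,
# then take the difference between the largest and smallest match.
matching = [n for n in numbers if digit_sum(n) == 30]
print(max(matching) - min(matching))
```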
Claude's initial version ran in 657 milliseconds. But as Woolf repeatedly fed it the simple instruction "write better code," the runtime of the final version dropped to just 6 milliseconds, roughly a 100x speedup. The result is remarkable in itself, but the experiment also revealed unexpected shifts in how the AI interprets "better code."
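To see where a speedup of that magnitude can come from, here is an illustrative optimization in the same spirit, not Claude's actual final code: because the values are bounded, the digit sums can be precomputed once into a lookup table, and the filtering vectorized with NumPy, eliminating the per-element string conversion that dominates the naive version.

```python
import numpy as np

MAX_VALUE = 100_000

def build_digit_sum_table(limit: int) -> np.ndarray:
    """Digit sums for 0..limit, built via the recurrence ds(n) = ds(n // 10) + n % 10."""
    table = np.zeros(limit + 1, dtype=np.int32)
    for n in range(1, limit + 1):
        table[n] = table[n // 10] + n % 10  # reuse the already-computed prefix
    return table

DIGIT_SUMS = build_digit_sum_table(MAX_VALUE)

def max_min_diff(numbers: np.ndarray, target: int = 30) -> int:
    # Vectorized lookup and filter instead of a Python-level loop.
    matching = numbers[DIGIT_SUMS[numbers] == target]
    return int(matching.max() - matching.min())

rng = np.random.default_rng(0)
numbers = rng.integers(1, MAX_VALUE + 1, size=1_000_000)
print(max_min_diff(numbers))
```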
On the fourth "write better code" request, Claude unexpectedly restructured the code into something resembling an enterprise application, adding typical enterprise features even though Woolf never asked for them. This suggests the model may associate "better code" with "enterprise-grade software," reflecting patterns it absorbed during training.
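To picture what that kind of restructuring looks like, here is a hypothetical sketch in the same spirit; it is not Claude's actual output. The one-off script becomes a class with validation and logging:

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)

@dataclass
class DigitSumAnalyzer:
    """Enterprise-style wrapper around a simple computation (illustrative only)."""
    target: int = 30

    def analyze(self, numbers: list[int]) -> int:
        if not numbers:
            raise ValueError("input list must not be empty")
        matching = [n for n in numbers if self._digit_sum(n) == self.target]
        if not matching:
            raise ValueError(f"no numbers with digit sum {self.target}")
        logger.info("matched %d of %d numbers", len(matching), len(numbers))
        return max(matching) - min(matching)

    @staticmethod
    def _digit_sum(n: int) -> int:
        return sum(int(d) for d in str(n))
```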
Developer Simon Willison, analyzing this iterative-improvement phenomenon, noted that the language model approaches the code with a fresh perspective on each request. Even though every request carries the context of the previous conversation, Claude analyzes the code as if seeing it for the first time, which is what allows it to keep improving.
However, Woolf also found that while more specific requests could reach better results faster, subtle errors crept into the code that required human correction. He therefore stressed that careful prompt engineering remains crucial: simple follow-up prompts improve code quality at first, but targeted prompt engineering yields larger performance gains, albeit with a higher risk of bugs.
Notably, Claude skipped some optimization steps that human developers might consider standard, such as deduplicating the numbers or sorting them first; a sketch of that approach follows below. Small changes in how a question is phrased can also significantly affect Claude's output.
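As a hypothetical sketch of those skipped steps, a human developer might deduplicate and sort once, then scan inward from both ends and stop at the first match on each side:

```python
def digit_sum(n: int) -> int:
    return sum(int(d) for d in str(n))

def max_min_diff_sorted(numbers: list[int], target: int = 30) -> int:
    # Deduplicate and sort once, then find the smallest and largest matches
    # by scanning from each end instead of checking every element.
    unique_sorted = sorted(set(numbers))
    smallest = next(n for n in unique_sorted if digit_sum(n) == target)
    largest = next(n for n in reversed(unique_sorted) if digit_sum(n) == target)
    return largest - smallest
```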
Despite these impressive performance gains, Woolf cautions that human developers remain indispensable for validating solutions and troubleshooting. AI-generated code can rarely be used as-is, he notes, but its creativity and tool suggestions are worth paying attention to.
Key Points:
🌟 Repeated "write better code" prompts cut the runtime from 657 milliseconds to 6 milliseconds, roughly a 100x speedup.
💡 Claude spontaneously added enterprise-style features to the code, revealing its own notion of what "better code" means.
🛠️ Prompt engineering still matters: precise requests reach good results faster, but human developers are needed to validate and correct the output.