OpenAI has recently added a "Predicted Outputs" feature to the GPT-4o model. The technique substantially speeds up the model's responses, by up to five times in certain scenarios, offering developers a meaningful gain in efficiency.
The feature, developed in collaboration with FactoryAI, works by skipping the regeneration of content that is already known. It performs especially well in practical tasks such as updating blog posts, iterating on existing responses, or rewriting code. According to data from FactoryAI, response times on programming tasks dropped by a factor of 2 to 4, compressing tasks that previously took around 70 seconds into roughly 20 seconds.
Currently, the feature is available to developers only through the API and supports the GPT-4o and GPT-4o mini models. Early feedback has been positive, with several developers already testing it and sharing their experiences. Eric Ciarla, founder of Firecrawl, noted while converting SEO content: "The speed improvement is significant, and the usage is straightforward."
Technically, Predicted Outputs works by identifying and reusing the parts of the output that are predictable in advance. OpenAI's official documentation gives a code-refactoring example: when renaming a "Username" property to "Email" in a C# class, passing the entire existing class file as the prediction lets the model regenerate only the lines that change, which greatly improves generation speed.
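A minimal sketch of that pattern with the official Python SDK is shown below. The prediction parameter and its {"type": "content"} shape follow OpenAI's documentation; the User class and the prompt wording are illustrative assumptions, not taken from the docs verbatim.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The existing C# class; only the Username property needs to change.
# (Hypothetical example class for illustration.)
code = """public class User
{
    public int Id { get; set; }
    public string Username { get; set; }
}"""

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "Rename the Username property to Email. "
                       "Respond only with the updated code.",
        },
        {"role": "user", "content": code},
    ],
    # Most of the file is unchanged, so we pass it as the prediction;
    # the model then only has to generate the tokens that differ.
    prediction={"type": "content", "content": code},
)

print(completion.choices[0].message.content)
```

The closer the prediction matches the final output, the larger the speedup, which is why this works best for edits that leave most of a file untouched.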
However, the feature comes with some limitations and caveats. Beyond the model restrictions, certain API parameters cannot be used together with Predicted Outputs, including n values greater than 1, logprobs, and presence_penalty or frequency_penalty values greater than 0.
It is also worth noting that the faster responses come at a slightly higher cost. In one user's test, processing time for the same task fell from 5.2 seconds to 3.3 seconds with Predicted Outputs, while the cost rose from 0.1555 cents to 0.2675 cents. The reason is that prediction tokens which do not appear in the final output are still billed at the same rate as completion tokens.
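To see where the extra cost comes from, you can inspect the token accounting the API returns. The sketch below assumes the completion object from the earlier example; the accepted_prediction_tokens and rejected_prediction_tokens fields are part of the usage details the API reports when a prediction is supplied.

```python
# Inspect the token accounting for the previous request.
usage = completion.usage
details = usage.completion_tokens_details

# Prediction tokens that appear in the final output are billed as
# ordinary completion tokens; rejected prediction tokens are billed
# at the same rate on top of that, which is what raises the cost.
print(f"completion_tokens:          {usage.completion_tokens}")
print(f"accepted_prediction_tokens: {details.accepted_prediction_tokens}")
print(f"rejected_prediction_tokens: {details.rejected_prediction_tokens}")
```

A high rejected-token count means the prediction diverged from the actual output, eroding both the speedup and the cost-effectiveness.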
Despite the modest increase in cost, the efficiency gains make the feature well worth considering for many applications. Developers can find more detailed technical explanations and usage guides in OpenAI's official documentation.
OpenAI Official Documentation:
https://platform.openai.com/docs/guides/latency-optimization#use-predicted-outputs