Recently, OpenAI quietly lifted the veil on its "Strawberry" project. Previously known as Q*, the project is said to enable AI to plan tasks in advance, autonomously search the internet for information, and even conduct in-depth research.

Even Elon Musk couldn't resist chiming in, joking in a post that he had thought the end of AI would be a paperclip disaster, but it now seems to be an endless field of strawberries.

Despite the curiosity from the outside world, OpenAI remains tight-lipped about the operational details of the Strawberry project. The development process is highly confidential within the company, to the point where even the release date remains a mystery.

During a recent internal meeting, OpenAI demonstrated a version of the Strawberry project that reportedly shows reasoning ability nearly on par with that of humans. This aligns with the company's recently announced AGI roadmap, prompting speculation that OpenAI may be brewing something bigger.

The design philosophy of the Strawberry model is to enable AI not only to generate answers to queries but also to plan ahead, browse the internet autonomously and reliably, and conduct so-called "deep research." Such capabilities have so far eluded AI models.

Insiders say that OpenAI's Strawberry project is somewhat similar to a method developed at Stanford University, known as the "Self-Taught Reasoner" (abbreviated as STaR). STaR lets a model improve itself by iteratively creating its own training data.

Paper: https://arxiv.org/pdf/2203.14465

Current methods for getting AI to generate step-by-step reasoning are either costly, because they need large datasets of written-out reasoning, or they sacrifice accuracy by relying on only a few examples. STaR instead lets a model improve itself iteratively, starting from a small number of reasoning examples and a large dataset of questions and answers without reasoning.

The STaR workflow is as follows. First, the model tries to answer many questions, generating a reasoning process for each. Where an answer is incorrect, the model is given the correct answer and asked to generate the reasoning again. The model is then fine-tuned on all the reasoning that ultimately led to a correct answer, and the whole process is repeated.
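The loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not OpenAI's or the STaR authors' code: the names `Example`, `collect_rationales`, `star_loop`, `make_generate` and `fine_tune` are hypothetical, and the "model" is a toy dictionary standing in for a real language model so that the example runs on its own. Restarting each fine-tuning round from the base model is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Example:
    question: str
    answer: str


# Hypothetical interface: (question, optional answer hint) -> (rationale, predicted answer)
Generate = Callable[[str, Optional[str]], Tuple[str, str]]
Triple = Tuple[str, str, str]  # (question, rationale, answer)


def collect_rationales(examples: List[Example], generate: Generate) -> List[Triple]:
    """One pass over the data: keep only reasoning that ends in the correct answer."""
    kept: List[Triple] = []
    for ex in examples:
        rationale, predicted = generate(ex.question, None)
        if predicted != ex.answer:
            # Wrong answer: regenerate the reasoning with the correct answer as a hint.
            rationale, predicted = generate(ex.question, ex.answer)
        if predicted == ex.answer:
            kept.append((ex.question, rationale, ex.answer))
    return kept


def star_loop(examples, base_model, make_generate, fine_tune, iterations):
    """Repeat: generate reasoning, filter by correctness, fine-tune on what was kept."""
    model = base_model
    for _ in range(iterations):
        kept = collect_rationales(examples, make_generate(model))
        # Assumption of this sketch: each round fine-tunes a fresh copy of the base model.
        model = fine_tune(base_model, kept)
    return model


# --- Toy stand-ins so the sketch runs; a real system would call a language model ---
ToyModel = Dict[str, str]


def toy_make_generate(model: ToyModel) -> Generate:
    def generate(question: str, hint: Optional[str] = None) -> Tuple[str, str]:
        if question in model:
            return "recalled from fine-tuning", model[question]
        if hint is not None:
            return f"working backwards from {hint}", hint
        return "no idea", "?"
    return generate


def toy_fine_tune(base: ToyModel, kept: List[Triple]) -> ToyModel:
    tuned = dict(base)
    for question, _rationale, answer in kept:
        tuned[question] = answer
    return tuned


examples = [Example("2 + 2", "4"), Example("3 * 3", "9")]
tuned = star_loop(examples, {"2 + 2": "4"}, toy_make_generate, toy_fine_tune, iterations=2)
print(tuned)
```

The correctness filter is the important part: reasoning is kept as training data only if it ends in the known answer, whether it was produced on the first attempt or after the hint.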

OpenAI hopes that the Strawberry project will significantly improve the reasoning ability of its AI models. This involves a specialized form of what is known as post-training: adjusting an AI model after it has already been pre-trained on a large amount of data, in order to optimize its performance.

OpenAI also hopes that Strawberry can perform long-horizon tasks (LHT), which require the model to plan ahead and carry out a series of actions over an extended period. To achieve this goal, the company is creating and evaluating "deep research" datasets.

As the Strawberry project progresses, OpenAI is moving closer to its goal of achieving AGI. If Strawberry's reasoning ability can truly match that of humans, the future of AI will be limitless.