Google recently launched Gemini 2.5 Flash, a new addition to its Gemini family. Currently in preview, it is aimed at giving developers significantly stronger reasoning capabilities. Its controllable "thinking" process lets developers trade cost and latency against answer quality according to their needs, which makes for more cost-effective solutions.
Compared to its predecessor, 2.0 Flash, Gemini 2.5 Flash's primary upgrade lies in its reasoning capabilities. This is Google's first fully hybrid reasoning model, offering developers the option to enable or disable the thinking feature. By setting a thinking budget, developers can achieve the ideal balance between quality, cost, and latency. Even with the thinking feature disabled, 2.5 Flash maintains the rapid response speed of 2.0 Flash while further improving overall performance.
Before generating output, the model works through a series of reasoning steps. This helps it better understand the prompt, break down complex tasks, and plan a more precise answer. For example, on tasks that require multi-step reasoning (such as solving math problems or analyzing research questions), the thinking process lets the model arrive at more accurate and comprehensive answers. In LMArena's Hard Prompts category, Gemini 2.5 Flash performed strongly, second only to 2.5 Pro.
Gemini 2.5 Flash also offers granular control over the thinking process. Developers can set a thinking budget, a cap on the number of tokens the model may spend on thinking, to adjust reasoning quality. A higher budget lets the model reason more deeply and improves answer quality, while a budget of 0 turns thinking off, keeping cost at its lowest while still outperforming 2.0 Flash.
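As a rough illustration of how such a budget might be passed along, the sketch below builds a JSON request body in Python. The field names (`generationConfig`, `thinkingConfig`, `thinkingBudget`) and the 24576-token ceiling reflect our understanding of the Gemini API during the preview and are assumptions here, as is the helper name `build_request`; check the current API reference before relying on them.

```python
import json

# Assumed upper bound for the 2.5 Flash preview; verify against current docs.
MAX_THINKING_BUDGET = 24576


def build_request(prompt: str, thinking_budget: int) -> dict:
    """Build a generateContent-style JSON body with a capped thinking budget.

    Hypothetical helper: the field names are assumed, not taken from the article.
    """
    # Clamp into [0, MAX_THINKING_BUDGET]; a budget of 0 turns thinking off.
    budget = max(0, min(thinking_budget, MAX_THINKING_BUDGET))
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {"thinkingConfig": {"thinkingBudget": budget}},
    }


if __name__ == "__main__":
    # Print the body for a request with thinking disabled.
    print(json.dumps(build_request("What is 17 * 23?", 0), indent=2))
```

Clamping the value on the client side is a design choice for this sketch only; it keeps an out-of-range budget from reaching the service, which may handle such values differently.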
In practice, tasks of different complexity call for different amounts of thinking. Simple translation or calculation tasks may need little or none, while harder mathematical problems or programming questions benefit from a larger budget. By setting the thinking budget per task, developers can choose the reasoning depth that best fits their needs and avoid paying for reasoning a task does not require.
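One way to apply this idea is a small lookup from task complexity to budget. The sketch below is purely illustrative: the tier names, the token counts, and the function `pick_thinking_budget` are hypothetical and not recommendations from Google.

```python
# Hypothetical tiers; the token counts are illustrative, not recommendations.
BUDGET_BY_COMPLEXITY = {
    "simple": 0,       # e.g. short translation, basic arithmetic
    "moderate": 1024,  # e.g. multi-step word problems
    "complex": 8192,   # e.g. hard math or programming questions
}


def pick_thinking_budget(complexity: str, default: int = 1024) -> int:
    """Return the thinking budget for a complexity tier, or a default."""
    return BUDGET_BY_COMPLEXITY.get(complexity, default)


if __name__ == "__main__":
    for tier in ("simple", "moderate", "complex"):
        print(tier, pick_thinking_budget(tier))
```

In a real application the tiers would more likely come from measuring answer quality, cost, and latency at several budgets than from fixed guesses like these.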
Currently, developers can access Gemini 2.5 Flash through the Gemini API, Google AI Studio, and Vertex AI. Google encourages everyone to experiment with the thinking budget parameter and explore how controllable reasoning capabilities can solve more complex problems.