In the increasingly competitive field of artificial intelligence, Google recently announced the launch of the Gemini 2.0 Flash Thinking model, a multimodal reasoning model designed for fast, transparent processing of complex problems. Google CEO Sundar Pichai wrote on the social media platform X that it is "our most thoughtful model yet."


According to the developer documentation, Gemini 2.0 Flash Thinking offers stronger reasoning capabilities than the base Gemini 2.0 Flash model. The new model supports 32,000 input tokens (roughly 50 to 60 pages of text) and can produce responses of up to 8,000 output tokens. In its AI Studio sidebar, Google notes that the model is particularly well suited to "multimodal understanding, reasoning," and "coding."

Developer documentation: https://ai.google.dev/gemini-api/docs/thinking-mode?hl=en

Google has not yet disclosed details about the model's training process, architecture, licensing, or pricing, but Google AI Studio currently lists the per-token cost of using the model as zero.

A notable feature of Gemini 2.0 Flash Thinking is that it lets users view the model's step-by-step reasoning process through a dropdown menu, something not available in competing models such as OpenAI's o1 and o1-mini. This transparency lets users see exactly how the model arrives at its conclusions, directly addressing the perception of AI as a "black box."


In some simple tests, Gemini 2.0 Flash Thinking quickly (within one to three seconds) gave correct answers to questions that frequently trip up language models, such as counting how many times the letter "R" appears in the word "strawberry." In another test, it systematically compared the decimals 9.9 and 9.11, working step by step through the integer and fractional parts.
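For readers who want the arithmetic spelled out, the comparison boils down to the following. This is an illustrative sketch of that step-by-step reasoning, not the model's actual output:

```python
# Illustrative sketch of the step-by-step decimal comparison (not the model's own output).
from decimal import Decimal

a, b = Decimal("9.9"), Decimal("9.11")

# Step 1: the integer parts are equal (9 == 9).
# Step 2: compare the fractional parts at equal precision: 9.9 == 9.90, and 0.90 > 0.11.
print(a > b)  # True -> 9.9 is larger than 9.11
```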

The independent evaluation platform LM Arena ranked the Gemini 2.0 Flash Thinking model as the top performer across all large language model categories.

Additionally, the Gemini 2.0 Flash Thinking model features native image upload and analysis capabilities. In contrast, OpenAI's o1 was initially a text model and later expanded to include image and file analysis. Currently, both can only return text output.

Although its multimodal capabilities broaden the model's potential applications, developers should note that Gemini 2.0 Flash Thinking does not yet support integration with Google Search and cannot be combined with other Google applications or external tools. Developers can experiment with the model through Google AI Studio and Vertex AI.
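For developers who want to try the model programmatically, here is a minimal sketch using the google-generativeai Python SDK. The model identifier and the 8,000-token output cap are assumptions based on this article and Google's experimental naming conventions; check Google AI Studio for the current values.

```python
# Minimal sketch: calling Gemini 2.0 Flash Thinking via the google-generativeai SDK.
# The model ID below is an assumption; verify the current name in Google AI Studio.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key obtained from Google AI Studio

model = genai.GenerativeModel(
    model_name="gemini-2.0-flash-thinking-exp",     # assumed experimental model ID
    generation_config={"max_output_tokens": 8000},  # article cites an ~8,000-token output limit
)

response = model.generate_content(
    "How many times does the letter R appear in the word 'strawberry'?"
)
print(response.text)
```

The same model can also be selected directly in the AI Studio interface, where the dropdown described above exposes its intermediate reasoning.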

In an increasingly competitive AI market, the Gemini 2.0 Flash Thinking model may mark a new era for problem-solving models. With its ability to handle varied data types, show its reasoning visibly, and operate at scale, it has become a significant competitor to OpenAI's o1 series and other models in the reasoning AI market.

Key Points:

🌟 The Gemini 2.0 Flash Thinking model has powerful reasoning capabilities, supporting 32,000 input tokens and 8,000 output tokens.

💡 The model enhances transparency by providing step-by-step reasoning through a dropdown menu, addressing the AI "black box" issue.

🖼️ It features native image upload and analysis capabilities, expanding multimodal application scenarios.