Sundar Pichai, CEO of Google and its parent company Alphabet, has announced the launch of the company's latest artificial intelligence model, Gemini 2.0, an important step forward in Google's development of a general AI assistant. Gemini 2.0 brings significant advances in processing multimodal inputs and using native tools, enabling AI agents to understand the world around them more deeply and to take action on a user's behalf, under their supervision.

Gemini 2.0 builds on its predecessors, Gemini 1.0 and 1.5: Gemini 1.0 was the first model built to be natively multimodal, able to understand information across text, video, images, audio, and code, and Gemini 1.5 extended this with long-context understanding. Today, millions of developers are building with Gemini, prompting Google to rethink its own products, including seven products that each serve 2 billion users, and to create new ones. NotebookLM, which has gained widespread popularity, is one example of what these multimodal and long-context capabilities make possible.


The launch of Gemini 2.0 marks the start of a new agentic era for Google: the model can natively generate image and audio output and natively use tools. Google has begun providing Gemini 2.0 to developers and trusted testers and plans to quickly integrate it into its products, starting with Gemini and Search. Effective immediately, the experimental Gemini 2.0 Flash model is available to all Gemini users. Google has also introduced a new feature called Deep Research, which uses advanced reasoning and long-context capabilities to act as a research assistant, exploring complex topics and compiling reports on a user's behalf. Deep Research is currently available in Gemini Advanced.

Search is one of the products most transformed by AI. Google's AI Overviews now reach 1 billion people, letting them ask entirely new kinds of questions, and have quickly become one of Search's most popular features. As a next step, Google will bring the advanced reasoning capabilities of Gemini 2.0 to AI Overviews to tackle more complex topics and multi-step questions, including advanced mathematical equations, multimodal queries, and coding. Limited testing began this week, with a broader rollout planned for early next year. Google will also continue to expand AI Overviews to more countries and languages over the next year.

Google has also showcased cutting-edge results from its agent research, built on Gemini 2.0's native multimodal capabilities. Gemini 2.0 Flash improves on 1.5 Flash, so far the most popular model among developers, while keeping similarly fast response times. Notably, 2.0 Flash even outperforms 1.5 Pro on key benchmarks at twice the speed. 2.0 Flash also introduces new capabilities: in addition to multimodal inputs such as images, video, and audio, it now supports multimodal outputs, including natively generated images mixed with text and controllable multilingual text-to-speech (TTS) audio. It can also natively call tools such as Google Search, code execution, and third-party user-defined functions.
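To make the tool-calling idea concrete, here is a minimal sketch of how a developer might register a user-defined function with the experimental 2.0 Flash model through the google-generativeai Python SDK. The model name gemini-2.0-flash-exp and the get_exchange_rate helper are illustrative assumptions, not details from Google's announcement.

```python
import google.generativeai as genai

def get_exchange_rate(currency_from: str, currency_to: str) -> float:
    """Hypothetical helper: return the exchange rate from one currency to another.

    A real application would query a rates service; a fixed value keeps the sketch runnable.
    """
    return 0.95

genai.configure(api_key="YOUR_API_KEY")  # key from Google AI Studio

# Register the function as a tool; the SDK derives a schema from its signature and docstring.
model = genai.GenerativeModel(
    model_name="gemini-2.0-flash-exp",  # assumed experimental model name
    tools=[get_exchange_rate],
)

# With automatic function calling, the SDK executes the tool when the model requests it,
# feeds the result back, and returns the final natural-language answer.
chat = model.start_chat(enable_automatic_function_calling=True)
response = chat.send_message("How many euros would I get for 100 US dollars?")
print(response.text)
```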


Gemini 2.0 Flash is now available to developers as an experimental model: all developers can use multimodal input and text output through the Gemini API in Google AI Studio and Vertex AI, while text-to-speech and native image generation are limited to early-access partners. General availability will follow in January, along with additional model sizes.
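For the multimodal-input, text-output path that is open to all developers, a request could look roughly like the sketch below, again using the google-generativeai Python SDK; the gemini-2.0-flash-exp model name and the local chart.png file are assumptions for illustration.

```python
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # key from Google AI Studio
model = genai.GenerativeModel("gemini-2.0-flash-exp")  # assumed experimental model name

# Combine an image and a text instruction in a single multimodal request;
# the experimental tier returns a text response.
image = Image.open("chart.png")
response = model.generate_content([image, "Describe the main trend shown in this chart."])
print(response.text)
```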

To help developers build dynamic and interactive applications, Google has also released a new Multimodal Live API that supports real-time streaming audio and video input and the use of multiple tools in combination.
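A streaming session against this real-time API might look something like the following sketch, based on the early google-genai Python SDK; the client.aio.live.connect entry point, the v1alpha API version, and the exact send/receive method names are assumptions about the experimental surface and may differ from the released documentation.

```python
import asyncio
from google import genai

# NOTE: the Live API surface below reflects the early experimental google-genai SDK
# and is an assumption; method names may differ in current releases.
client = genai.Client(api_key="YOUR_API_KEY", http_options={"api_version": "v1alpha"})

async def main() -> None:
    config = {"response_modalities": ["TEXT"]}  # audio output can also be requested
    # Open a bidirectional streaming session with the experimental 2.0 Flash model.
    async with client.aio.live.connect(model="gemini-2.0-flash-exp", config=config) as session:
        await session.send(input="Give me a one-sentence summary of Gemini 2.0.", end_of_turn=True)
        # Print the model's incremental responses as they stream back.
        async for message in session.receive():
            if message.text:
                print(message.text, end="")

asyncio.run(main())
```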

Starting today, Gemini users worldwide can access the chat-optimized version of 2.0 Flash by selecting it from the model dropdown menu on desktop and mobile web. It will soon be available in the Gemini mobile app. Early next year, Google plans to expand Gemini 2.0 to more Google products.