Sundar Pichai, CEO of Google and its parent company Alphabet, announced the launch of the company's latest artificial intelligence model, Gemini 2.0, marking an important step in Google's effort to build a universal AI assistant. Gemini 2.0 brings significant advances in processing multimodal input and using native tools, enabling AI agents to better understand the world around them and take action on behalf of users, under their supervision.
Gemini 2.0 builds on its predecessors, Gemini 1.0 and 1.5, which first introduced native multimodality, the ability to understand information across text, video, images, audio, and code. Millions of developers are now building with Gemini, prompting Google to reimagine its products, including seven products with 2 billion users each, and to create new ones. NotebookLM, which has been widely praised, is one example of what multimodal and long-context capabilities make possible.
The launch of Gemini 2.0 signals Google's entry into a new era of agents, with the model offering native image and audio output as well as native tool use. Google has begun offering Gemini 2.0 to developers and trusted testers and plans to integrate it into products quickly, starting with Gemini and Search. From today, the experimental Gemini 2.0 Flash model is available to all Gemini users. Google has also introduced a new feature called Deep Research, which uses advanced reasoning and long-context capabilities to act as a research assistant, exploring complex topics and compiling reports on a user's behalf. Deep Research is currently available in Gemini Advanced.
Search is among the products most transformed by AI. Google's AI Overviews now reach 1 billion people, letting them ask entirely new kinds of questions, and have quickly become one of the most popular features in Search. As a next step, Google plans to bring Gemini 2.0's advanced reasoning capabilities to AI Overviews to tackle more complex topics and multi-step questions, including advanced math equations, multimodal queries, and coding. Limited testing began this week, with a broader rollout planned for early next year. Google will also continue to bring AI Overviews to more countries and languages over the next year.
Google has also showcased the state of the art in its agent research, built on Gemini 2.0's native multimodal capabilities. Gemini 2.0 Flash builds on 1.5 Flash, the most popular model among developers to date, and offers similarly fast response times. Notably, 2.0 Flash even outperforms 1.5 Pro on key benchmarks, at twice the speed. 2.0 Flash also introduces new capabilities: in addition to supporting multimodal inputs such as images, video, and audio, it now supports multimodal output, including natively generated images mixed with text and steerable multilingual text-to-speech (TTS) audio. It can also natively call tools such as Google Search, code execution, and third-party user-defined functions.
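As a rough illustration of how native tool calling surfaces to developers, the sketch below uses the google-generativeai Python SDK with the experimental model name gemini-2.0-flash-exp and a hypothetical get_weather function standing in for a user-defined tool; the exact SDK surface may differ.

```python
# Sketch: passing a user-defined function to Gemini 2.0 Flash as a tool.
# Assumes the google-generativeai Python SDK and the experimental model
# name "gemini-2.0-flash-exp"; get_weather is a hypothetical stand-in for
# any third-party, user-defined function.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key from Google AI Studio


def get_weather(city: str) -> str:
    """Return a short weather summary for a city (stubbed for the example)."""
    return f"It is sunny and 22°C in {city}."


model = genai.GenerativeModel("gemini-2.0-flash-exp", tools=[get_weather])

# Automatic function calling lets the SDK run get_weather when the model
# chooses to invoke it and feed the result back into the conversation.
chat = model.start_chat(enable_automatic_function_calling=True)
response = chat.send_message("Should I bring an umbrella in Paris today?")
print(response.text)
```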
Gemini 2.0 Flash is now available to developers as an experimental model, with multimodal input and text output accessible to all developers through the Gemini API in Google AI Studio and Vertex AI, while text-to-speech and native image generation are offered to early-access partners. General availability will follow in January, along with additional model sizes.
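For reference, here is a minimal sketch of the multimodal-input, text-output path through the Gemini API in Google AI Studio, again assuming the google-generativeai Python SDK and the experimental model name; the image path is only a placeholder.

```python
# Sketch: multimodal input (image + text) with text output via the Gemini API.
# Assumes the google-generativeai Python SDK, an API key from Google AI Studio,
# and the experimental model name "gemini-2.0-flash-exp"; chart.png is a
# placeholder path.
import google.generativeai as genai
import PIL.Image

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel("gemini-2.0-flash-exp")
image = PIL.Image.open("chart.png")

# A list mixing an image and a text prompt is sent as one multimodal request.
response = model.generate_content([image, "Summarize what this chart shows."])
print(response.text)
```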
To help developers build dynamic and interactive applications, Google has also released a new Multimodal Live API that supports real-time audio and video streaming input and the use of multiple, combined tools. More information about 2.0 Flash and the Multimodal Live API is available on Google's developer blog.
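A rough sketch of how a streaming session against such an API could be opened is shown below, modeled on the google-genai Python SDK's live-session interface; the client and session method names used here are assumptions for illustration, and the actual interface may differ.

```python
# Sketch: opening a real-time session against the Multimodal Live API.
# Modeled on the google-genai Python SDK's live interface; the method names
# (client.aio.live.connect, session.send, session.receive) and the model
# name "gemini-2.0-flash-exp" are assumptions for illustration.
import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")


async def main() -> None:
    config = {"response_modalities": ["TEXT"]}  # audio output is also possible
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp", config=config
    ) as session:
        # Text is used here for brevity; real-time audio or video frames
        # would be streamed into the same session.
        await session.send(
            input="Describe what you would need from a live video feed.",
            end_of_turn=True,
        )
        async for response in session.receive():
            if response.text:
                print(response.text, end="")


asyncio.run(main())
```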
Starting today, Gemini users worldwide can access a chat-optimized version of the experimental 2.0 Flash model by selecting it in the model drop-down menu on desktop and mobile web. It will be available in the Gemini mobile app soon. Early next year, Google will expand Gemini 2.0 to more Google products.