Sundar Pichai, CEO of Google and its parent company Alphabet, announced the launch of the company's latest artificial intelligence model, Gemini 2.0, marking an important step in Google's effort to build a universal AI assistant. Gemini 2.0 brings significant advances in processing multimodal input and using native tools, enabling AI agents to better understand the world around them and take action on behalf of users, under their supervision.
Gemini 2.0 builds on its predecessors, Gemini 1.0 and 1.5, which first introduced native multimodality, the ability to understand information across text, video, images, audio, and code. Millions of developers are now building with Gemini, prompting Google to reimagine its products, including seven products with 2 billion users each, and to create new ones. NotebookLM, which has been widely praised, is one example of what multimodal and long-context capabilities make possible.
The launch of Gemini 2.0 signals Google's entry into a new era of agents, with the model offering native image and audio output as well as native tool use. Google has begun offering Gemini 2.0 to developers and trusted testers and plans to integrate it into products quickly, starting with Gemini and Search. From today, the experimental Gemini 2.0 Flash model is available to all Gemini users. Google has also introduced a new feature called Deep Research, which uses advanced reasoning and long-context capabilities to act as a research assistant, exploring complex topics and compiling reports on a user's behalf. Deep Research is currently available in Gemini Advanced.
Search is among the products most transformed by AI. Google's AI Overviews now reach 1 billion people, letting them ask entirely new kinds of questions, and have quickly become one of the most popular features in Search. As a next step, Google plans to bring Gemini 2.0's advanced reasoning capabilities to AI Overviews to tackle more complex topics and multi-step questions, including advanced math equations, multimodal queries, and coding. Limited testing began this week, with a broader rollout planned for early next year. Google will also continue to bring AI Overviews to more countries and languages over the next year.
Google has also showcased the state of the art in its agent research, built on Gemini 2.0's native multimodal capabilities. Gemini 2.0 Flash builds on 1.5 Flash, the most popular model among developers to date, and offers similarly fast response times. Notably, 2.0 Flash even outperforms 1.5 Pro on key benchmarks, at twice the speed. 2.0 Flash also introduces new capabilities: in addition to supporting multimodal inputs such as images, video, and audio, it now supports multimodal output, including natively generated images mixed with text and steerable multilingual text-to-speech (TTS) audio. It can also natively call tools such as Google Search, code execution, and third-party user-defined functions.
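As a rough illustration of how native tool calling surfaces to developers, the sketch below uses the google-generativeai Python SDK with the experimental model name gemini-2.0-flash-exp and a hypothetical get_weather function standing in for a user-defined tool; the exact SDK surface may differ.

```python
# Sketch: passing a user-defined function to Gemini 2.0 Flash as a tool.
# Assumes the google-generativeai Python SDK and the experimental model
# name "gemini-2.0-flash-exp"; get_weather is a hypothetical stand-in for
# any third-party, user-defined function.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key from Google AI Studio


def get_weather(city: str) -> str:
    """Return a short weather summary for a city (stubbed for the example)."""
    return f"It is sunny and 22°C in {city}."


model = genai.GenerativeModel("gemini-2.0-flash-exp", tools=[get_weather])

# Automatic function calling lets the SDK run get_weather when the model
# chooses to invoke it and feed the result back into the conversation.
chat = model.start_chat(enable_automatic_function_calling=True)
response = chat.send_message("Should I bring an umbrella in Paris today?")
print(response.text)
```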
Gemini 2.0 Flash is now available to developers as an experimental model, with multimodal input and text output accessible to all developers through the Gemini API in Google AI Studio and Vertex AI, while text-to-speech and native image generation are offered to early-access partners. General availability will follow in January, along with additional model sizes.
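For reference, here is a minimal sketch of the multimodal-input, text-output path through the Gemini API in Google AI Studio, again assuming the google-generativeai Python SDK and the experimental model name; the image path is only a placeholder.

```python
# Sketch: multimodal input (image + text) with text output via the Gemini API.
# Assumes the google-generativeai Python SDK, an API key from Google AI Studio,
# and the experimental model name "gemini-2.0-flash-exp"; chart.png is a
# placeholder path.
import google.generativeai as genai
import PIL.Image

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel("gemini-2.0-flash-exp")
image = PIL.Image.open("chart.png")

# A list mixing an image and a text prompt is sent as one multimodal request.
response = model.generate_content([image, "Summarize what this chart shows."])
print(response.text)
```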
To help developers build dynamic and interactive applications, Google has also released a new Multimodal Live API that supports real-time audio and video streaming input and the use of multiple, combined tools. More information about 2.0 Flash and the Multimodal Live API is available on Google's developer blog.
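A rough sketch of how a streaming session against such an API could be opened is shown below, modeled on the google-genai Python SDK's live-session interface; the client and session method names used here are assumptions for illustration, and the actual interface may differ.

```python
# Sketch: opening a real-time session against the Multimodal Live API.
# Modeled on the google-genai Python SDK's live interface; the method names
# (client.aio.live.connect, session.send, session.receive) and the model
# name "gemini-2.0-flash-exp" are assumptions for illustration.
import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")


async def main() -> None:
    config = {"response_modalities": ["TEXT"]}  # audio output is also possible
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp", config=config
    ) as session:
        # Text is used here for brevity; real-time audio or video frames
        # would be streamed into the same session.
        await session.send(
            input="Describe what you would need from a live video feed.",
            end_of_turn=True,
        )
        async for response in session.receive():
            if response.text:
                print(response.text, end="")


asyncio.run(main())
```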
Starting today, Gemini users worldwide can access a chat-optimized version of the experimental 2.0 Flash model by selecting it in the model drop-down menu on desktop and mobile web. It will be available in the Gemini mobile app soon. Early next year, Google will expand Gemini 2.0 to more Google products.