In a recent podcast, Demis Hassabis, CEO of Google DeepMind, revealed Google's plan to eventually integrate its Gemini AI model with its video generation model, Veo, to enhance Gemini's understanding of the physical world. He emphasized that Gemini was designed from the outset to be multimodal, aiming for a "universal digital assistant" that genuinely helps users in the real world.

*Image: Google's Gemini large language model*

Hassabis pointed to the AI industry's broader shift toward "omni" models, which can understand and synthesize many forms of media. Google's latest Gemini models, for instance, can generate text, images, and audio, and OpenAI's default model in ChatGPT natively creates images. Amazon, for its part, has announced plans to launch an "any-to-any" model this year.

Developing these omni models requires vast amounts of training data: images, videos, audio, and text. Hassabis hinted that Veo's training data comes primarily from Google's YouTube platform, saying that by watching a large number of YouTube videos, Veo can figure out the physics of the world.

Google has previously said its models "may" be trained on "some" YouTube content, in accordance with its agreements with YouTube creators. Reports indicate that Google broadened its terms of service last year, in part to tap more data for training its AI models, underscoring how aggressively the company is gathering data for its AI efforts.

Google's plan highlights the industry's focus on multimodal AI and hints at where the field is headed. Combining Gemini and Veo could give users richer interactive experiences and help AI assistants integrate more naturally into daily life.

Key Takeaways:

- 🤖 Google plans to integrate Gemini and Veo AI models to improve understanding of the physical world.

- 🎥 Veo's training data comes primarily from YouTube; by watching vast numbers of videos, it learns the physics of the world.

- 🌐 The AI industry is moving toward multimodal "omni" models that can understand and synthesize many forms of media.