Search AI Products and News

Explore worldwide AI information, discover new AI opportunities

2025-04-25 13:49:40.AIbase

Meta Releases WebSSL Models: A New Exploration in Language-Free Visual Learning

In the field of artificial intelligence, Meta recently introduced the WebSSL family of models. These models, ranging in size from 300 million to 7 billion parameters, are trained purely on image data and aim to explore the potential of language-free visual self-supervised learning (SSL). This research opens up new possibilities for future multimodal tasks and offers a fresh perspective on how visual representations are learned. Previously, OpenAI's CLIP model was known for its performance in multimodal tasks such as visual question answering (VQA) and document understanding.

2025-04-24 13:43:35.AIbase

Jieyue Xingchen and Yuanli Lingji Announce Strategic Partnership

Jieyue Xingchen and Yuanli Lingji have signed a strategic cooperation agreement in Beijing. The two companies will draw on their respective technological strengths to cooperate closely on multimodal large model technology, intelligent terminal agents, and embodied AI scenarios. The cooperation aims to achieve "reasoning in the physical world", jointly develop an intelligent robot named "RoboAgent", and promote the practical application of Artificial General Intelligence (AGI). At the signing ceremony, Dr. Jiang Daxin, founder and CEO of Jieyue Xingchen, and the co-founders of Yuanli Lingji...

2025-04-24 10:31:31.AIbase

Kunlun Wanwei Open-Sources Skywork-R1V 2.0 Version with Enhanced Visual and Text Reasoning Capabilities

On April 24th, Kunlun Wanwei announced the official open-sourcing of its multimodal reasoning model, Skywork-R1V2.0 (hereinafter referred to as R1V2.0). This upgraded version demonstrates significant improvements in both visual and text reasoning, particularly in deep reasoning on challenging science problems from the College Entrance Examination and in general task scenarios. It is considered the most balanced open-source multimodal model currently available, equally adept at visual and text reasoning.

2025-04-23 16:51:53.AIbase

ByteDance Launches Vidi, a Multimodal Model Leading the Trend in Ultra-Long Video Understanding and Editing

2025-04-23 16:22:12.AIbase

xAI Launches Grok Vision: A New Chapter in Visual and Multilingual Intelligent Interaction

2025-04-23 08:54:21.AIbase

Grok Major Update: Enhanced Visual Capabilities, Multi-lingual Audio Processing, and Real-time Search!

2025-04-21 11:01:03.AIbase

Kunlun Wanwei Open-Sources SkyReels-V2: An Infinite-Length Movie Generation Model

Kunlun Wanwei's SkyReels team has officially released and open-sourced SkyReels-V2, the world's first infinite-length movie generation model using a diffusion-forcing framework. This model achieves synergistic optimization by combining a multimodal large language model (MLLM), multi-stage pre-training, reinforcement learning, and a diffusion-forcing framework, marking a new stage in video generation technology.

2025-04-18 10:04:37.AIbase

Interview Kickstart Launches Generative AI Course to Empower Tech Professionals for Future Opportunities

In the rapidly evolving landscape of Artificial Intelligence (AI), specialized knowledge for technology professionals is becoming increasingly crucial. Interview Kickstart, based in Santa Clara, California, recently announced an update to its Generative AI course, designed to equip tech professionals to navigate this rapidly changing market. This news coincides with the significant attention generated by Chinese tech giant Baidu's launch of its next-generation AI models – Ernie4.5 and Ernie X1. Baidu's multimodal foundation models...

2025-04-18 08:48:47.AIbase

ByteDance Releases UI-TARS-1.5: Open-Source Multimodal Agent Leading a New Wave in GUI Automation

ByteDance has officially released UI-TARS-1.5 on the Hugging Face platform, an open-source multimodal agent built upon a powerful vision-language model. This release marks another significant breakthrough for ByteDance in the field of AI automated interaction, providing developers and users with a highly efficient and intelligent cross-platform GUI (Graphical User Interface) automation solution. UI-TARS-1.5: A New Benchmark for Multimodal Agents. UI-TARS-1.5 is the latest in ByteDance's UI-TARS series...

2025-04-18 08:01:41.AIbase

ByteDance Doubao Open-Sources Seed Agent Model UI-TARS-1.5

The ByteDance Doubao large model team announced the open-sourcing of UI-TARS-1.5, a multimodal agent built on a vision-language model and capable of efficiently executing various tasks in a virtual world. The model achieved state-of-the-art (SOTA) performance on seven typical GUI (Graphical User Interface) benchmark evaluations and demonstrated, for the first time, long-term reasoning capabilities in games and interactive capabilities in open spaces. This open-source project marks a significant advancement in multimodal agent technology for GUIs.

2025-04-17 13:56:14.AIbase

Shanghai AI Laboratory Unveils Upgraded Multimodal Large Model, 'Shusheng · Wanxiang 3.0'

2025-04-17 11:13:14.AIbase

ByteDance Releases Doubao 1.5 Deep Thinking Model: Multimodal Deep Thinking, Low Latency

2025-04-17 08:51:24.AIbase

OpenAI Unveils Novel Reasoning Model o3 with Image Reasoning Capabilities

OpenAI recently released its latest reasoning models, o3 and o4-mini, marking a significant breakthrough in the field of artificial intelligence. These models not only surpass previous versions in reasoning capabilities but also achieve image reasoning for the first time, integrating visual information directly into the thinking process. o3, hailed as a "genius-level" model, particularly excels in tasks such as programming and mathematics, achieving an accuracy rate of 87.5%. The newly released o3 and o4-mini models demonstrate exceptional performance in multimodal processing, possessing...

2025-04-17 08:37:14.AIbase

Open-Source Wanjuan Silk Road 2.0 Multilingual Multimodal Dataset from Shanghai AI Laboratory

The Shanghai Artificial Intelligence Laboratory has released the open-source "Wanjuan Silk Road 2.0" multilingual multimodal corpus. Building upon the existing five languages (Arabic, Russian, Korean, Vietnamese, and Thai), this updated corpus adds three rare languages: Serbian, Hungarian, and Czech. It encompasses four modalities (text, images, audio, and video), totaling over 11.5 million data points and more than 26,000 hours of audio and video, making it a significant resource for low-resource multilingual multimodal research.

2025-04-17 07:49:21.AIbase

OpenAI Unveils Two Multimodal Reasoning Models: o4-mini and Full-Power o3

During a technical livestream at 1 AM today, OpenAI officially launched its latest and most powerful multimodal models: o4-mini and the full-power o3. These models offer unique advantages, capable of processing text, images, and audio simultaneously. They also function as agents, automatically utilizing tools such as web search, image generation, and code parsing. Furthermore, they possess a deep thinking mode, enabling reasoning about images within a chain of thought.

2025-04-16 17:03:19.AIbase

ByteDance Open-Sources Liquid, a Multimodal Model Revolutionizing Unified Visual and Language Generation

AIbase learned from social media that ByteDance recently announced the open-sourcing of its new multimodal generation model, Liquid. This model, utilizing an innovative unified encoding method and a single large language model (LLM) architecture, seamlessly integrates visual understanding and generation tasks. This release not only showcases ByteDance's technological ambition in multimodal AI but also provides a powerful open-source tool for global developers. Below is AIbase's in-depth analysis of the Liquid model, exploring its technological innovations and core features.

2025-04-16 16:30:27.AIbase

Apple and Sorbonne University Joint Research: Early Fusion and Sparse Architectures Advance Multimodal AI

In the field of multimodal artificial intelligence (AI), engineers from Apple have collaborated with a research team from Sorbonne University in France on a significant study. Tech media outlet MarkTechPost recently published a blog post discussing the application and prospects of early and late fusion models in multimodal AI. The research indicates that early fusion models trained from scratch offer superior computational efficiency and scalability. Multimodal AI aims to process multiple data types simultaneously, such as images and text; however, integrating these diverse sources presents challenges.

2025-04-16 10:51:03.AIbase

National Supercomputing Platform Releases New Generation Multimodal Large Model to Advance AI Agent Development

2025-04-16 09:12:01.AIbase

Cohere Launches Embed 4: A New Multimodal Search Model Handling 200-Page Documents

2025-04-15 10:19:56.AIbase

MiniMax MCP Server Officially Launches, Ushering in a New Era of Multimodal AI

The boundaries of artificial intelligence technology are constantly expanding. AIbase learned from social media that MiniMax, a Chinese AI startup, recently announced the official launch of its MiniMax MCP Server. This server allows users to access various capabilities, including video generation, image generation, voice generation, and voice cloning, simply through text input. It's compatible with multiple mainstream MCP clients, providing developers and creators with a powerful multimodal AI tool. Below is AIbase's in-depth analysis of this significant release.
