The article introduces BuboGPT, a multi-modal large language model released by ByteDance that jointly understands text, images, and audio. According to the article, it is the first such model to incorporate visual grounding, allowing it to accurately locate within an image the objects its responses refer to. Using a training scheme centered on multi-modal instruction tuning, the researchers report strong results across multi-modal tasks. The model has been open-sourced, and a playable demo page is provided.
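For readers curious how joint text-image-audio understanding is typically wired together, the sketch below illustrates one common pattern behind multi-modal LLMs of this kind: frozen modality encoders produce feature sequences, small learned projections map them into the LLM's token-embedding space, and the projected "modality tokens" are concatenated with the embedded text prompt. This is a minimal, illustrative sketch only; the module names, dimensions, and simple linear projections are assumptions, not BuboGPT's published architecture.

```python
# Minimal sketch (assumptions, not BuboGPT's actual configuration) of the
# common pattern for multi-modal LLMs: project encoder features into the
# LLM's embedding space, then feed modality tokens alongside text tokens.

import torch
import torch.nn as nn


class ModalityProjector(nn.Module):
    """Learned linear map aligning encoder features with the LLM embedding size."""

    def __init__(self, enc_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Linear(enc_dim, llm_dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.proj(feats)


# Hypothetical dimensions chosen for illustration.
VISION_DIM, AUDIO_DIM, LLM_DIM = 1024, 768, 4096

vision_proj = ModalityProjector(VISION_DIM, LLM_DIM)
audio_proj = ModalityProjector(AUDIO_DIM, LLM_DIM)

# Stand-ins for the outputs of frozen image/audio encoders,
# shaped (batch, num_patches_or_frames, encoder_dim).
image_feats = torch.randn(1, 32, VISION_DIM)
audio_feats = torch.randn(1, 16, AUDIO_DIM)

# Stand-in for the LLM's embedded text prompt, shaped (batch, seq_len, llm_dim).
text_embeds = torch.randn(1, 24, LLM_DIM)

# The concatenated sequence is what the LLM would consume during
# multi-modal instruction tuning: modality tokens followed by text tokens.
multimodal_input = torch.cat(
    [vision_proj(image_feats), audio_proj(audio_feats), text_embeds], dim=1
)
print(multimodal_input.shape)  # torch.Size([1, 72, 4096])
```

In this setup, only the lightweight projectors need to be trained during instruction tuning, which keeps the large encoders and the LLM frozen and the overall training cost modest.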