
ByteDance Launches PixelLM Multimodal Large Model: Efficient Pixel-level Reasoning Without Relying on SAM

站长之家 (Chinaz)

Published in AI News · 1 min read · Dec 28, 2023


ByteDance's multimodal large model PixelLM performs efficient pixel-level reasoning without relying on SAM. Its main advantage is that it can handle diverse and complex reasoning segmentation tasks and produce multiple segmentation results at once, which lets it address open-domain problems effectively. This marks a step forward for multimodal large models into fine-grained tasks such as image editing, autonomous driving, and robotics.


This article is from AIbase Daily

Welcome to the [AI Daily] column! This is your daily guide to exploring the world of artificial intelligence. Every day, we bring you the hottest topics in AI with a focus on developers, helping you understand technical trends and learn about innovative AI product applications.

Created by the AIbase Daily Team

AI News Recommendations

Alibaba DAMO Academy Launches E-commerce Multi-modal Large Model Valley 2

Recently, Alibaba DAMO Academy launched Valley 2, a multimodal large language model designed for e-commerce scenarios. Built on a scalable vision-language architecture, it aims to improve performance across a range of domains and extend the application boundaries of e-commerce and short-video scenarios. Valley 2 uses Qwen 2.5 as its LLM backbone, paired with the SigLIP-384 visual encoder, and incorporates MLP layers and convolution for efficient feature transformation.

Jan 15, 2025


Alibaba Launches Multi-Modal Large Model mPLUG-Owl3: Watch a 2-Hour Movie in 4 Seconds

mPLUG-Owl3, the latest release from the Alibaba team, is a general-purpose multimodal large model whose core capability is understanding long image sequences. By introducing a hyper-attention module, mPLUG-Owl3 efficiently processes visual and language information together, enabling in-depth understanding of multimodal data such as images and videos. The team reports significant breakthroughs in inference efficiency, image processing, and the use of multimodal knowledge, particularly in video understanding, where the model can "watch" a 2-hour movie in 4 seconds and accurately answer questions about it.

Aug 19, 2024


The Compass Arena, a Large Model Evaluation Platform, Adds a Multi-Modal Large Model Competition Section

The OpenCompass (Sinan) team at the Shanghai Artificial Intelligence Laboratory has partnered with the ModelScope community to launch Compass Multi-Modal Arena, a new section of its large model evaluation platform dedicated to multimodal large models. Users upload an image and enter a question, two anonymous multimodal large models each generate an answer, and the user subjectively judges the quality of the generated content and picks the better-performing model. The platform offers an easy-to-use interface and a distinctive question bank.

Aug 13, 2024


NetEase Fuxi Unveils Robot Brand LingDong and Multimodal Large Model Eternal Forms

NetEase Fuxi built the "LingDong" brand on its independently developed industrial-grade large models and its AOP (Artificial Intelligence Operation Perception) technology concept. The brand's two flagship products, a mining robot and a loading robot, have been deployed in more than 50 provincial key projects, serving diverse environments such as mines, ports, mixing stations, and schools. In addition to the robot brand launch, NetEase Fuxi also showcased it

Jul 4, 2024


Huawei Pangu Model 5.0 Released: Upgraded Multimodal Capabilities and Enhanced Cognitive Abilities

Today, at the Huawei HDC 2024 Developer Conference, Zhang Pingan, Huawei Executive Director and CEO of Huawei Cloud, announced the official global release of Huawei Cloud's Pangu Model 5.0. Pangu Model 5.0 spans parameter scales from hundreds of millions to trillions and includes a natural language model, a multimodal model, a vision model, a prediction model, and a scientific computing model, providing robust support for AI applications.

Jun 21, 2024


Huazhong University of Science and Technology Releases New Benchmark for Multi-modal Large Model Performance Evaluation

Huazhong University of Science and Technology and other institutions have released OCRBench, a new benchmark for evaluating the Optical Character Recognition (OCR) capabilities of multimodal large models, covering five major tasks and 27 datasets. The evaluation results reveal clear limitations: models perform well on tasks such as text recognition and document question answering, but still struggle with semantic dependency, handwritten text, and multilingual text. The research team built OCRBench to assess OCR capabilities more accurately and to guide the development of multimodal large models.

Feb 2, 2024


Multimodal Large Model with Integrated Detection and Segmentation Modules Makes Image Cutouts Easier

By integrating detection and segmentation modules, this multimodal large model makes image cutouts easier. Users describe the objects to find in natural language, and the model locates them and provides textual explanations. The model was developed by the NExT++ laboratory at the National University of Singapore and Liu Zhiyuan's team at Tsinghua University.

Jan 4, 2024


HIT (Shenzhen) Releases Multi-modal Large Model Jiutian, Performance Improved by 5%

Harbin Institute of Technology (Shenzhen) has released Jiutian, a multimodal large language model that achieves state-of-the-art performance across 13 vision-language tasks, including a 5% improvement on the Visual Spatial Reasoning task. To address insufficient visual information extraction, the framework uses a staged instruction fine-tuning strategy and a mixture of adapters, integrating fine-grained spatial awareness with high-level semantic visual knowledge to reduce visual localization bias and hallucination.

Dec 4, 2023


Stanford Launches Universal Corrector LURE to Address Multi-Modal Object Hallucination Issues

Researchers from the University of North Carolina at Chapel Hill and Stanford University have jointly developed LURE, a universal corrector for object hallucination in multimodal large models. LURE builds on a statistical analysis of the key factors behind object hallucination, namely object co-occurrence, uncertainty, and object position, and uses it to correct hallucinated content in model outputs. Evaluated on multiple multimodal large models, it improved general object hallucination evaluation metrics by more than 23%. By producing more accurate outputs, LURE is expected to benefit a wide range of AI applications.

Nov 6, 2023


Tsinghua & Zhipu AI Launch Multi-Modal Large Model CogVLM

Tsinghua KEG and Zhipu AI have released CogVLM, a multimodal large model that deeply integrates visual and language features. CogVLM-17B delivers state-of-the-art or second-place results across multiple datasets. Its architecture consists of a ViT encoder, an MLP adapter, a pre-trained large language model, and a visual expert module. CogVLM was pre-trained on 1.5 billion image-text pairs and achieves strong results on multimodal benchmarks.

Oct 12, 2023
