HIT (Shenzhen) Releases Multimodal Large Model JiuTian, Performance Improved by 5%

Harbin Institute of Technology (Shenzhen) has released JiuTian, a multimodal large model that achieves an average 5% performance improvement across 13 vision-language tasks. JiuTian addresses the shortcomings of traditional models in extracting visual information by integrating spatial awareness with semantic visual knowledge. Its framework includes a staged instruction fine-tuning strategy and hybrid adapters, effectively enhancing visual understanding. Paper link: https://arxiv.org/abs/2311.11860, GitHub: https://github.com/rshaojimmy/JiuTian.

Source: Chinaz.com (站长之家)
This article is from AIbase Daily
Welcome to the [AI Daily] column! This is your daily guide to the world of artificial intelligence. Every day, we bring you the hot topics in AI, with a focus on developers, helping you keep up with technical trends and learn about innovative AI product applications.