ByteDance recently released Valley2, a multimodal large language model designed for e-commerce and short-video scenarios. Through a scalable vision-language architecture, it aims to improve performance across domains and extend the application boundaries of e-commerce and short-video use cases. Valley2 uses Qwen2.5 as its LLM backbone, paired with the SigLIP-384 vision encoder, and combines MLP layers with convolution for efficient feature transformation. Its key innovations are a large vision vocabulary, a convolutional adapter (ConvAdapter), and the Eagle module, which together improve flexibility in handling diverse real-world inputs and raise training and inference efficiency.
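As a rough illustration of the adapter idea, the sketch below shows a SigLIP-style encoder's patch tokens being downsampled by a convolution and projected into the LLM embedding space with an MLP. The dimensions (1152 for SigLIP-SO400M, 3584 for Qwen2.5-7B) and the kernel/stride choice are assumptions for illustration, not Valley2's exact configuration.

```python
import torch
import torch.nn as nn


class ConvAdapter(nn.Module):
    """Illustrative conv-based adapter, not Valley2's published implementation."""

    def __init__(self, vision_dim: int, llm_dim: int, stride: int = 2):
        super().__init__()
        # Convolution over the 2D patch grid reduces the token count by roughly stride^2.
        self.conv = nn.Conv2d(vision_dim, vision_dim, kernel_size=stride, stride=stride)
        # Two-layer MLP maps the pooled vision features into the LLM embedding space.
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, vision_tokens: torch.Tensor) -> torch.Tensor:
        # vision_tokens: (batch, num_patches, vision_dim), assumed to form a square grid.
        b, n, d = vision_tokens.shape
        side = int(n ** 0.5)
        x = vision_tokens.transpose(1, 2).reshape(b, d, side, side)
        x = self.conv(x)                    # (batch, vision_dim, side // stride, side // stride)
        x = x.flatten(2).transpose(1, 2)    # back to (batch, fewer_tokens, vision_dim)
        return self.proj(x)                 # (batch, fewer_tokens, llm_dim)


if __name__ == "__main__":
    adapter = ConvAdapter(vision_dim=1152, llm_dim=3584)
    patch_tokens = torch.randn(1, 729, 1152)   # 27x27 grid from a 384px SigLIP encoder
    print(adapter(patch_tokens).shape)          # e.g. torch.Size([1, 169, 3584])
```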


Valley2's training data consists of OneVision-style data, data tailored to the e-commerce and short-video domains, and chain-of-thought (CoT) data for complex reasoning. Training proceeds in four stages: text-vision alignment, high-quality knowledge learning, instruction fine-tuning, and CoT post-training. In experiments, Valley2 performed strongly on multiple public benchmarks, scoring especially high on MMBench, MMStar, and MathVista, and it also surpassed models of similar scale on the Ecom-VQA benchmark.
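The following is a hypothetical sketch of that staged schedule. The stage names follow the text above; the data mixes and which modules are unfrozen at each stage are assumptions in the spirit of common LLaVA-style recipes, not Valley2's published hyperparameters.

```python
# Hypothetical staged training schedule; stage names from the article,
# data mixes and trainable modules are illustrative assumptions.
TRAINING_STAGES = [
    {"name": "text-vision alignment",
     "data": ["image-caption pairs"],
     "trainable": ["adapter"]},                      # assumption: align the projector first
    {"name": "high-quality knowledge learning",
     "data": ["OneVision-style data"],
     "trainable": ["adapter", "llm"]},
    {"name": "instruction fine-tuning",
     "data": ["OneVision-style data", "e-commerce / short-video data"],
     "trainable": ["adapter", "llm"]},
    {"name": "chain-of-thought post-training",
     "data": ["CoT data for complex reasoning"],
     "trainable": ["adapter", "llm"]},
]

for stage in TRAINING_STAGES:
    print(f"{stage['name']}: data={stage['data']}, trainable={stage['trainable']}")
```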

Looking ahead, the Valley team plans to release a versatile model covering text, image, video, and audio modalities, and to introduce a Valley-based multimodal embedding training method to support downstream retrieval and detection applications.

The launch of Valley2 marks a significant advancement in the field of multimodal large language models, demonstrating the potential to enhance model performance through structural improvements, dataset construction, and optimization of training strategies.

Model link:

https://www.modelscope.cn/models/bytedance-research/Valley-Eagle-7B
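
For a quick start, the released checkpoint can be fetched locally with the ModelScope SDK. The snippet below only downloads the weights; loading and inference should follow the instructions in the code repository linked below.

```python
# Download the Valley-Eagle-7B weights from ModelScope (pip install modelscope).
from modelscope import snapshot_download

local_dir = snapshot_download("bytedance-research/Valley-Eagle-7B")
print("Model downloaded to:", local_dir)
```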

Code link:

https://github.com/bytedance/Valley

Paper link:

https://arxiv.org/abs/2501.05901