Valley 2.0
A multimodal large language model that enhances the ability to process text, image, and video data.
Valley is a multimodal large language model (MLLM) developed by ByteDance, designed to handle a variety of tasks involving text, image, and video data. The model achieved the best results on internal e-commerce and short-video benchmarks, significantly outperforming other open-source models. On the OpenCompass multimodal model evaluation leaderboard it has an average score of 67.40, placing it among the top two known open-source MLLMs with fewer than 10B parameters.