Valley is a cutting-edge multimodal large model developed by ByteDance that can handle a variety of tasks involving text, image, and video data. It achieved the best results on in-house e-commerce and short-video benchmarks, outperforming other open-source models. On OpenCompass it achieved an average score of 67.40 or higher, ranking second among models with fewer than 10 billion parameters. The Valley-Eagle version draws on Eagle: it adds a vision encoder whose token count can be flexibly adjusted and which operates in parallel with the original visual tokens, improving the model's performance in extreme scenarios.