This article introduces LLaVA 1.5, an open-source multimodal language model. It combines off-the-shelf generative AI components, is computationally efficient to fine-tune, and achieves strong accuracy across a range of tasks. LLaVA 1.5 uses a CLIP vision encoder and an open-source LLaMA-family language model, joined by a lightweight MLP connector. With only about 600,000 visual instruction-tuning samples and roughly one day of training, it outperforms other open-source models on multimodal benchmarks. Despite some usage limitations, LLaVA 1.5 illustrates the innovative direction of the open-source community and may help drive the development of open large models, giving users more convenient and efficient generative AI tools.
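
To make the architecture concrete, here is a minimal PyTorch sketch of the MLP-connector idea: patch features from the CLIP encoder are projected into the language model's embedding space and prepended to the text embeddings before being fed to the LLM. The specific dimensions (1024 for CLIP ViT-L features, 4096 for a 7B LLaMA-class model) and the two-layer GELU MLP are illustrative assumptions, not the official LLaVA 1.5 implementation.

```python
import torch
import torch.nn as nn


class LlavaStyleConnector(nn.Module):
    """Two-layer MLP that projects visual features into the LLM embedding space.

    A minimal sketch of the connector design described above: CLIP patch
    features (assumed 1024-dim here) are mapped to the language model's
    hidden size (assumed 4096-dim, as in a 7B LLaMA-class model).
    """

    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # patch_features: (batch, num_patches, vision_dim) from the CLIP encoder
        return self.proj(patch_features)


if __name__ == "__main__":
    # Stand-in for CLIP output: 2 images, 576 patches each (24x24 grid), 1024-dim
    fake_clip_features = torch.randn(2, 576, 1024)
    connector = LlavaStyleConnector()
    visual_tokens = connector(fake_clip_features)

    # Stand-in for text embeddings from the LLM's embedding table (2 sequences, 32 tokens)
    text_embeds = torch.randn(2, 32, 4096)

    # Visual tokens are concatenated with text embeddings and passed to the LLM
    llm_inputs = torch.cat([visual_tokens, text_embeds], dim=1)
    print(llm_inputs.shape)  # torch.Size([2, 608, 4096])
```

Because the connector is just a small MLP trained on top of frozen or lightly tuned components, most of the model's capability comes from the pretrained vision encoder and language model, which is what keeps the training cost to roughly one day.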