MoE-LLaVA is a mixture-of-experts (MoE) model built on large vision-language models that shows strong performance in multi-modal learning. Despite activating fewer parameters, it delivers high performance and can be trained in a short time. The project supports Gradio Web UI and CLI inference, and its documentation covers a model zoo, requirements and installation, training and validation, customization, visualization, and an API; a hedged loading sketch follows below.
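To give a feel for programmatic use, the sketch below shows how a LLaVA-style checkpoint might be loaded for inference. The `moellava` module paths, the `load_pretrained_model` helper, its return values, and the checkpoint identifier are assumptions modeled on the LLaVA family of repositories, not a confirmed MoE-LLaVA API; the project's README is the authoritative reference.

```python
# Minimal loading sketch (assumptions: module layout, helper names, and the
# checkpoint id below follow LLaVA-style conventions and may differ here).
from moellava.model.builder import load_pretrained_model      # assumed module path
from moellava.mm_utils import get_model_name_from_path        # assumed helper

model_path = "LanguageBind/MoE-LLaVA-Phi2-2.7B-4e"             # example checkpoint id (assumption)

# Assumed to return the tokenizer, the MoE-LLaVA model, the image processor,
# and the maximum context length, as in LLaVA-style builders.
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,
    model_name=get_model_name_from_path(model_path),
)

# From here, an image and a text prompt would be preprocessed and passed to
# model.generate() for a single question-answer turn, mirroring the CLI and
# Gradio Web UI inference paths mentioned above.
```

For interactive use, the same checkpoint can instead be served through the Gradio Web UI or the CLI entry points shipped with the repository; see the project's training and validation scripts for batch evaluation.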