In recent years, artificial intelligence has advanced significantly, but a trade-off between computational efficiency and multi-functionality remains. Many advanced multi-modal models, such as GPT-4, require substantial computational resources, which restricts their use to high-end servers and makes it difficult to deploy them on edge devices such as smartphones and tablets. Furthermore, real-time processing of tasks such as video analysis or speech-to-text still faces technical barriers, highlighting the need for efficient and flexible AI models that perform well under limited hardware conditions.