Following the release of GPT-4, multimodal large language models (MLLMs) became a hot research topic. The team led by Ma Yi proposed the EMT framework, the first systematic evaluation of catastrophic forgetting in MLLMs after fine-tuning. Experiments showed that while fine-tuning an MLLM improved performance on the fine-tuning dataset, it also degraded performance on other datasets: the fine-tuned models generated hallucinated text related to the fine-tuning dataset while ignoring the question actually being asked. The work provides a framework and benchmarks for follow-up research, and indicates that further optimization of model design and training techniques is needed to balance the trade-offs between different capabilities.
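To make the evaluation idea concrete, below is a minimal sketch of an EMT-style measurement loop: the MLLM is treated as an image classifier and prompted to name the class of each image from a held-out benchmark, before and after fine-tuning. The `query_mllm` function, the exact prompt wording, and the substring-based grading are all assumptions for illustration; the actual framework's prompts and answer-grading procedure differ in detail.

```python
# Hypothetical stand-in for an MLLM call: the model receives an image plus a
# text prompt and returns free-form text. Replace with a real multimodal
# model API to run an actual evaluation.
def query_mllm(image, prompt: str) -> str:
    raise NotImplementedError("plug in a real multimodal model here")

# Example label set (CIFAR-10); EMT-style evaluation would repeat this over
# several standard vision benchmarks.
CLASS_NAMES = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]

def emt_style_accuracy(dataset, class_names) -> float:
    """Treat the MLLM as a classifier over `dataset`, an iterable of
    (image, integer_label) pairs, and return its accuracy."""
    prompt = (
        "What is the object in the image? "
        f"Answer with one option from: {', '.join(class_names)}."
    )
    correct = 0
    total = 0
    for image, label in dataset:
        answer = query_mllm(image, prompt).lower()
        mentioned = [c for c in class_names if c in answer]
        # Count as correct only if the sole class mentioned is the true one;
        # answers naming several classes are treated as hallucinated.
        # (A simplification: the paper grades outputs with another LLM.)
        if mentioned == [class_names[label]]:
            correct += 1
        total += 1
    return correct / total
```

Comparing `emt_style_accuracy` on the same benchmarks before and after fine-tuning quantifies the forgetting described above: accuracy rises on the fine-tuning dataset while dropping elsewhere.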