NVLM-D-72B is a multimodal large language model released by NVIDIA. It targets vision-language tasks, and its multimodal training also improves its text-only performance. On vision-language benchmarks, the model achieves results comparable to those of leading industry models.