M2RAG is a benchmark codebase for retrieval-augmented generation (RAG) in multimodal contexts. Each query is paired with retrieved multimodal documents, testing how well multimodal large language models (MLLMs) leverage knowledge from those contexts. Models are evaluated on four tasks: image captioning, multimodal question answering, multimodal fact verification, and image re-ranking. M2RAG gives researchers a standardized platform for measuring, and ultimately improving, MLLM performance in multimodal in-context learning.
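
As a rough illustration of the retrieve-then-generate flow the benchmark evaluates, here is a minimal sketch. Note that every name in it (`MultimodalDoc`, `retrieve`, `build_prompt`, `generate`, and the toy data) is a hypothetical stand-in, not the repository's actual API:

```python
# Sketch of a retrieval-augmented multimodal evaluation loop.
# All names and data below are hypothetical placeholders, not M2RAG's real API.
from dataclasses import dataclass

@dataclass
class MultimodalDoc:
    text: str
    image_path: str | None = None  # attached image evidence, if any

def retrieve(query: str, corpus: list[MultimodalDoc], k: int = 5) -> list[MultimodalDoc]:
    """Toy retriever: rank documents by word overlap with the query.
    A real system would use multimodal embeddings (e.g., CLIP-style encoders)."""
    q = set(query.lower().split())
    ranked = sorted(corpus, key=lambda d: -len(q & set(d.text.lower().split())))
    return ranked[:k]

def build_prompt(query: str, docs: list[MultimodalDoc]) -> str:
    """Pack retrieved text (and references to images) into the MLLM prompt."""
    context = "\n".join(
        f"[doc {i}] {d.text}" + (f" <image: {d.image_path}>" if d.image_path else "")
        for i, d in enumerate(docs)
    )
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

def generate(prompt: str) -> str:
    """Placeholder for the MLLM call (an API request or a local model)."""
    return "stub answer"

if __name__ == "__main__":
    corpus = [
        MultimodalDoc("The Eiffel Tower is in Paris.", "eiffel.jpg"),
        MultimodalDoc("Mount Fuji is Japan's tallest peak.", "fuji.jpg"),
    ]
    query = "Which city is the Eiffel Tower in?"
    docs = retrieve(query, corpus, k=1)
    print(generate(build_prompt(query, docs)))
```

All four benchmark tasks plausibly follow this same pattern, differing in the task-specific prompt template and the metric applied to the model's output.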