The Alibaba International AI team recently released a new inference model called Marco-o1, which focuses on open-ended questions rather than only subjects with standard answers, such as programming and mathematics. The research team is exploring whether such models can be applied effectively to areas that are hard to quantify and lack clear reward signals.
Marco-o1's key features are fine-tuning on ultra-long chain-of-thought (CoT) data and using Monte Carlo Tree Search (MCTS) to expand the solution space at fine granularity. The model builds a set of ultra-long CoT data with reflection and self-correction capabilities through self-play plus MCTS, and is trained on this data alongside other open-source datasets. The team also defines mini-steps to expand the solution space further, guiding the model toward better answers.
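The MCTS-guided expansion can be illustrated with a toy sketch. Everything here is a stand-in: in the actual system an LLM proposes candidate mini-steps and a confidence score derived from the model is used as the reward, whereas this sketch uses a fixed proposal list and a hand-written scorer. It is not the team's implementation, only the general search pattern.

```python
import math
import random

class Node:
    """One node in the search tree: a partial chain of mini-steps."""
    def __init__(self, steps, parent=None):
        self.steps = steps
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

def uct(node, c=1.4):
    # Standard UCT: average reward plus an exploration bonus.
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits
    )

def mcts(propose, score, max_depth=4, iters=500, seed=0):
    random.seed(seed)
    root = Node([])
    for _ in range(iters):
        # 1. Selection: descend by UCT to a leaf.
        node = root
        while node.children:
            node = max(node.children, key=uct)
        # 2. Expansion: ask for candidate next mini-steps.
        if len(node.steps) < max_depth:
            for step in propose(node.steps):
                node.children.append(Node(node.steps + [step], parent=node))
            node = random.choice(node.children)
        # 3. Evaluation: a confidence score stands in for a full rollout.
        reward = score(node.steps)
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Read out the most-visited path as the final reasoning chain.
    node = root
    while node.children:
        node = max(node.children, key=lambda n: n.visits)
    return node.steps

# Toy demo: the "model" always proposes three candidate mini-steps, and the
# scorer prefers chains made of step "a" (standing in for model confidence).
best = mcts(
    propose=lambda steps: ["a", "b", "c"],
    score=lambda steps: steps.count("a") / len(steps) if steps else 0.0,
)
print(best)
```

The search concentrates visits on high-confidence branches, so the returned chain is dominated by the preferred step; swapping the proposal and scoring functions for real model calls yields the step-level search the article describes.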
In translation tasks, Marco-o1 demonstrated its ability to handle long and complex sentences, which the team describes as the first application of inference-time expansion to machine translation. Some of the CoT data and the current best model have been open-sourced, with more data and models planned for release.
During inference, the model analyzes the question in depth. Asked to count the 'r's in the word 'strawberry', for example, it breaks the word down letter by letter, compares each one, and arrives at the correct answer. In machine translation, its reasoning path identifies the difficult parts of a sentence and translates them word by word, improving overall translation accuracy.
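The letter-by-letter decomposition described above amounts to the following trivial check (ordinary Python, not the model's internal reasoning):

```python
word = "strawberry"

# Walk the word one letter at a time, as the model's chain of thought does,
# tallying each occurrence of 'r'.
count = 0
for position, letter in enumerate(word, start=1):
    if letter == "r":
        count += 1
        print(f"position {position}: '{letter}' -> running count {count}")

print(count)  # 'strawberry' contains 3 r's
```

The point of the example is that the model reaches this result through explicit stepwise reasoning rather than a single-pass guess, which is exactly where many language models go wrong on such questions.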
The research team has also tried the model in other areas, showing that it can tackle general real-world problems. In addition, instruction-following datasets from the MarcoPolo family were mixed into training, improving the model's ability to follow instructions.
Regarding usage, the research team provides inference and fine-tuning code, so users can easily load the model and tokenizer and start chatting or fine-tuning. A GGUF version can also be run directly on ModelScope for a quicker start.
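Loading the model for chat follows the standard Hugging Face transformers pattern. A minimal sketch, using the model id from the links below; the chat-template call and generation settings here are common transformers idioms and assumptions, not the team's exact inference script:

```python
def chat(prompt: str, model_id: str = "AIDC-AI/Marco-o1",
         max_new_tokens: int = 512) -> str:
    # Deferred import so the function can be defined without transformers
    # installed; model weights are downloaded on the first call.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    # Format the user message with the model's own chat template.
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:],
                            skip_special_tokens=True)
```

For example, `chat("How many 'r's are in 'strawberry'?")` should trigger the stepwise reasoning described earlier. Fine-tuning follows the team's provided code rather than this sketch.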
The release of the Marco-o1 model marks an important step for the Alibaba International AI team in the field of inference models, providing new ideas and tools for solving open-ended problems.
ModelScope:
https://modelscope.cn/models/AIDC-AI/Marco-o1
Arxiv:
https://arxiv.org/abs/2411.14405
Github:
https://github.com/AIDC-AI/Marco-o1
Hugging Face:
https://huggingface.co/AIDC-AI/Marco-o1