HuatuoGPT-o1 is a large language model (LLM) designed for complex reasoning in healthcare. It can identify mistakes in its own reasoning, explore alternative strategies, and refine its answers. The model builds these complex-reasoning abilities from verifiable medical questions, checked by a dedicated medical verifier. The approach has two parts: first, the verifier guides a search for complex reasoning trajectories, which are then used to fine-tune the LLM; second, reinforcement learning (PPO) with verifier-based rewards further strengthens its reasoning. The model, data, and code of HuatuoGPT-o1 are released as open source, making them available for medical education and research.
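The paragraph above describes two uses of the verifier: guiding the search for reasoning trajectories that become fine-tuning data, and supplying rewards for PPO. The sketch below illustrates both ideas under stated assumptions. It is not the HuatuoGPT-o1 code. The exact-match `verify` function is a hypothetical stand-in for the medical verifier. The strategy functions (`explore_new_path`, `correction`) are toy placeholders for model calls. The reward values (1.0 for correct, 0.0 otherwise) are illustrative, not the project's actual settings.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


def normalize(text: str) -> str:
    # Lowercase, trim whitespace and a trailing period, collapse inner whitespace.
    return " ".join(text.lower().strip().rstrip(".").split())


def verify(answer: Optional[str], reference: str) -> bool:
    # Hypothetical stand-in for the medical verifier: normalized exact match.
    return answer is not None and normalize(answer) == normalize(reference)


@dataclass
class Trajectory:
    steps: List[str] = field(default_factory=list)  # reasoning steps so far
    answer: Optional[str] = None                    # current final answer


# A strategy takes the question and the current (rejected) trajectory and
# returns a longer trajectory with a revised answer.
Strategy = Callable[[str, Trajectory], Trajectory]


def search(question: str, reference: str, first_attempt: Trajectory,
           strategies: List[Strategy], max_steps: int = 4) -> Optional[Trajectory]:
    """Verifier-guided search: keep refining until the verifier accepts."""
    current = first_attempt
    if verify(current.answer, reference):
        return current
    for step in range(max_steps):
        strategy = strategies[step % len(strategies)]  # cycle through strategies
        current = strategy(question, current)          # build on the last attempt
        if verify(current.answer, reference):
            return current  # accepted trajectory can serve as fine-tuning data
    return None  # search failed within the budget


def reward(answer: Optional[str], reference: str,
           correct: float = 1.0, incorrect: float = 0.0) -> float:
    """Scalar reward for RL (e.g. PPO), derived from the verifier's verdict."""
    return correct if verify(answer, reference) else incorrect


# Toy, hypothetical strategies for a multiple-choice question.
def explore_new_path(question: str, traj: Trajectory) -> Trajectory:
    return Trajectory(traj.steps + ["Try a different line of reasoning."], "C")


def correction(question: str, traj: Trajectory) -> Trajectory:
    return Trajectory(traj.steps + ["Re-check and correct the previous step."], "B")


if __name__ == "__main__":
    first = Trajectory(["Initial reasoning."], "A")
    found = search("Which option is correct?", "B", first, [explore_new_path, correction])
    print(found.answer, len(found.steps))               # B 3
    print(reward(found.answer, "B"), reward("C", "B"))  # 1.0 0.0
```

Keeping the verifier behind a single `verify` function lets the same check drive both the search (accept or keep refining) and the RL reward, which mirrors how the description above reuses one verifier for both stages.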