EurusPRM-Stage2 is a process reward model (PRM) that scores the intermediate reasoning steps of generative models using implicit process rewards. It computes these rewards from the log-likelihood ratios of causal language models, so step-level feedback is obtained without additional annotation cost. Its primary advantage is that it learns process rewards implicitly from response-level labels alone, which can improve the accuracy and reliability of the generative models it guides. The model performs well on tasks such as mathematical problem solving, making it suitable for scenarios that require complex reasoning and decision-making.
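
To make the log-likelihood-ratio idea concrete, here is a minimal sketch of how a per-step reward could be derived from token log-probabilities. It assumes the ratio is taken between the trained model and a frozen reference model, scaled by a coefficient `beta`, and summed over the tokens of each reasoning step. The function name `implicit_process_rewards`, the `step_ends` argument, and the toy numbers are hypothetical and are not taken from the EurusPRM code base.

```python
from typing import List, Sequence


def implicit_process_rewards(
    model_logprobs: Sequence[float],
    ref_logprobs: Sequence[float],
    step_ends: Sequence[int],
    beta: float = 1.0,
) -> List[float]:
    """Return one reward per reasoning step (illustrative sketch only).

    model_logprobs / ref_logprobs: log-probability that each model assigns
        to every token of the same response (assumed inputs).
    step_ends: exclusive end index of each step, in increasing order.
    beta: scaling coefficient (assumed hyperparameter).
    """
    if len(model_logprobs) != len(ref_logprobs):
        raise ValueError("both models must score the same tokens")

    rewards: List[float] = []
    start = 0
    for end in step_ends:
        if not start < end <= len(model_logprobs):
            raise ValueError("step_ends must be increasing and in range")
        # Log-likelihood ratio of the step: sum of per-token log-prob gaps.
        log_ratio = sum(
            m - r
            for m, r in zip(model_logprobs[start:end], ref_logprobs[start:end])
        )
        rewards.append(beta * log_ratio)
        start = end
    return rewards


# Toy response with four tokens split into two steps of two tokens each.
example_rewards = implicit_process_rewards(
    model_logprobs=[-0.5, -1.0, -0.25, -2.0],
    ref_logprobs=[-1.0, -1.0, -0.75, -1.0],
    step_ends=[2, 4],
    beta=0.5,
)
print(example_rewards)  # [0.25, -0.25]
```

In this toy case the first step is rated above the reference and the second below it. In practice the log-probabilities would come from forward passes of the two language models over the same response, and no step-level labels are needed to compute the scores.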