2025-01-16 15:46:26.AIbase

Alibaba Cloud Launches New Mathematical Reasoning Model Qwen2.5-Math-PRM, 7B Version Surpasses GPT-4o

Today, the Alibaba Cloud Tongyi team officially released Qwen2.5-Math-PRM, a new process reward model for mathematical reasoning. The model comes in two sizes, 72B and 7B, and significantly outperforms comparable open-source process reward models, especially in identifying reasoning errors. The 7B version is reported to surpass the widely used GPT-4o at this task, marking an important step in Alibaba Cloud's research and development of reasoning models.

2025-01-16 10:42:26.AIbase

Alibaba Qwen Team Releases New Process Reward Model, Advancing Mathematical Reasoning

The Alibaba Qwen team recently published a paper titled 'The Lessons of Developing Process Reward Models in Mathematical Reasoning' and introduced two new models in the Qwen2.5-Math-PRM series, with 7B and 72B parameters respectively. According to the team, these models address limitations of existing PRM approaches in mathematical reasoning and significantly improve the accuracy and generalization of reasoning models. Mathematical reasoning has long been a major challenge for large language models (LLMs), especially when it comes to errors in intermediate reasoning steps.

2024-12-15 10:23:35.AIbase

Alibaba Launches New AI Benchmark 'ProcessBench' to Assess Error Recognition Capability in Mathematical Reasoning

2024-11-29 09:47:51.AIbase

Devastating Loss! Epoch AI Launches New Mathematics Benchmark FrontierMath, Top AI Models Solve Less Than 2%

Mathematics has long been considered a final frontier for machine intelligence. A new benchmark named FrontierMath now pushes AI's mathematical reasoning capabilities to their limits. Epoch AI collaborated with more than 60 leading mathematicians to create this challenge, which can be described as the 'Olympics of Mathematics' for AI. It is not just a technical test but a searching examination of AI's mathematical ability. Imagine a laboratory filled with the world's top mathematicians, meticulously designing...

2024-11-18 07:58:19.AIbase

Kimi Launches Mathematical Reasoning Model k0-math: Math Capabilities Benchmarked Against OpenAI's o1 Series

Moonshot AI, the company behind the Kimi smart assistant, has announced the launch of its next-generation mathematical reasoning model, k0-math. The k0-math model has performed strongly in multiple mathematical benchmark tests, surpassing OpenAI's o1 series models, o1-mini and o1-preview, in four assessments: high school entrance exams, college entrance exams, graduate school entrance exams, and the MATH benchmark, which includes introductory competition problems.

2024-10-14 14:51:30.AIbase

Apple Research Team Releases New Benchmark GSM-Symbolic: Revealing the Mathematical Reasoning Limitations of Large Language Models!

2024-10-12 14:59:01.AIbase

Apple AI Research Team Discovers Limitations of Large Model Reasoning, Rendering OpenAI's o1 Ineffective with Just One Sentence

The reasoning capabilities of machine learning models, particularly large language models (LLMs), have long been a focal point for researchers. Recently, Apple's AI research team published a paper titled 'GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models,' shedding light on these models' limitations when addressing logical problems. In the paper, the researchers demonstrate this with a simple mathematical question. They first posed a question about Oliver picking kiwis, as follows: Oliver picked 44 kiwis on Friday.