2024-12-15 10:23:35.AIbase.
Alibaba Launches New AI Benchmark 'PROCESSBENCH' to Assess Error Recognition in Mathematical Reasoning
2024-11-29 09:47:51.AIbase.
Crushing Defeat! Epoch AI Launches New Mathematics Benchmark FrontierMath; Top AI Models Solve Less Than 2% of Problems
2024-11-18 07:58:19.AIbase.
Kimi Launches Mathematical Reasoning Model k0-math, Benchmarking Its Math Capabilities Against OpenAI's o1 Series
2024-10-14 14:51:30.AIbase.
Apple Research Team Releases New Benchmark GSM-Symbolic: Revealing the Mathematical Reasoning Limitations of Large Language Models!
2024-10-12 14:59:01.AIbase.