2025-03-07 14:35:00.AIbase.
Mistral AI Unveils Mistral OCR: A Revolutionary Benchmark in Document Understanding
2025-02-27 17:07:26.AIbase.
Kimi k1.6 Model Unveiled: Programming Prowess Surpasses GPT-3, Ushering in a New AI Wave
2025-02-27 10:08:10.AIbase.
Alibaba's Open-Source Video Generation Model Wan 2.1 Tops Benchmarks, Runs Smoothly on 4070
2025-02-24 11:26:35.AIbase.
OpenAI Employee Publicly Questions xAI: Grok 3 Benchmark Results Are Misleading
2025-02-20 10:37:18.AIbase.
OpenAI's Latest Benchmark Test: AI Programming Ability Matches One-Quarter of Humans, Revealing Limitations
2025-02-18 16:55:26.AIbase.
OpenAI Launches SWE-Lancer Benchmark: Evaluating Model Performance on Real-World Freelance Software Engineering Tasks
2025-02-14 09:15:22.AIbase.
Is It Benchmarking Tesla's 'Optimus'? US AI Humanoid Robot Company Apptronik Raises $350 Million
2025-01-20 10:04:01.AIbase.
AI Benchmark Organization Criticized for Not Disclosing OpenAI Funding in a Timely Manner
2025-01-06 09:18:36.AIbase.
ScreenSpot-Pro: A Multimodal LLM Benchmark Tool Designed for High-Resolution Environments!
2024-12-25 09:22:05.AIbase.
Indeed the Strongest! OpenAI's New Model o3 Sets a Record Score in ARC-AGI Benchmark Test
2024-12-20 16:10:44.AIbase.
ZhiYuan and Tencent Launch Long-Text Understanding Benchmark LongBench v2
2024-12-15 10:23:35.AIbase.
Alibaba Launches New AI Benchmark 'PROCESSBENCH' to Assess Error Recognition Capability in Mathematical Reasoning
2024-12-10 11:31:07.AIbase.
ARC-AGI Benchmark Is About to Be Cracked, but Founder Warns of Flaws in Test Design
2024-12-05 14:45:53.AIbase.
ByteDance's New Code Model Evaluation Benchmark 'FullStack Bench'
2024-11-29 09:47:51.AIbase.
Devastating Loss! Epoch AI Launches New Mathematics Benchmark FrontierMath, Top AI Models Solve Less Than 2%
2024-11-25 15:09:04.AIbase.
Meta Launches New Multi-IF Benchmark to Challenge Multilingual Instruction Following Capabilities
2024-11-18 07:58:19.AIbase.
Kimi Launches Mathematical Reasoning Model k0-math: Math Capabilities Benchmarking Against OpenAI's o1 Series
2024-11-13 14:06:24.AIbase.
Benchmarking Google NotebookLM! The Voice Generation Model PlayDialog: Capable of Generating Dialogue Podcasts and Narration
2024-11-01 10:48:10.AIbase.
Another New Favorite AI Image Generation Model! Recraft v3 Dominates Benchmark Tests, Beating Flux and Ideogram to Rank First
2024-10-31 14:28:43.AIbase.