2025-01-20 14:04:10.AIbase.
MIT and DeepMind Research Reveals Why Vision-Language Models Struggle with Negation
2025-01-16 10:42:26.AIbase.
Alibaba Qwen Team Releases New Process Reward Model, Advancing Mathematical Reasoning
2025-01-10 15:49:29.AIbase.
GLM-4-9B Model Achieves a Hallucination Rate of Only 1.3%, Taking First Place in a Global Large Model Evaluation
2025-01-09 14:05:31.AIbase.
Juyuan Engine Launches Official Version of AIGC Tool 'Jichuang', Supporting Smart Video Creation, Viral Content Generation, and More
2025-01-02 14:30:40.AIbase.
Microsoft Paper Reveals OpenAI Model Parameters? Medical AI Evaluation Unexpectedly Suggests 4o-mini Has Only 8B
2024-12-31 10:35:54.AIbase.
Nonprofit Organization Encode Joins Elon Musk in Effort to Prevent OpenAI from Becoming For-Profit
2024-12-25 09:07:40.AIbase.
Google is Using Claude to Evaluate Gemini AI, Raising Compliance Concerns
2024-12-19 17:47:00.AIbase.
CompassArena Upgrade: Launch of New Judge Copilot Feature
2024-12-19 14:07:19.AIbase.
AI is Not Omnipotent: Latest Research Reveals Top AI Models Exhibit Cognitive Impairments Similar to Early Dementia
2024-12-19 09:21:18.AIbase.
Google Gemini is Forcing Contractors to Evaluate AI Responses Outside of Their Expertise
2024-12-12 08:39:55.AIbase.
Tongyi Qianwen Joins ModelScope Community to Open-Source the P-MMEval Test Set for Evaluating Models' Multilingual Capabilities
2024-12-11 11:12:39.AIbase.
AI Conversational Products May Be Hitting a Ceiling as ByteDance Raises the Priority of Dream and Video Editing
2024-12-09 17:08:28.AIbase.
The AI Evaluation Landscape: How Chatbot Arena is Changing the 'Survival Rules' for Tech Companies
2024-12-05 14:45:53.AIbase.
ByteDance's New Code Model Evaluation Benchmark: 'FullStack Bench'
2024-12-04 13:54:02.AIbase.
"Xuexi Qiangguo" Partners with Baidu AI to Launch an Intelligent Document Assistant, Now Available in the Wenxiaoyan App
2024-11-29 09:47:51.AIbase.
Total Wipeout! Epoch AI Launches New Mathematics Benchmark FrontierMath, and Top AI Models Solve Fewer Than 2% of Its Problems
2024-11-06 14:17:46.AIbase.
CMU and Meta Join Forces to Unveil VQAScore! A Single Question Is Enough to Evaluate Text-to-Image Models, with Accuracy Far Surpassing Traditional Methods!
2024-10-31 14:28:43.AIbase.
OpenAI Launches New AI Benchmark SimpleQA: Evaluating the Factual Accuracy of Language Models
2024-10-23 13:37:38.AIbase.
Tim Cook Responds on When Apple's AI Will Launch in China: "Working Hard to Complete the Relevant Processes"
2024-10-23 09:40:35.AIbase.