2024-12-19 17:47:00.AIbase.
CompassArena Upgrade: Launch of New Judge Copilot Feature
2024-12-19 14:07:19.AIbase.
AI is Not Omnipotent: Latest Research Reveals Top AI Models Exhibit Cognitive Impairments Similar to Early Dementia
2024-12-09 17:08:28.AIbase.
The AI Evaluation Landscape: How Chatbot Arena is Changing the 'Survival Rules' for Tech Companies
2024-12-05 14:45:53.AIbase.
ByteDance's New Code Model Evaluation Benchmark 'FullStack Bench'
2024-11-06 14:17:46.AIbase.
CMU and Meta Join Forces to Unveil VQAScore! A Single Question Addresses Evaluation of Text-to-Image Models, Achieving Accuracy that Far Surpasses Traditional Methods!
2024-10-15 16:28:44.AIbase.
PDFtoChat Technical Evaluation Report: An AI-Based Intelligent Q&A System for PDFs
2024-10-09 15:51:44.AIbase.
AI Video Generation Model Evaluation Report: MiniMax Text Control is the Strongest, Kling 1.5 Can Master 'Water Pouring'
2024-09-29 15:33:05.AIbase.
Salesforce AI Launches SFR-Judge, a New Family of Large Language Model Evaluation Models Based on Llama 3
2024-09-26 08:25:17.AIbase.
Baidu Wenxin Kuai Ma Tops the Rankings of Two Major Evaluation Reports: Sullivan and SuperCLUE
2024-09-10 11:03:27.AIbase.
AI Evaluation Made Easy! Hugging Face Launches LightEval to Help You Master Model Performance!
2024-09-05 08:43:35.AIbase.
Zhiyuan Research Institute (BAAI) Launches FlagEval Large Model Arena Featuring Text-to-Video Model Battle Evaluation Service
2024-09-03 13:42:26.AIbase.
DingTalk Launches Multiple 'Super Assistants', Including Super Work Order Assistant and Super Evaluation Assistant
2024-08-23 09:05:19.AIbase.
Baidu Smart Cloud Keway Passes the 'Intelligent Customer Service Based on Large Models' Evaluation by the China Academy of Information and Communications Technology
2024-08-13 08:11:01.AIbase.
CompassArena, a Large Model Evaluation Platform, Adds a Multi-Modal Large Model Competition Section
2024-08-07 14:14:43.AIbase.
Meta Launches 'Self-Taught Evaluator': NLP Model Evaluation Without Human Annotation, Outperforming Common LLMs Like GPT-4
2024-07-23 08:09:28.AIbase.
Baidu Smart Cloud Launches Financial AI App 'Zhijin' with Asset Evaluation Features
2024-07-12 11:10:22.AIbase.
OpenAI Unveils Initial AGI Evaluation Criteria: ChatGPT Rated at Stage One
2024-07-02 09:07:20.AIbase.
Anthropic Launches Initiative to Fund AI Evaluation Benchmark Development
2024-06-27 09:28:40.AIbase.
Hugging Face Updates Leaderboard Evaluation Rules, AI Assessment Enters New Phase
2024-06-24 10:24:59.AIbase.