AIbase

2025-04-03 14:00:32

Gemini-2.5-pro Demonstrates Superior Mathematical Abilities in MathArena Evaluation, Surpassing Other Models

2025-04-02 14:47:08

Arthur Launches First Open-Source Real-time AI Evaluation Engine: Arthur Engine

2025-03-21 11:48:03

High School Student Creates AI Model Evaluation Website Using Minecraft

2025-03-21 09:45:00

Minecraft Transformed into an AI Arena: High School Student Builds Innovative Model Evaluation Platform

2025-03-12 15:28:43

Ant Group's Medical Large Language Model Wins Double Championship in MedBench Evaluation, Ushering in a New Era for Medical AI

2025-01-16 10:42:26

Alibaba Qwen Team Releases New Process Reward Model, Advancing Mathematical Reasoning

2025-01-10 15:49:29

GLM-4-9B Achieves a Hallucination Rate of Only 1.3%, Taking First Place in a Global Large Model Evaluation

2025-01-02 14:30:40

Microsoft Paper Reveals OpenAI Model Parameters? Medical AI Evaluation Unexpectedly Puts GPT-4o-mini at Only 8B Parameters

2024-12-19 17:47:00

CompassArena Upgrade: Launch of New Judge Copilot Feature

2024-12-19 14:07:19

AI is Not Omnipotent: Latest Research Reveals Top AI Models Exhibit Cognitive Impairments Similar to Early Dementia

2024-12-09 17:08:28

The AI Evaluation Landscape: How Chatbot Arena is Changing the 'Survival Rules' for Tech Companies

2024-12-05 14:45:53

ByteDance's New Code Model Evaluation Benchmark: 'FullStack Bench'

2024-11-06 14:17:46

CMU and Meta Unveil VQAScore: Evaluating Text-to-Image Models with a Single Question, Far More Accurately Than Traditional Methods

2024-10-15 16:28:44

PDFtoChat Technical Evaluation Report: An AI-Based Intelligent Q&A System for PDFs

2024-10-09 15:51:44

AI Video Generation Model Evaluation Report: MiniMax Has the Strongest Text Control, and Kling 1.5 Masters “Water Pouring”

2024-09-29 15:33:05

Salesforce AI Launches SFR-Judge, a New Family of Llama 3-Based Large Language Model Judges for Evaluation

2024-09-26 08:25:17

Baidu's Wenxin Kuai Ma Tops the Rankings in Two Major Evaluation Reports, from Frost & Sullivan and SuperCLUE

2024-09-10 11:03:27

AI Evaluation Made Easy! Hugging Face Launches LightEval to Help You Master Model Performance!

2024-09-05 08:43:35

Zhiyuan Research Institute Launches FlagEval Large Model Arena, Featuring Head-to-Head Evaluation of Text-to-Video Models

2024-09-03 13:42:26

DingTalk Launches Multiple 'Super Assistants', Including Super Work Order Assistant and Super Evaluation Assistant