Anthropic announced on Monday a new initiative to fund the development of benchmarks capable of assessing the performance and impact of artificial intelligence models, including its own generative model, Claude.
According to a post on Anthropic's official blog, the company will provide financial support to third-party organizations that develop tools to "effectively measure the advanced capabilities of AI models." Interested organizations can submit applications, which will be evaluated on a rolling basis.
Anthropic stated that this investment is intended to elevate the entire field of AI safety by providing valuable tools to the whole ecosystem. The company believes that developing high-quality, safety-relevant evaluations remains difficult, and that demand is outpacing supply.
The initiative focuses on AI safety and societal impact, with plans to create challenging benchmarks through new tools, infrastructure, and methods. Anthropic specifically calls for tests that assess models' abilities in areas such as carrying out cyberattacks, enhancing weapons, and manipulating or deceiving people. In addition, the company is committed to developing an "early warning system" for identifying and assessing AI risks related to national security and defense.
Anthropic also indicated that the new program will support research into AI's potential to aid scientific study, facilitate multilingual communication, mitigate bias, and self-moderate. To achieve these goals, the company envisions a new platform on which experts can develop evaluations and run large-scale trials.
While Anthropic's move has drawn praise, it has also met with some skepticism. Some observers argue that, given the company's commercial interests, the impartiality of the projects it funds could be compromised. Moreover, some experts question certain "catastrophic" and "deceptive" AI risks that Anthropic cites, worrying that such framing may divert attention from more pressing, present-day issues of AI regulation.
Anthropic hopes the program will help make comprehensive AI evaluation an industry standard. Whether independent AI benchmark-development groups will be willing to collaborate with a commercial AI vendor, however, remains to be seen.