Recently, Google's Gemini artificial intelligence project has been enhancing its performance by comparing its output with that of Anthropic's Claude model. Internal communications obtained by TechCrunch indicate that contractors responsible for improving Gemini are systematically evaluating the responses of both AI models.

Code Internet Computer

Image Source Note: Image generated by AI, image licensed by Midjourney

In the AI industry, model performance evaluation is typically conducted through industry benchmark tests rather than having contractors compare the answers of different models one by one. The contractors responsible for Gemini need to score the model outputs based on multiple criteria, including accuracy and detail. They have up to 30 minutes each time to determine which response is better, Gemini or Claude.

Recently, these contractors noticed that Claude's references frequently appeared on the internal platform they were using. Some content presented to the contractors explicitly stated: "I am Claude, created by Anthropic." In an internal chat, contractors also found that Claude's responses were more prominent in emphasizing safety. Some contractors pointed out that Claude's safety settings are the strictest among all AI models. In certain cases, Claude chooses not to respond to prompts it deems unsafe, such as role-playing other AI assistants. In another instance, Claude avoided a prompt, while Gemini's response was flagged as a "major safety violation" for containing "nudity and bondage" content.

It is important to note that Anthropic's commercial service terms prohibit customers from using Claude to "build competitive products or services" or "train competing AI models" without authorization. Google is one of Anthropic's major investors.

A spokesperson for Google DeepMind, Shira McNamara, did not disclose whether Google obtained Anthropic's approval to use Claude during an interview with TechCrunch. McNamara stated that DeepMind does compare model outputs for evaluation but has not trained Gemini using the Claude model. She mentioned, "Of course, as per industry standard practices, we do compare model outputs in certain cases. However, any claims about us training Gemini using Anthropic models are inaccurate."

Last week, TechCrunch also exclusively reported that Google's contractors were asked to score Gemini's AI responses in areas outside their expertise. Some contractors expressed concerns in internal communications, believing that Gemini might generate inaccurate information on sensitive topics such as healthcare.

Key Points:

🌟 Gemini is conducting comparative tests with Claude to enhance its AI model performance.

🔍 Contractors are responsible for scoring, with comparisons of responses involving multiple criteria, including accuracy and safety.

🚫 Anthropic prohibits the use of Claude for training competitive models without authorization.