Recently, the technology news site TechCrunch revealed that Google is using Anthropic's Claude AI model to evaluate the performance of its own Gemini AI model, sparking widespread industry discussion about compliance.
TechCrunch reportedly reviewed internal correspondence showing that contractors working on the Gemini project are comparing Gemini's responses with Claude's to assess the accuracy and quality of Gemini's outputs.
During this evaluation process, contractors must determine within 30 minutes which AI model provided the better answer. The report also noted that on the internal platform Google uses to compare AI models, some of Claude's responses explicitly identified themselves as coming from Claude. Claude also appears to place greater emphasis on safety than Gemini, at times refusing to answer prompts it deems unsafe or giving more cautious responses. In one case, a Gemini response was flagged as a "major safety violation" because it involved "nudity and bondage" content.
Anthropic's terms of service explicitly prohibit using Claude to build competing products or train competing AI models without approval. A Google DeepMind spokesperson confirmed that the company compares the outputs of different models for evaluation purposes but denied using Anthropic's model to train Gemini. Notably, Google is also one of Anthropic's major investors.
Shira McNamara, a spokesperson for Google DeepMind, stated, "As per industry standard practices, we sometimes compare model outputs as part of the evaluation process. However, any claims that we are using the Anthropic model to train Gemini are inaccurate."
Key Points:
📜 Google uses Anthropic's Claude AI to evaluate Gemini, potentially violating Anthropic's terms of service.
🔐 Claude appears to have stricter safety protocols than Gemini.
💼 Google DeepMind denies using Anthropic's model to train Gemini while confirming that it compares model outputs for evaluation.