Within six hours of its release, OpenAI's GPT-4.5 model climbed to the top of the AI leaderboard, taking first place in the overall (all-task) category. Its lead was short-lived: Grok-3, from Elon Musk's xAI, quickly overtook it and claimed the top position.
Voting data showed that GPT-4.5 and Grok-3 each received more than 3,000 votes, with a final score of 1412 to 1411, a difference of just one point. While GPT-4.5 excelled in most areas, Grok-3 held a slight advantage in specific categories such as "style-controlled prompts" and "difficult prompts," which gave it the win.
The reversal in just six hours prompted skepticism among users, who questioned whether so swift a change was legitimate. Industry insiders explained that the leaderboard has a voting threshold: only models that reach 3,000 votes within a specific timeframe qualify. That both newly released models reached the threshold at the same time was therefore a coincidence.
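The threshold rule described by those insiders can be pictured with a minimal sketch. It assumes a simple model of the ranking, not the leaderboard's actual implementation: the `Entry` type, the `ranked` function, the placeholder model names, and the vote counts are all hypothetical, and the "specific timeframe" condition is left out because the article does not specify it.

```python
from dataclasses import dataclass

# Assumed threshold, taken from the 3,000-vote figure quoted in the article.
MIN_VOTES = 3000


@dataclass
class Entry:
    model: str   # placeholder name, not a real leaderboard entry
    score: int   # illustrative score
    votes: int   # made-up vote count


def ranked(entries, min_votes=MIN_VOTES):
    """Return only entries that meet the vote threshold, highest score first."""
    eligible = [e for e in entries if e.votes >= min_votes]
    return sorted(eligible, key=lambda e: e.score, reverse=True)


entries = [
    Entry("model-a", 1411, 3100),
    Entry("model-b", 1412, 3050),
    Entry("model-c", 1450, 1200),  # highest score, but too few votes to qualify
]

for rank, entry in enumerate(ranked(entries), start=1):
    print(rank, entry.model, entry.score)
```

Under this assumed rule, a model stays off the ranking until it crosses the vote threshold, so two models released close together can both appear, and swap places, within a short window.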
Despite initial negative feedback, GPT-4.5 went on to win significantly more user approval for its high emotional intelligence. OpenAI CEO Sam Altman even shared a conversation with GPT-4.5, noting that it was the first time a user had asked him to promise not to take the model offline.
GPT-4.5 also performed exceptionally well in an unusual competition resembling a game of "large model werewolf." In this game, AI models debated, strategized, and voted, with the winner decided by a jury of eliminated members. GPT-4.5 showed superior cooperation, deception, and strategic planning, surpassing human capabilities.
All of this underscores the intensifying competition in AI, with models constantly innovating and improving within their respective domains. Who will ultimately win this contest of intelligence remains to be seen.