In a research paper focused on evaluating medical AI, Microsoft seems to have once again "unintentionally" disclosed the parameter sizes of several top large language models in the industry. This paper, published on December 26, not only revealed the model parameter data from multiple companies including OpenAI and Anthropic, but also sparked discussions within the industry regarding model architecture and technical capabilities.
According to the paper, OpenAI's o1-preview model has approximately 300 billion parameters, GPT-4o has around 200 billion, while GPT-4o-mini has only 8 billion parameters. This sharply contrasts with Nvidia's earlier claim this year that GPT-4 employs a 1.76 trillion MoE architecture. Additionally, the paper disclosed that Claude3.5Sonnet has a parameter size of about 175 billion.
This is not the first time Microsoft has "leaked" model parameter information in a paper. Last October, Microsoft revealed the 20 billion parameter size of GPT-3.5-Turbo in a paper, only to remove this information in a later update. This recurring "leak" has led industry insiders to speculate whether there is a specific intention behind it.
It is worth noting that the primary aim of this paper is to introduce a medical domain benchmark test called MEDEC. The research team analyzed 488 clinical notes from three hospitals in the United States to evaluate the ability of various models to identify and correct errors in medical documents. The test results showed that Claude3.5Sonnet led other models in error detection with a score of 70.16.
The authenticity of these data has sparked heated discussions in the industry. Some believe that if Claude3.5Sonnet indeed achieves excellent performance with fewer parameters, it would highlight Anthropic's technical strength. Other analysts have inferred that certain parameter estimates are reasonable based on model pricing.
Particularly noteworthy is the fact that the paper estimates parameters for mainstream models but notably omits specific parameters for Google's Gemini. Some analysts suggest that this may be related to Gemini using TPUs instead of Nvidia GPUs, making it difficult to accurately estimate through token generation speed.
As OpenAI gradually downplays its commitment to open-source, core information such as model parameters may continue to be a focal point of industry attention. This unexpected leak has once again prompted deep reflections on AI model architecture, technical pathways, and commercial competition.
References:
https://arxiv.org/pdf/2412.19260
https://x.com/Yuchenj_UW/status/1874507299303379428
https://www.reddit.com/r/LocalLLaMA/comments/1f1vpyt/why_gpt_4o_mini_is_probably_around_8b_active/