Elon Musk's artificial intelligence company xAI released its latest language model, Grok 3, on Monday, marking a significant step forward for the company. According to Musk, the new model was trained with ten times the computational power of its predecessor, using a data center in Memphis equipped with approximately 200,000 GPUs.


The Grok 3 line launches with several variants, including a streamlined version that trades some accuracy for speed, and a new "reasoning" model designed specifically to tackle mathematical and scientific problems. Users can toggle these capabilities through the "Thinking" and "Brain" settings in the Grok interface. xAI said this release is not yet final: the model is still being trained, and the team plans further improvements in the coming weeks.

According to data from the AI benchmarking platform lmarena.ai, Grok 3 scored over 1400 in the chatbot arena, leading every category including programming and surpassing models from OpenAI, Anthropic, and Google. Benchmark results do not always reflect real-world performance, however: Claude 3.5 Sonnet, for example, scores lower than some rivals on coding benchmarks, yet many users still consider it the better choice for programming tasks.

Andrej Karpathy, a founding member of OpenAI, gained early access to Grok 3 and praised the model's logical reasoning. Its "Thinking" feature successfully handled complex tasks, such as estimating the training FLOPs of GPT-2 or creating hexagonal grids for board games, abilities previously limited to OpenAI's high-end o1-pro model. The feature also improved accuracy on basic operations that often trip up language models, such as counting letters in a word and comparing decimals.
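The GPT-2 FLOPs question Karpathy posed is commonly approximated with the 6·N·D rule of thumb (roughly 6 FLOPs per parameter per training token). The sketch below illustrates that calculation; the parameter and token counts are illustrative assumptions, not figures from the article, and GPT-2's effective training token count in particular was never published.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Estimate total training compute with the common 6*N*D rule of thumb:
    ~6 FLOPs per parameter per token (forward + backward pass combined)."""
    return 6 * n_params * n_tokens

# Assumed figures for illustration: GPT-2's largest variant has ~1.5B
# parameters; 100B training tokens is a placeholder assumption.
flops = training_flops(1.5e9, 100e9)
print(f"{flops:.1e}")  # ~9.0e+20 FLOPs under these assumptions
```

Under these assumed inputs the estimate lands on the order of 10^21 FLOPs, which is the kind of back-of-the-envelope reasoning the "Thinking" mode was asked to reproduce.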

Regarding the new search functionality, Karpathy found DeepSearch comparable in quality to Perplexity's research tools, returning relevant answers on topics such as upcoming Apple products and Palantir's stock movements. He also identified some clear shortcomings, however: the model sometimes fabricates URLs, makes unsupported claims, and cites posts from X only when specifically prompted to.

DeepSearch also seems unaware of its own existence, failing to recognize xAI's place among the major AI labs. These limitations keep it short of the quality of OpenAI's "deep research," and it performs poorly on questions involving humor and ethics.