Grok Major Update: Enhanced Visual Capabilities, Multi-lingual Audio Processing, and Real-time Search!

AIbase基地

Published in AI News · 6 min read · Apr 23, 2025

xAI's generative AI chatbot, Grok, has received a landmark update, significantly enhancing its capabilities. New features include visual processing, multi-lingual audio processing, and real-time search in voice mode. This update marks a major breakthrough for Grok in multi-modal AI, offering users a smarter and more convenient interactive experience. Below, AIbase provides a detailed analysis of the update's highlights and significance.

Visual Capabilities Breakthrough

Grok's visual processing capabilities are a core highlight of this update. xAI announced Grok-1.5 Vision (Grok-1.5V) in April 2024 with the ability to process documents, charts, screenshots, and photos, but that version was never publicly released. Now, Grok's visual functionality is officially online. Users can upload images for Grok to analyze complex visual content, such as interpreting data charts, identifying objects, or converting visual information into executable code. This broadens Grok's practical uses and improves its performance on spatial understanding and visual reasoning tasks, where it reportedly leads on the RealWorldQA benchmark.

Notably, combining Grok's visual capabilities with its real-time data access improves its news analysis and social media content interpretation. For example, a user can upload a news image, and Grok can draw on real-time information from X to provide background analysis and interpret the event.

Multi-lingual Audio Processing: A New Voice Interaction Experience in 145+ Languages

Grok's multi-lingual audio processing is equally impressive. By integrating the "VoiceWave" extension, Grok now supports real-time voice interaction in over 145 languages, including English, Spanish, French, Japanese, Chinese, Turkish, and Hindi. The feature enables natural, fluent voice conversations and supports speech-to-text, speech replay, and simultaneous text highlighting, greatly improving the user experience.

For users needing cross-language communication, Grok's multi-lingual audio processing is a boon. Whether learning a new language, handling multilingual customer service, or creating international content, Grok provides personalized voice responses with native pronunciation and adjustable speed and tone. This functionality is available via a Chrome Web Store extension, allowing users to activate and customize interaction settings with simple voice commands.

Real-time Search in Voice Mode: DeepSearch Enables Instant Information Retrieval

Grok's new real-time search feature in voice mode further solidifies its position as a "truth seeker." Leveraging DeepSearch technology, Grok can retrieve the latest information from the web and X in response to voice commands and generate accurate, detailed answers. Compared to traditional text input, voice search gives users quicker access to real-time trends, news updates, and insights into hot topics.

For example, when a user asks "What's the latest tech news?", Grok can respond quickly in voice form and cite the latest posts from X and web resources, ensuring timeliness and credibility. Furthermore, DeepSearch's transparent reasoning process allows users to see Grok's logical deduction steps and source documents, further enhancing information credibility.

Technical Support Behind the Features: Colossus Supercomputer and Reinforcement Learning

This update builds on xAI's continuous investment in technology. Grok 3's training relies on the Colossus supercomputer, equipped with 200,000 NVIDIA H100 GPUs, offering 10 times the computing power of its predecessor. This allows Grok to handle complex tasks faster and more accurately, especially in scenarios requiring multi-modal fusion.

Additionally, Grok 3 uses large-scale reinforcement learning (RL) to optimize its reasoning abilities, enabling it to correct errors, explore alternative solutions, and generate answers within seconds to minutes. This "human-like thinking" ability allows Grok to outperform competing models, including GPT-4o, Gemini 1.5, and Claude 3.5 Sonnet, in benchmark tests across mathematics, science, and coding.

Musk's xAI Plans $25 Billion Colossus 2 Supercomputer

Elon Musk's AI company, xAI, has announced an ambitious new project to expand on its existing Colossus supercomputer. xAI reportedly plans to raise up to $25 billion in upcoming funding to support the development of its next-generation supercomputer, Colossus 2. In a conference call with existing investors, Musk stated that the company will conduct a reasonable valuation.

xAI Releases Grok 3 Mini: A Cost-Effective AI Model for Developers

xAI recently unveiled its new language model, Grok 3 Mini, further advancing efficient AI technology. Designed for speed and affordability, Grok 3 Mini outperforms many more expensive AI models across various domains despite its smaller size, particularly excelling in math, coding, and scientific benchmarks. It aims to balance high performance with low cost. Grok 3 Mini is part of the Grok 3 series, which includes six variants, including the standard Grok 3.

Ireland Launches Investigation into Musk's X, Focusing on Grok AI Data Usage Compliance

Ireland's Data Protection Commission (DPC) has announced an investigation into Elon Musk's social media platform X (formerly Twitter) over concerns that it may have used publicly available posts from EU users to train its Grok AI chatbot without transparency or legal basis. The investigation centers on XIUC (X Internet Unlimited Company) becoming the official controller of EU user data from April 1, 2023, previously under the name Twi…