AI Daily: OpenAI Launches gpt-image-1 Image Generation API; Nano AI Releases MCP Universal Toolbox; China Accounts for 60% of Global AI Patents

Welcome to the AI Daily column! Your daily guide to exploring the world of artificial intelligence. We bring you the hottest AI news every day, focusing on developers and helping you understand technology trends and innovative AI product applications.

Discover the latest AI products. Learn more: https://top.aibase.com/

1. OpenAI Launches New ChatGPT Image Generation API: Developers Can Easily Integrate AI Drawing Capabilities

OpenAI recently launched the gpt-image-1 image generation API, allowing developers to integrate the image generation technology behind ChatGPT into their own applications. The ChatGPT image generation feature attracted a large number of users after its launch, and they generated over 700 million images in the first week. gpt-image-1 supports a variety of image styles and has built-in safety guardrails to keep generated content within company policies. Its pricing also lets developers generate high-quality images at low cost, marking a significant advancement in the AI image generation field.

【AiBase Summary:】
🌟 OpenAI launches the gpt-image-1 image generation API, easily integrable into applications.
🖼️ Users generated over 700 million images in the first week, attracting millions of new users.
💰 gpt-image-1 image generation costs are reasonable, as low as 2 cents per image.
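
For developers wondering what integration might look like, here is a minimal sketch rather than an official sample. It builds, but does not send, an HTTP request to OpenAI's image generation endpoint, and estimates spend from the "2 cents per image" figure above. The helper names, the placeholder key, and the default size are assumptions made for illustration; actual per-image prices depend on quality and size.

```python
import json
import urllib.request

# OpenAI's REST endpoint for image generation.
API_URL = "https://api.openai.com/v1/images/generations"


def build_image_request(prompt: str, api_key: str,
                        size: str = "1024x1024") -> urllib.request.Request:
    """Build (but do not send) a POST request asking gpt-image-1 for one image."""
    payload = {"model": "gpt-image-1", "prompt": prompt, "size": size, "n": 1}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def estimated_cost_usd(images: int, usd_per_image: float = 0.02) -> float:
    """Rough spend estimate; the default rate is the article's lowest quoted price."""
    return images * usd_per_image


if __name__ == "__main__":
    # "sk-PLACEHOLDER" is a dummy key; nothing is sent over the network here.
    req = build_image_request("a watercolor fox", api_key="sk-PLACEHOLDER")
    print(req.get_method(), req.full_url)
    print(json.loads(req.data))
    print(f"Estimated cost for 1,000 images: ${estimated_cost_usd(1000):.2f}")
```

Sending the request with urllib.request.urlopen(req) requires a real API key. The response format and the currently accepted parameters should be checked against OpenAI's API reference.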

2. Google Gemini Surpasses 350 Million Monthly Active Users, Still Trails ChatGPT

Google's AI chatbot, Gemini, has seen remarkable user growth over the past year, reaching 350 million monthly active users and increasing its daily active users from 9 million to 35 million. However, it still lags behind market leader ChatGPT, which boasts 600 million monthly active users. Google's partnerships with Samsung and product integrations have fueled Gemini's rapid growth, demonstrating the rising demand for AI chat tools. How Google further enhances Gemini's user experience and functionality will be key to closing the gap with its competitors.

【AiBase Summary:】
🌟 Gemini now has 350 million monthly active users and 35 million daily active users.
🤖 ChatGPT maintains a lead with 600 million monthly active users.
📈 Google's partnerships with Samsung and product integrations have driven Gemini's rapid growth.
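
The figures above imply a few ratios that are easy to check. The short sketch below uses only the numbers reported in this item; the helper and variable names are illustrative.

```python
# Figures as reported in the item above (monthly and daily active users).
GEMINI_MAU = 350_000_000
GEMINI_DAU = 35_000_000
GEMINI_DAU_PREVIOUS = 9_000_000
CHATGPT_MAU = 600_000_000


def share(part: int, whole: int) -> float:
    """Return part / whole as a fraction, rejecting a non-positive whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


stickiness = share(GEMINI_DAU, GEMINI_MAU)           # DAU/MAU ratio
dau_growth = share(GEMINI_DAU, GEMINI_DAU_PREVIOUS)  # growth multiple
mau_vs_chatgpt = share(GEMINI_MAU, CHATGPT_MAU)      # relative audience size

print(f"DAU/MAU: {stickiness:.0%}")
print(f"DAU growth: {dau_growth:.1f}x")
print(f"Gemini MAU as a share of ChatGPT MAU: {mau_vs_chatgpt:.0%}")
```

On these numbers, about one in ten of Gemini's monthly users is active on a given day, daily users have grown roughly 3.9-fold, and Gemini's monthly audience is a little under 60% of ChatGPT's.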

3. OpenAI Predicts Revenue Surge to $125 Billion by 2029

OpenAI's latest revenue projections put its total revenue at $125 billion by 2029, with AI agent services and channel revenue as the primary drivers. In 2024, OpenAI generated $3.7 billion in revenue, and it now has over 500 million weekly active users, showcasing significant growth. The company anticipates achieving positive cash flow within the next four years, with gross profit margins increasing to nearly 70%. These projections have attracted investor attention and are expected to fuel OpenAI's rapid development.

【AiBase Summary:】
🌟 OpenAI's revenue is projected to reach $125 billion by 2029, with AI agent services as a major growth driver.
📈 2024 revenue reached $3.7 billion, and weekly active users now exceed 500 million, demonstrating significant growth.
💰 Positive cash flow is expected within the next four years, with gross profit margins rising to nearly 70%.
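
To put the projection in perspective, the sketch below computes the compound annual growth rate implied by going from $3.7 billion to $125 billion. The number of years between the base figure and 2029 is left as a parameter, since it depends on which year the base figure is taken to describe; the function name is illustrative.

```python
def implied_cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate needed to grow from start to end in `years` years."""
    if start <= 0 or end <= 0 or years <= 0:
        raise ValueError("start, end, and years must all be positive")
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    # Revenue figures in billions of US dollars, as reported above.
    for years in (5, 6):
        rate = implied_cagr(3.7, 125.0, years)
        print(f"{years} years: {rate:.0%} per year")
```

Over five years this works out to revenue roughly doubling every year; over six years it is closer to 80% annual growth.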

4. Ostris Releases Flex.2-preview, an 8B Parameter Diffusion Model Revolutionizing ComfyUI Workflow

The Ostris team has launched Flex.2-preview, an 8-billion-parameter text-to-image diffusion model designed to optimize ComfyUI workflows. This model excels in image generation control, supporting features such as image inpainting and depth control. Open-sourced on Hugging Face, it has quickly garnered attention from the AI art creation community. Flex.2-preview's lightweight design and efficient inference capabilities make it an ideal tool for creative design and commercial applications, showcasing the limitless possibilities of future AI art creation.

【AiBase Summary:】
🎨 Versatile Control Support: Built-in line, pose, and depth control precisely guides generation results, suitable for various creative needs.
🖼️ Image Inpainting Capabilities: Supports advanced image inpainting, allowing users to replace or repair content via masks, enhancing creative flexibility.
⚙️ ComfyUI Integration: The model is optimized for ComfyUI, providing node-based workflow support and simplifying complex task configuration.
Details: https://huggingface.co/ostris/Flex.2-preview

5. NVIDIA Introduces Multimodal LLM Describe Anything: Generates Detailed Descriptions of Specific Regions

Describe Anything 3B (DAM-3B), a model from NVIDIA's AI team, has garnered significant attention in the multimodal learning field. The model generates detailed descriptions of user-specified regions in an image or video, going beyond the whole-image descriptions of traditional image captioning. By open-sourcing the code and dataset, NVIDIA gives developers rich resources for research and application of multimodal AI, with particular promise in education, healthcare, and content creation.

【AiBase Summary:】
🖌️ DAM-3B features region-specific description capabilities, generating detailed descriptions based on user-specified regions, improving description accuracy and richness.
🔓 NVIDIA has open-sourced DAM-3B's code, model weights, and dataset, promoting transparency and community collaboration in multimodal AI research.
🌐 The model shows broad application prospects in content creation, intelligent interaction, and assistive technologies, promoting social inclusion.
Details: https://github.com/NVlabs/describe-anything

6. Nano AI Releases MCP Universal Toolbox, Simplifying AI Tool Integration and Invocation

Nano AI's MCP Universal Toolbox aims to simplify the complex configuration of the Model Context Protocol (MCP), providing a one-stop solution. The toolbox pre-configures over 100 MCP services and 18 commonly used API keys, supporting functions such as image, audio, and video generation. Its release has garnered widespread attention from the AI developer community, with positive feedback on its efficiency and ease of use and on the resulting gains in developer productivity.

【AiBase Summary:】
🔧 Pre-configures over 100 MCP services, allowing developers to call them directly without manual configuration, lowering the entry barrier.
🔑 Includes 18 commonly used API keys, saving users the trouble of obtaining keys themselves, simplifying initial configuration.
🌐 Supports multimodal generation, generating images, audio, and video through natural language instructions, improving creative efficiency.
Details: https://bot.n.cn/download?src=AIBotCode
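
For readers unfamiliar with what "calling an MCP service" involves, the sketch below builds the kind of JSON-RPC 2.0 message an MCP client sends to invoke a tool on a server, using the protocol's tools/call method. It is not Nano AI's code: the tool name generate_image and its arguments are hypothetical, and a toolbox like this one is meant to hide this layer from the user.

```python
import json
from itertools import count

# Simple incrementing source of JSON-RPC request ids.
_request_ids = count(1)


def tools_call_request(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP "tools/call" request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


if __name__ == "__main__":
    # "generate_image" and its arguments are hypothetical, for illustration only.
    print(tools_call_request("generate_image", {"prompt": "a red bicycle"}))
```

Each pre-configured service would expose its own tool names and argument schemas, which a client normally discovers from the server before making calls like this.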

7. Tencent Cloud's CodeBuddy Launches Craft Software Development Agent

On April 24th, Tencent Cloud released an upgraded version of its code assistant, CodeBuddy, introducing the Craft software development agent. This tool elevates AI programming from simple code completion to project delivery, significantly improving development efficiency. Developers only need to input their requirements in natural language, and Craft can automatically generate complete project code and support mainstream IDEs. Craft also supports the MCP protocol, enabling seamless integration of code testing, building, and deployment, and is compatible with the Tencent ecosystem, facilitating efficient team collaboration.

【AiBase Summary:】
🚀 The Craft agent can translate developers' natural language requirements into complete project code, greatly simplifying the development process.
🔗 Supports the MCP protocol, allowing AI-generated code to seamlessly integrate into testing and deployment stages, improving development continuity.
🧩 CodeBuddy is widely used within Tencent, with 85% of developers using the tool, significantly improving overall development efficiency.
Details: https://cnb.cool

8. Kunlun Wanwei Open-Sources Skywork-R1V2.0

On April 24th, Kunlun Wanwei released its multimodal reasoning model, Skywork-R1V2.0, significantly improving visual and text reasoning capabilities, particularly excelling in challenging science problems and general tasks. This model has set new open-source SOTA records in several authoritative benchmark tests, demonstrating capabilities comparable to commercial closed-source models. The open-sourcing of R1V2.0 not only reflects Kunlun Wanwei's technological strength in the multimodal field but also provides a powerful tool for global developers and researchers, promoting the development of the multimodal ecosystem.

【AiBase Summary:】
🔍 R1V2.0 excels at reasoning through Chinese-language science questions, serving as a free AI problem-solving assistant and setting multiple open-source SOTA records.
⚙️ Uses the multimodal reward model Skywork-VL Reward and a mixed preference optimization mechanism to improve the model's adaptability across multiple tasks and domains.
🌍 Kunlun Wanwei is committed to promoting open source and innovation. R1V2.0's open-sourcing provides a new base model for AGI development, and they will continue to launch leading large models and datasets in the future.
Details: https://github.com/SkyworkAI/Skywork-R1V

9. Zhipu Announces Price Reductions for Several Large Model Products, with GLM-4-Plus Reduced by 90%

On April 24th, the Zhipu BigModel open platform announced significant price adjustments for several of its large model products, entering the "billion-token era," enabling companies to access advanced AI technology at low cost. This adjustment includes multiple products such as GLM-4-FlashX, GLM-Z1 series, and GLM-4-Plus, with GLM-4-Plus experiencing a price reduction of up to 90%. This move aims to lower the barrier to entry, meet the needs of various industries such as finance, internet, and education, and promote widespread application of large model technology.

【AiBase Summary:】
🚀 The GLM-4-FlashX model costs only 10 yuan per 100 million tokens, with inference speed comparable to GPT-4, demonstrating excellent performance.
💡 GLM-Z1-AirX's inference speed is 8 times that of DeepSeek-R1, offering high cost-effectiveness. GLM-Z1-Air is priced at only 1/30th of DeepSeek-R1.
📉 GLM-4-Plus is priced at 5 yuan per million tokens, leading the industry and meeting the needs of various industry scenarios.
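
The prices above are quoted in different units, so a small calculation helps compare them. The sketch below converts both to yuan per million tokens and estimates the cost of a workload. It also derives the pre-cut GLM-4-Plus price under the assumption that the full 90% reduction applies to the 5 yuan rate, which the article does not state explicitly. Function and constant names are illustrative.

```python
# Prices from the summary above, normalized to yuan per million tokens.
PRICE_YUAN_PER_MILLION = {
    "GLM-4-FlashX": 10 / 100,  # quoted as 10 yuan per 100 million tokens
    "GLM-4-Plus": 5.0,         # quoted as 5 yuan per million tokens
}


def cost_yuan(model: str, tokens: int) -> float:
    """Cost in yuan of processing `tokens` tokens with the given model."""
    return PRICE_YUAN_PER_MILLION[model] * tokens / 1_000_000


def price_before_cut(new_price: float, cut_fraction: float) -> float:
    """Price before a reduction of `cut_fraction` (for example 0.9 for 90%)."""
    if not 0 <= cut_fraction < 1:
        raise ValueError("cut_fraction must be in [0, 1)")
    return new_price / (1 - cut_fraction)


if __name__ == "__main__":
    workload = 100_000_000  # 100 million tokens
    for model in PRICE_YUAN_PER_MILLION:
        print(f"{model}: {cost_yuan(model, workload):.2f} yuan")
    # Assumes the full 90% cut applies to the 5 yuan per million rate.
    print(f"Implied earlier GLM-4-Plus price: {price_before_cut(5.0, 0.9):.2f} yuan per million tokens")
```

At these rates, a 100-million-token workload costs about 10 yuan on GLM-4-FlashX and about 500 yuan on GLM-4-Plus.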

10. JSON Visuals for ChatGPT Released, Unlocking Limitless Image Style Creation

The release of JSON Visuals for ChatGPT brings a new creative dimension to image generation. Users can utilize over 50 aesthetic codes and randomizers to easily generate personalized visual content. This tool not only enhances generation flexibility but also supports high-resolution output, suitable for digital art, brand marketing, game design, and more. Community feedback is positive, with anticipation for future feature optimizations and expansions.

【AiBase Summary:】
✨ 50+ aesthetic codes support diverse style generation, meeting creative needs.
🔄 Attribute randomizers automatically adjust style attributes, exploring limitless creative possibilities.
🚀 High compatibility, quickly generating high-resolution images, enhancing user experience.
Details: https://json.visuals.zip/

11. State Intellectual Property Office: China Becomes the World's Largest Holder of AI Patents, Accounting for 60%

At a press conference held by the State Council Information Office, Shen Changyu, director of the State Intellectual Property Office, announced that China has demonstrated strong development momentum in the field of artificial intelligence, becoming the world's largest holder of AI patents, accounting for 60% of the global total. This achievement reflects not only breakthroughs in technological innovation but also China's leading position in emerging industries. The State Intellectual Property Office actively promotes innovation in relevant intellectual property rights systems to support the development of artificial intelligence technology and is committed to improving intellectual property protection rules and enhancing protection and utilization levels.

【AiBase Summary:】
🌟 China holds 60% of global AI patents, becoming the largest holder.
⚙️ The State Intellectual Property Office has launched multiple policies to support the rapid development and application of AI technology.
📈 2024 intellectual property grant figures are encouraging, and the public satisfaction score reached 82.36 points.

12. 199 Yuan! Xiaomi Releases New Smart Speaker: Powered by AI Large Model, Smart Dialogue Upgraded

Xiaomi's new smart speaker has been officially released at a consumer-friendly price of 199 yuan. While the hardware configuration is somewhat simplified compared to the Pro version, the introduction of an AI large model has significantly improved the smart interaction experience. The new speaker supports continuous dialogue and voice command control, allowing users to control smart home devices more efficiently. The addition of a remote car preparation function also provides users with a more convenient travel experience. Overall, this speaker excels in both cost-effectiveness and intelligence, suitable for users seeking a convenient lifestyle.

【AiBase Summary:】
🎨 The appearance design follows the minimalist style of the Pro version, using a refreshing light gray shell and a more compact size.
🔧 Some features have been omitted, such as infrared remote control and Type-C audio connection, but the acoustic configuration remains excellent.
🤖 Powered by an AI large model, it supports continuous dialogue and remote car preparation functions, significantly enhancing the smart interaction experience.