AI Daily: SD 3 Goes Open Source; ChatTTS Official Website Launches for Chinese Voice AI; Veo Enables Video Generation from Single Image; ElevenLabs Introduces Diverse AI Audio Models

Welcome to the [AI Daily] column! Here is your daily guide to exploring the world of artificial intelligence. Every day, we present the hottest content in the AI field, focusing on developers, helping you understand technical trends and innovative AI product applications.

Discover new AI products here: https://top.aibase.com/

1、TikTok: Crackdown on AI-generated image defamation

TikTok has announced a crackdown on cyberbullying, handling 162 cyberbullying incidents and warning nearly 700,000 perpetrators. A feedback mechanism for cyberbullying has been established, providing a one-click anti-cyberbullying function to protect user safety. Users can apply for legal consultation services to protect their rights.

【AiBase Summary:】
🚫 Crackdown on cyberbullying, handling 162 incidents, warning nearly 700,000 perpetrators
🔒 Establishment of a cyberbullying feedback mechanism, cooperating with the police to combat illegal activities
🛡 Provision of a one-click anti-cyberbullying function to protect user safety; users can also apply for legal consultation services to protect their rights

2、Stability AI open-sources SD 3: Available for download on June 12, not for commercial use

Stability AI has announced that it will open-source its 2-billion-parameter SD3 Medium model on June 12. The model offers photorealism, excellent typography, and high performance, and is suited to both consumer systems and enterprise workloads. SD3 Medium is Stability AI's latest model and is expected to deliver a more stable and efficient user experience.

【AiBase Summary:】
⭐️ Photorealism: Overcoming common artifacts on hands and faces, providing high-quality images without complex workflows.
⭐️ Excellent typography: Robust results in typography, outperforming larger state-of-the-art models.
⭐️ High performance: Optimized size and efficiency, perfect for consumer systems and enterprise workloads.
Details: https://stability.ai/stablediffusion3

3、NVIDIA releases digital human AI technology NVIDIA ACE to enhance character interaction experience

NVIDIA has launched Avatar Cloud Engine (ACE), a digital human AI technology designed to improve character interaction in games and virtual worlds. It gives in-game NPCs intelligent dialogue capabilities, enabling natural conversation and making characters more vivid and realistic. ACE can be deployed in the cloud or on local devices, and its neural networks are optimized to reduce latency so that interaction stays real-time. The technology is expected to change game development and virtual reality, and to extend to customer service, education, and entertainment.

【AiBase Summary:】
🗨️ Intelligent dialogue capabilities: ACE technology endows game NPCs with real dialogue capabilities, surpassing preset dialogue patterns.
🎤 Voice and facial animation generation: ACE uses AI technology to generate real replies, enhancing the vividness and realism of characters.
🚀 Flexible deployment and low latency: ACE can be deployed in the cloud or on local devices, delivering smooth, high-quality interaction with reduced latency.

4、Claude 3 now supports tool use (function calling)

Claude 3 now supports tool use (function calling), allowing it to interact with external tools and APIs and provide more dynamic and accurate responses. The feature shows the potential of AI to improve work efficiency and enable new kinds of services.

【AiBase Summary:】
🛠️ Extract structured data from unstructured text, reducing manual input workload.
🔍 Convert natural language requests into structured API calls, simplifying self-service processes.
⏰ Coordinate multiple Claude sub-agents to perform refined tasks, such as automatically coordinating meeting times.
Details: https://docs.anthropic.com/en/docs/tool-use
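
The second summary point, turning a natural-language request into a structured call, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Anthropic's implementation: the tool name `get_weather`, its handler, and the hand-written `tool_use` block standing in for a model response are all hypothetical. Only the general shape of a tool definition (a name, a description, and a JSON Schema under `input_schema`) follows the linked documentation. No network call is made.

```python
import json

# Hypothetical tool definition: a name, a description, and a JSON Schema
# describing the arguments the model is allowed to supply.
TOOLS = {
    "get_weather": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
}


def get_weather(city: str) -> dict:
    """Stand-in handler; a real one would query a weather service."""
    return {"city": city, "forecast": "sunny"}


# Maps tool names to the local functions that implement them.
HANDLERS = {"get_weather": get_weather}


def dispatch(tool_use: dict) -> dict:
    """Check required arguments, run the named handler, wrap the result."""
    schema = TOOLS[tool_use["name"]]["input_schema"]
    missing = [k for k in schema["required"] if k not in tool_use["input"]]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    result = HANDLERS[tool_use["name"]](**tool_use["input"])
    return {
        "type": "tool_result",
        "tool_use_id": tool_use["id"],
        "content": json.dumps(result),
    }


# Hand-written stand-in for what a model might emit for the request
# "What's the weather in Paris?" (assumed, not real model output).
example_call = {
    "type": "tool_use",
    "id": "toolu_example",
    "name": "get_weather",
    "input": {"city": "Paris"},
}

reply = dispatch(example_call)
```

In a real integration, the tool definitions would be sent along with the user's message, and the `tool_result` would be returned to the model in a follow-up message so it can compose the final answer.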

5、NVIDIA launches AI game assistant G-Assist

G-Assist is NVIDIA's game AI assistant. It answers players' questions about games and provides personalized guidance in response to voice queries. It can also optimize PC settings, suggest performance improvements, and even overclock the GPU. G-Assist points to what AI assistants may become, although caution is still warranted.

【AiBase Summary:】
⭐ G-Assist is NVIDIA's game AI assistant, guiding players through games and configuring optimal settings.
⭐ The assistant can answer questions in games through voice queries and provide personalized guidance based on the situation on the screen.
⭐ It can not only optimize and adjust PC settings but also offer game performance suggestions and even overclock the GPU.

6、DeepMind's video generation model Veo supports generating video clips from a single reference image

Google DeepMind's Veo is a video generation model that can produce video clips from a single reference image and adjust the visual style through text prompts. The model opens new possibilities for the creative industries and video production; the source also cautions users against getting distracted.

【AiBase Summary:】
🔑 Veo model supports generating video clips from a single reference image and adjusting the visual style.
🌟 Veo is partly available through the experimental tool VideoFX, which lets users try some of the model's functions.
💡 Veo model has the potential to generate video clips that meet user requirements based on image content and text prompts.
Details: https://blog.google/technology/ai/google-labs-video-fx-generative-ai/

7、A hit at launch! Top-tier Chinese voice AI ChatTTS launches its official website

ChatTTS is a highly regarded Chinese voice AI project that caused a sensation soon after its launch. It offers text-to-speech, real-time voice conversation, and other functions, with multi-language support and fine-grained control. The project suits a range of scenarios, including e-commerce live streaming, self-media, online education, and customer service.

【AiBase Summary:】
🔊 Text-to-speech, real-time voice conversation functions
🎤 Multi-language support and mixed Chinese-English performance
👥 Multi-speaker support and large-scale training data application
Details: https://chattts.com/

8、ControlNet author launches new project Omost: Transform a sentence into a composition

Omost is an image generation project that produces detailed, accurate images from simple prompts, greatly simplifying the work of describing an image. Users enter a short prompt and get high-quality images that match their expectations. Omost also offers automatic prompt expansion, high flexibility, and image position encoding, making it a capable tool for image generation.

【AiBase Summary:】
⭐ Very short prompts can generate very detailed and spatially accurate images
⭐ High flexibility: a single prompt can modify elements while retaining the image layout
⭐ Provides detailed descriptions, supports complex image generation, applied in AI painting, advertising creativity, etc.
Project page: https://top.aibase.com/tool/omost
Try it: https://huggingface.co/spaces/lllyasviel/Omost

9、ElevenLabs launches innovative AI audio model

ElevenLabs recently launched an innovative AI audio model that can generate various sound effects, short instrument tracks, soundscapes, and character voices through text prompts, bringing great benefits to content creators, video game developers, and film and television studios. This technology greatly simplifies the audio content creation process, improves creative efficiency, and expands creative space.

【AiBase Summary:】
🔊 Text-to-audio conversion: Users input text prompts, AI generates corresponding sound effects and music.
🎶 Diversity: Can generate various sound effects to meet different scene needs.
🎭 Character voice generation: Creates unique voices for different characters in animations, games, or film and television works.
Details: https://top.aibase.com/tool/elevenlabs-text-to-sound-effects

10、PixVerse releases motion brush feature Magic Brush, more convenient and intuitive than Runway

PixVerse's new motion brush feature, Magic Brush, improves the product's usability and brings flexibility and efficiency to animation and dynamic image creation. Users draw arrows by hand to set the motion direction and distance of image areas, giving more precise control over dynamic effects. The operation is simple and intuitive, with no steep learning curve, which broadens creative expression and improves work efficiency.

【AiBase Summary:】
✨ Customize motion direction and distance, precisely control dynamic effects
🎨 Simple and intuitive operation, enhance user-friendliness and creative expression space
⏱️ Simplify the animation production process, improve work efficiency and creative speed
Details: https://top.aibase.com/tool/pixverse

11、Nvidia releases GeForce RTX enhanced version, supporting AI PC digital assistants

Nvidia unveiled new RTX technology at Computex that powers new GeForce RTX AI laptops, and introduced Project G-Assist technology demos that provide context-aware assistance for PC games and applications. The Nvidia ACE digital character platform also debuted, with support for digital characters. These technologies accelerate over 500 PC applications and games, as well as 200+ OEM laptop designs, bringing next-generation AI-powered experiences to over 100 million RTX AI PC users.

【AiBase Summary:】
⭐ Nvidia releases new RTX technology, powering GeForce RTX AI laptops
⭐ Project G-Assist technology demos provide context-aware assistance for PC games and applications
⭐ Nvidia ACE digital character platform debuts, supporting digital characters

12、McKinsey survey shows: Generative AI applications grow fastest in Greater China

Generative AI applications are booming in Greater China and the Asia-Pacific region: 65% of respondents use generative AI frequently, and it is already starting to generate commercial value. Enterprises mainly apply generative AI in three ways: using off-the-shelf products, cooperating with AI vendors to fine-tune models, or developing products independently. Application scenarios mainly include text, code, audio, video, and image generation. As multi-modal large models emerge, application scenarios will expand further.

【AiBase Summary:】
⚙️ Generative AI application growth: Greater China and the Asia-Pacific region are the fastest-growing regions, mainly because of frequent use by digital natives.
💼 Enterprise application methods: Three ways including off-the-shelf product use, cooperation with AI vendors to fine-tune models, and independent product development.
🔍 Application scenarios expand: Generative AI functions and application scenarios are linked, including text, code, audio, video, and image generation capabilities. As multi-modal large models emerge, application scenarios will further expand.

13、ByteDance's AI assistant Doubao launches PC client and browser plugin version

ByteDance's AI assistant Doubao has launched a PC client and a browser plugin, giving users more convenient access to its AI features. Users can use Doubao for quick in-text translation, AI search, one-click desktop stay, and other functions. It also supports web page and video summarization, writing, and text editing. The Doubao large model series covers a variety of functional models, providing users with comprehensive AI assistance.

【AiBase Summary:】
🔍 Doubao PC client version supports quick in-text translation, AI search, one-click desktop stay, and other functions
📚 Plugin version provides one-click summary of web and video, writing, and text modification functions
💡 Doubao large model series includes Doubao general model Pro, role-playing model, voice synthesis model, etc., providing a variety of AI functions

14、Saudi Aramco invests in Chinese AI startup Zhipu AI

Prosperity7, a subsidiary of Saudi Aramco, has invested in Chinese generative AI startup Zhipu AI at a valuation of $3 billion. The investment provides financial support for Zhipu AI and helps it expand internationally. Zhipu AI has been growing strongly in the field of artificial intelligence and has attracted attention from international capital.

【AiBase Summary:】
🌐 Zhipu AI receives $400 million investment from Prosperity7, a subsidiary of Saudi Aramco, valuing it at $3 billion.
💡 Zhipu AI is a company spun off from the Department of Computer Science at Tsinghua University, led by Professor Tang Jie, achieving significant success in the field of generative artificial intelligence.
💰 Zhipu AI has previously received financing of over 2.5 billion yuan, involving multiple well-known institutions and companies.