AI Daily: Google Gemini Prepares Five New Features; Baidu Launches AI Digital Human Social App Wen Xiao Yan; OpenAI's Strawberry Project Revealed; Amazon Introduces Rufus AI Shopping Assistant

Welcome to the [AI Daily] column! This is your daily guide to the world of artificial intelligence. Every day we bring you the hottest stories in AI with a focus on developers, helping you keep up with technical trends and innovative AI product applications.

Discover the latest AI products: https://top.aibase.com/

1. Google Gemini is about to release five new features, including Imagen 3 and custom GPTs

Google is set to launch several new features for its Gemini product line, including Imagen 3 and Gemini custom GPTs. The features are designed to give users a more personalized and convenient experience and reflect Google's continued push in artificial intelligence.

【AiBase Summary:】

🔍 Google Gemini is about to release new features, including Imagen 3 and Gemini custom GPTs, offering users a more personalized and convenient experience.

🔍 Gemini is also expected to add personalized responses, scheduled prompts, voice recording, and Google Photos integration, further enriching the user experience.

🔍 Google is actively recruiting beta testers for the iOS version of Gemini, suggesting that an iOS release may be imminent and that Gemini development is gaining momentum.

2. Baidu launches AI digital human social APP "Wen Xiao Yan"

Baidu has launched a social app called "Wen Xiao Yan," built on its Wenxin large model. Users can chat and interact with simulated digital humans in real time and build emotional connections with them, for a more authentic and natural experience. In the app, users can browse digital human chat partners, view their profiles, and interact with them in various ways.

【AiBase Summary:】

🤖 Users can communicate and interact with AI virtual characters in real-time, establishing emotional connections and enhancing the interactive experience.

📱 Each AI digital human offers its own chat service and can serve as a user's encyclopedia, life assistant, or even mentor.

💬 Digital humans reply with both voice and text and use body language to enhance realism.

3. OpenAI Strawberry Project Revealed: Q* Reasoning Capability Explosion, the Future Within Reach!

The OpenAI Strawberry Project has generated both excitement and curiosity. The project, previously known as Q* and now rebranded as Strawberry, is reportedly designed to let AI plan tasks in advance, gather information online autonomously, and even conduct in-depth research. Strawberry's design aims to give AI unprecedented reasoning capabilities, and OpenAI's secretive, tightly guarded R&D process only heightens anticipation for the results.

【AiBase Summary:】

🍓 The Strawberry Project enables AI to plan tasks in advance, collect information autonomously online, and conduct in-depth research.

🔍 STaR (Self-Taught Reasoner) lets a model improve itself iteratively, starting from a small number of examples with written-out reasoning (rationales) and a large dataset without them.

🚀 OpenAI hopes that Strawberry can perform long-term tasks and improve the reasoning capabilities of AI models.

Paper address: https://arxiv.org/pdf/2203.14465
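The STaR loop summarized above can be sketched in a few lines. This is a minimal, hypothetical illustration: `toy_generate` stands in for a language model (in the real method an LLM is prompted with a few rationale examples), and the `finetune` callable is an assumption that abstracts away actual model training. It is not OpenAI's Strawberry implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Example:
    question: str
    answer: str

# A generator maps (question, optional hint) to (rationale, predicted_answer).
Generator = Callable[[str, Optional[str]], Tuple[str, str]]
Triple = Tuple[str, str, str]  # (question, rationale, answer)

def collect_rationales(generate: Generator, dataset: List[Example]) -> List[Triple]:
    """One STaR-style data-collection pass over a dataset without rationales."""
    kept: List[Triple] = []
    for ex in dataset:
        # 1) Generate a rationale unaided (STaR does this with few-shot prompting).
        rationale, pred = generate(ex.question, None)
        if pred != ex.answer:
            # 2) Rationalization: retry with the correct answer supplied as a hint.
            rationale, pred = generate(ex.question, ex.answer)
        # 3) Keep only rationales that end in the correct answer.
        if pred == ex.answer:
            kept.append((ex.question, rationale, ex.answer))
    return kept

def star(generate: Generator, finetune: Callable[[List[Triple]], Generator],
         dataset: List[Example], iterations: int) -> Generator:
    """Repeat: collect correct rationales, then fine-tune on them (finetune is hypothetical)."""
    for _ in range(iterations):
        generate = finetune(collect_rationales(generate, dataset))
    return generate

def toy_generate(question: str, hint: Optional[str]) -> Tuple[str, str]:
    """Hypothetical stand-in for a model that can only add single-digit numbers unaided."""
    a, b = (int(x) for x in question.split("+"))
    total = a + b
    if hint is None and (a >= 10 or b >= 10):
        return "I am not sure.", "?"  # fails on multi-digit sums without help
    if hint is not None and hint != str(total):
        return "I cannot justify that answer.", "?"  # wrong hints cannot be rationalized
    return f"{a} plus {b} equals {total}.", str(total)

if __name__ == "__main__":
    data = [Example("2+3", "5"), Example("12+30", "42"), Example("4+4", "9")]
    # "12+30" is recovered via rationalization; the mislabeled "4+4" example is dropped.
    print(collect_rationales(toy_generate, data))
```

The filter in step 3 is the key design choice: only reasoning that actually reaches the correct answer is fed back into training, so the model bootstraps from its own successful rationales.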

4. Magic Insert: Easily Insert Characters Perfectly into New Backgrounds with One Drag

Magic Insert lets users drag a subject from one image into another image with a completely different background and blend it in seamlessly. The technique combines style-aware personalization with object insertion, offering flexibility and variety, and poses a new problem for the field of image generation.

【AiBase Summary:】

🔮 Magic Insert technology combines style-aware personalization and object insertion to achieve perfect integration of subjects in different backgrounds.

🌟 Technical highlights: fine-tuning the model with LoRA and text tokens, using Bootstrapped Domain Adaptation for realistic object insertion, and letting users choose the trade-off between stylization and fidelity to subject details.

💡 Experiments across a variety of styles and backgrounds show that Magic Insert is effective and preferred by users.

Details link: https://magicinsert.github.io/demo.html
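The summary mentions LoRA fine-tuning. As a generic illustration only (not Magic Insert's actual pipeline), a LoRA layer keeps the pretrained weight W frozen and learns a low-rank update, so the effective weight is W + (alpha / r) · B · A with rank r much smaller than the layer size. The class and parameter names below are hypothetical, and plain Python lists are used instead of a tensor library.

```python
from typing import List

Matrix = List[List[float]]

def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product of x (n x k) and y (k x m)."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

class LoRALinear:
    """Linear layer with a frozen weight W plus a trainable low-rank update B @ A."""

    def __init__(self, weight: Matrix, a_init: Matrix, alpha: float):
        self.weight = weight                     # d_out x d_in, frozen
        self.A = a_init                          # r x d_in, trainable (random init in LoRA)
        rank = len(a_init)
        self.B = [[0.0] * rank for _ in weight]  # d_out x r, zero init: layer starts equal to W
        self.scale = alpha / rank

    def effective_weight(self) -> Matrix:
        """W + scale * (B @ A)."""
        delta = matmul(self.B, self.A)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def forward(self, x: List[float]) -> List[float]:
        """Apply the adapted layer to an input vector x of length d_in."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]

    def trainable_params(self) -> int:
        """Only A and B are trained; W is never updated."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

if __name__ == "__main__":
    eye = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    layer = LoRALinear(eye, [[1.0, 0.0, 0.0, 0.0]], alpha=1.0)
    print(layer.forward([1.0, 2.0, 3.0, 4.0]), layer.trainable_params())
```

For a d x d layer with rank r, LoRA trains 2·d·r values instead of d², which is why it is a cheap way to personalize a large pretrained model.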

5. Kuaikan Manhua: Training a Vertical Large Model for the "Second Dimension" (ACG) Domain

Kuaikan Manhua is fine-tuning open-source large models to train a vertical large model for the "second dimension" (anime, comics, and games) domain, aiming to raise search conversion rates, boost engagement with its works, and drive innovation in the comics industry. Using large language models (LLMs) and retrieval-augmented generation (RAG), it has built an internal knowledge base and adopted a "fine-tuned large model + RAG" strategy to speed up search responses and improve ranking metrics.

【AiBase Summary:】

🔍 Fine-tuning large models to improve search conversion rates and engagement with works.

🤖 Applying large language models (LLMs) and retrieval-augmented generation (RAG) to build an internal knowledge base.

🎨 Driving innovation in the comics industry by improving user experience and content production capacity.
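The retrieval step of a "fine-tuned model + RAG" setup can be sketched as follows. Kuaikan Manhua's actual system is not described in detail, so this is a hypothetical minimal version: a bag-of-words cosine retriever stands in for a real embedding model, the knowledge-base entries in `KB` are made up for illustration, and the call to the fine-tuned LLM itself is omitted.

```python
import math
import re
from collections import Counter
from typing import List

def tokenize(text: str) -> List[str]:
    """Lowercase and split into alphanumeric tokens (a stand-in for a real embedding model)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Return up to k documents most similar to the query, ignoring zero-score matches."""
    q = Counter(tokenize(query))
    scored = sorted(
        ((cosine(q, Counter(tokenize(doc))), idx) for idx, doc in enumerate(docs)),
        key=lambda pair: (-pair[0], pair[1]),  # highest score first, ties by original order
    )
    return [docs[idx] for score, idx in scored[:k] if score > 0]

def build_prompt(query: str, docs: List[str], k: int = 2) -> str:
    """Augment the user query with retrieved context; the result would go to the LLM."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, docs, k))
    return f"Answer using the context below.\nContext:\n{context}\nQuestion: {query}"

# Hypothetical knowledge-base entries, for illustration only.
KB = [
    "Comic A: a school romance about two rival students",
    "Comic B: a fantasy adventure with dragons and magic",
    "Comic C: a sci-fi story about robots in space",
]

if __name__ == "__main__":
    print(build_prompt("fantasy comic with dragons", KB, k=1))
```

Grounding the prompt in retrieved entries lets the model answer from up-to-date catalog data without retraining, while fine-tuning handles domain vocabulary and style.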

6. Personalized Service Upgrade! Amazon Quietly Launches Rufus AI Shopping Assistant

Amazon's new Rufus AI shopping assistant offers personalized shopping through an intelligent Q&A service, helping users save time and make better purchasing decisions, and shows strong problem-solving ability in shopping scenarios.

【AiBase Summary:】

🛒 Amazon has launched the Rufus AI shopping assistant, which offers personalized shopping experiences and saves users time.

🤖 Its intelligent Q&A service gives detailed answers to users' product questions, including recommendations, comparisons, and order tracking.

🌟 Rufus shows potential to become Amazon's ace in the field of intelligent shopping, leading retail innovation.

7. Google Eureka AI Model Exposed Early, Outstanding Text Writing Ability Draws Attention

Google is reportedly preparing to launch a new AI model called "Eureka," which has drawn considerable attention for its natural language generation and is seen as a significant breakthrough for Google's AI efforts. A preliminary announcement is expected on July 15, with a formal release likely on July 18. Beyond Eureka, Google is also developing other new tools, including further updates to Google Gemini, which have piqued industry interest.

【AiBase Summary:】

✨ The Eureka model excels at natural language generation and is reported to outperform other models.

🔑 Eureka shows improved instruction following, adhering closely to user-defined parameters.

💡 Eureka could improve performance across a wide range of AI-driven tasks.

8. 3D Vision Reconstruction Technology DUSt3R: Easily Generate 3D Models Based on 2D Images

DUSt3R can build 3D models from 2D images without any camera information, greatly simplifying 2D-to-3D reconstruction. It handles multiple reconstruction tasks within a single, efficient framework and achieves state-of-the-art results on a range of vision tasks.

【AiBase Summary:】

🌟 Innovative approach: DUSt3R creates 3D models without camera information, removing the usual need for complex camera parameters.

📷 Efficient processing: DUSt3R handles multiple image-based reconstruction tasks within one unified framework.

🚀 Strong performance: DUSt3R achieves state-of-the-art results across a range of vision tasks.

Details link: https://top.aibase.com/tool/dust3r

9. OpenDiLoCo: An Open-Source Solution for Distributed AI Training with Low Communication Costs and Global Coverage!

The open-source OpenDiLoCo framework implements the DiLoCo (Distributed Low-Communication) training method, enabling globally distributed training with low communication costs while keeping compute utilization high.

【AiBase Summary:】

🌐 Global distributed training: OpenDiLoCo has trained a model across two continents and three countries while maintaining high compute utilization.

⚙️ Dynamic resource management: Compute resources can be adjusted during training; devices can join or leave at any time.

🔗 Fault tolerance and peer-to-peer communication: OpenDiLoCo uses the Hivemind library for fault-tolerant, peer-to-peer training, improving efficiency and stability.

Details link: https://arxiv.org/pdf/2407.07852
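The core DiLoCo idea can be sketched on a toy problem. Each worker starts from the shared parameters, runs many local optimizer steps with no communication, and only the averaged "pseudo-gradient" (shared parameters minus local parameters) is exchanged for an outer update with Nesterov momentum. This is a simplified, hypothetical illustration: DiLoCo uses AdamW for the inner steps on real models, while here plain SGD on a scalar quadratic stands in, and the averaging that OpenDiLoCo performs over the network with Hivemind is a simple Python mean. All hyperparameter values are arbitrary.

```python
from typing import List

def inner_train(theta: float, target: float, steps: int, lr: float) -> float:
    """Local SGD on this worker's loss f(x) = (x - target)^2, with no communication."""
    x = theta
    for _ in range(steps):
        x -= lr * 2.0 * (x - target)
    return x

def diloco(targets: List[float], rounds: int = 50, inner_steps: int = 10,
           inner_lr: float = 0.05, outer_lr: float = 0.7, momentum: float = 0.9) -> float:
    """Toy DiLoCo: each target is one worker's data shard; returns the shared parameter."""
    theta, velocity = 0.0, 0.0
    for _ in range(rounds):
        # Workers train independently for inner_steps starting from the shared theta.
        local = [inner_train(theta, t, inner_steps, inner_lr) for t in targets]
        # The only communication per round: average the pseudo-gradients.
        pseudo_grad = sum(theta - x for x in local) / len(local)
        # Outer optimizer: SGD with Nesterov momentum applied to the pseudo-gradient.
        velocity = momentum * velocity + pseudo_grad
        theta -= outer_lr * (pseudo_grad + momentum * velocity)
    return theta

if __name__ == "__main__":
    # The global optimum of the summed losses is the mean of the targets.
    print(diloco([1.0, 2.0, 3.0]))
```

Workers communicate once per round instead of once per step, so with `inner_steps=10` the sketch cuts synchronization by roughly a factor of ten; real DiLoCo setups use far more inner steps.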

10. Microsoft and MIT Pioneer a New Era of Reasoning: A 67-Million-Parameter Model Competes with GPT-4

In this paper, researchers introduce a new machine learning training strategy that teaches a small Transformer model logical reasoning by building its training set from causal relationships, allowing it to compete with GPT-4. The work opens new possibilities for AI to learn causal reasoning and to better understand and explain the world.

【AiBase Summary:】

🔍 Novel training method: A new training approach strengthens the logical reasoning of a small Transformer model.

🧠 Improved logical reasoning: The model's logical reasoning improves substantially, overcoming limitations of previous approaches.

🔗 Causal training data: The training set is constructed from a model of causal relationships, helping the model learn the causal logic behind the data.

Details link: https://arxiv.org/pdf/2407.07612v1
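To make "constructing training sets from causal relationships" concrete, here is a hypothetical sketch of synthetic data built from transitivity over causal chains (if A causes B and B causes C, then A causes C). The exact graph structures, phrasing, and label format used in the paper may differ; the node names and the `make_dataset` helper are illustrative assumptions.

```python
import random
from typing import List, Tuple

Example = Tuple[str, str, str]  # (premise, question, label)

def make_chain_example(nodes: List[str], rng: random.Random) -> Example:
    """Build one example from a linear causal chain nodes[0] -> nodes[1] -> ..."""
    premise = " ".join(f"{a} causes {b}." for a, b in zip(nodes, nodes[1:]))
    i, j = sorted(rng.sample(range(len(nodes)), 2))  # two distinct positions, i < j
    if rng.random() < 0.5:
        # Forward query: follows from the premise by applying transitivity repeatedly.
        return premise, f"Does {nodes[i]} cause {nodes[j]}?", "Yes"
    # Backward query: every edge points forward, so there is no causal path.
    return premise, f"Does {nodes[j]} cause {nodes[i]}?", "No"

def make_dataset(n: int, min_len: int = 3, max_len: int = 6, seed: int = 0) -> List[Example]:
    """Generate n synthetic examples with chains of random length and random node names."""
    rng = random.Random(seed)
    pool = [f"X{k}" for k in range(100)]
    data = []
    for _ in range(n):
        nodes = rng.sample(pool, rng.randint(min_len, max_len))  # distinct names, so no cycles
        data.append(make_chain_example(nodes, rng))
    return data

if __name__ == "__main__":
    for premise, question, label in make_dataset(3):
        print(premise, "|", question, "->", label)
```

Because every label follows mechanically from the premise, such data can be generated in unlimited quantity, and a model can be tested on longer chains than it saw in training to check whether it learned the rule rather than memorizing examples.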

11. US Financial Regulators Urged to Investigate OpenAI's Confidentiality Agreement Issues

Whistleblowers have raised concerns about OpenAI's confidentiality agreements and are calling on US financial regulators to investigate. They claim OpenAI may be restricting employees' right to blow the whistle, which has raised public concern. US Senator Chuck Grassley said OpenAI's policies limit whistleblowers' rights and called for the SEC to investigate the alleged misconduct.

【AiBase Summary:】

⭐️ Whistleblowers have exposed problems with OpenAI's confidentiality agreements and are requesting an SEC investigation.

⭐️ OpenAI is accused of violating SEC rules and depriving employees of their whistleblowing rights.

⭐️ According to the whistleblower letter, the whistleblowers are asking the SEC to require OpenAI to produce all of its confidentiality agreements and to investigate whether they infringe on employees' rights.