Welcome to the [AI Daily] column! This is your daily guide to the world of artificial intelligence. Every day we present hot topics in the AI field, with a focus on developers, helping you understand technology trends and innovative AI product applications.
Discover fresh AI products. Click to learn more: https://top.aibase.com/
1. Explosive! Runway Gen-3 Can Now Generate 3D Giant-Title Effects for Movie Openings
Runway's latest Gen-3 Alpha video generation model brings significant improvements in fidelity, consistency, and motion. It not only generates stable lighting but also shows strong creative range. In-house testers have demonstrated 3D giant-title effects reminiscent of movie openings, and Gen-3 will soon be available to everyone. The model offers precise motion control, strong aesthetics, rich imagination, adherence to physical laws, and fast generation speed.
【AiBase Summary:】
⚙️ Runway Gen-3 can mimic Marvel-style movie openings, creating 3D giant-title effects
💡 Extremely stable lighting effects, maintaining high-quality output even in high-speed motion scenes
🎥 Control modes include Motion Brush, advanced camera control, director mode, fast generation speed, and strong aesthetics
Details link: https://top.aibase.com/tool/gen-3-alpha
2. Baidu Netdisk One Moment App Launches Baby AI Appearance Prediction Feature
The Baidu Netdisk One Moment App has introduced an AI appearance prediction feature that lets users predict their baby's future appearance by uploading photos. The training data is tailored to Chinese babies, and the feature is simple and fast to use. More practical AI features are planned, and the team is soliciting user feedback.
【AiBase Summary:】
👶 Predict the baby's future appearance, training data tailored to Chinese baby characteristics, simple and fast operation.
📸 Upload photos to get high-definition predicted images, with various fashionable filter effects to choose from.
🔮 Future development of more practical AI features, such as predicting the appearance of children at different stages of growth, soliciting user opinions.
3. Google Launches AI Video Editing App Google Vids for Testing
Google's latest AI video editing app, Google Vids, has shown powerful features and ease of use during testing, giving users an efficient, convenient video editing tool. The app integrates the Gemini large model, helping users easily create video content and lowering the barrier to video production. It is expected to become a powerful assistant for video creators.
【AiBase Summary:】
✨ Google Vids integrates the Gemini large model, helping users create slideshows, write video scripts, and produce storyboard scripts.
🎥 Users can edit videos by adding or removing elements and export them as MP4 files; the entry point is inside Google Docs.
🚀 The testing of Google Vids enriches the Google Workspace suite, showcasing strong AI technical capabilities and providing users with convenient video editing tools.
4. Luma AI Introduces New Features: Video Generation from First and Last Frames, and 5-Second Extension
Luma AI has added the ability to generate video from a specified first and last frame and to extend clips by five seconds at a time, making video creation feel limitless. Occasional hard cuts may appear, but that is part of the charm of editing, keeping every second full of surprises and creativity. Future video generation will build on this extension operation toward intelligent, personalized video creation.
【AiBase Summary:】
✨ Magic of video extension: the ability to generate videos from the first and last frames, extending by 5 seconds, making every frame full of infinite possibilities
💡 Innovative video feature "Extend": intelligently analyzes video content, extending the video length while maintaining the original style and object consistency
🌟 Dream Machine model release: supports text and image input, generating high-quality videos, simulating real physical properties, free to experience
Details link: https://top.aibase.com/tool/dream-machine
5. New Solos AirGo Vision Smart Glasses Released
At the Hong Kong Smart Glasses Summit, Solos released the new Solos AirGo Vision smart glasses, which integrate GPT-4o as a wearable AI assistant, effectively giving users' eyes an upgrade. The glasses support real-time Q&A and visual prompts, offer multilingual translation, feature a modular design, and are priced at $249.
【AiBase Summary:】
⭐ Integrates the multimodal AI capabilities of GPT-4o, supporting real-time Q&A and visual prompts
⭐ Supports real-time translation in more than 10 languages, easily breaking language barriers
⭐ Modular design, can replace traditional frames of different styles, priced at $249
6. SenseTime's Infini-Video AI Video Generation Platform Introduces AI Digital Anchor "AI Bingbing"
SenseTime's AI digital anchor "AI Bingbing" made its debut at the "2024 China・AI Gala", showcasing outstanding multilingual capabilities and realistic visual effects. Digital human technology has huge potential in the media industry, able to bridge the gap between character IPs and audiences.
【AiBase Summary:】
✨ SenseTime's creation of AI digital anchor "AI Bingbing" showcased professional and natural performance on stage, thanks to SenseNova large model technology.
🌐 The Infini-Video platform's AI video generation achieves high-definition likeness replication with natural expressions, lip sync, and movement, along with excellent language capabilities.
🔥 SenseTime's Infini-Video provides strong core technology support, enabling AI Bingbing to showcase realistic visual effects and fluent multilingual expression capabilities.
7. Zhihu Announces the Launch of an Independent AI Search Platform "Zhihu ZhiDa"
Zhihu recently launched a new AI product, "Zhihu ZhiDa", which aims to improve the efficiency and quality of Q&A, connect users with high-quality answers more directly, and increase the circulation of content created by community creators. The move heralds a smarter, more personalized era for Q&A communities.
【AiBase Summary:】
🚀 Zhihu ZhiDa is an AI product built on Zhihu's rich Q&A data; it generates concise, in-depth answers and helps users quickly find the content or experts they need.
💡 The product is positioned as a productivity tool and a connector to discover the world, helping users explore the world through questioning.
🔮 Future plans to launch an App version, introduce multi-modal capabilities, deeply integrate with the Zhihu community, explore external cooperation, and bring new development directions to the entire Q&A field.
Details link: https://zhida.zhihu.com/
8. WhatsApp's Latest Android Beta 2.24.14.7 Introduces Feature to Choose Meta AI Llama Model
WhatsApp's latest beta version introduces the ability to choose the Meta AI Llama model, letting users tailor their AI experience to their needs: faster, simpler responses or the handling of more complex queries. The feature demonstrates WhatsApp's continued innovation in AI.
【AiBase Summary:】
🔍 WhatsApp's latest Android beta 2.24.14.7 introduces the feature to choose the Meta AI Llama model through the Google Play Beta program.
🧠 Users can choose the default Llama3-70B model for faster and simpler responses, or choose the advanced Llama3-405B model for more complex queries.
📈 WhatsApp plans to provide a preview version of the more advanced Llama3-405B model, with a weekly usage limit. After reaching the limit, users will return to the default model to continue the conversation.
9. Apple May Announce a Deal with Google Gemini in the Fall
Apple plans to launch an integration deal with Google Gemini and a beta version of Apple Intelligence in the fall, treating artificial intelligence as a direct profit path. Third-party AI services may become a transitional choice for Apple, while Apple will gradually roll out its own generative AI system.
【AiBase Summary:】
🍎 Apple plans to integrate Google Gemini into its devices, launching a beta version of Apple Intelligence.
💡 Apple treats artificial intelligence as a direct profit path, not just a feature to drive hardware sales.
🤖 Third-party AI services may become a transitional choice for Apple, while Apple will gradually roll out its own generative AI system.
10. gptpdf: Parsing PDF Files with a GPT-4o-like Multimodal LLM
Recently, an open-source project called gptpdf has gained popularity on GitHub. It uses a GPT-4o-like vision-language model (VLM) to parse PDF files and convert them to Markdown format. The project's code is concise and efficient, just 293 lines, yet it can cleanly parse all kinds of content, including layout, mathematical formulas, tables, images, and charts, at an average cost of about $0.013 per page.
【AiBase Summary:】
🔍 Uses a GPT-4o-like multimodal model to parse PDF files, converting them to Markdown format.
💻 Code is concise and efficient, only 293 lines.
🌟 Parsing results are almost perfect, including layout, mathematical formulas, tables, images, charts, etc.
Details link: https://top.aibase.com/tool/gptpdf
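gptpdf's core idea, as described above, is to hand page content to a vision-language model and ask for a Markdown transcription. Below is a minimal sketch of that approach, assuming an OpenAI-style multimodal chat API; the prompt wording, function name, and model choice are illustrative assumptions, not gptpdf's actual code.

```python
import base64

# Prompt asking the VLM to transcribe one rendered PDF page as Markdown.
# The wording is illustrative; gptpdf's real prompt differs.
PAGE_PROMPT = (
    "Transcribe this PDF page into Markdown. Preserve headings, tables, "
    "and lists; write math as LaTeX; describe figures briefly."
)

def build_page_messages(page_png: bytes) -> list:
    """Build an OpenAI-style multimodal chat payload for one page image."""
    b64 = base64.b64encode(page_png).decode("ascii")
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": PAGE_PROMPT},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }]

# To run this for real, render each page to PNG (e.g. with PyMuPDF)
# and send the payload to a vision-capable model:
#   from openai import OpenAI
#   client = OpenAI()
#   resp = client.chat.completions.create(
#       model="gpt-4o", messages=build_page_messages(png_bytes))
#   markdown = resp.choices[0].message.content
```

Parsing page by page like this is what makes per-page cost accounting (the ~$0.013 figure above) straightforward: each page is one model call.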
11. AI Audio Wizard Resona V2A Automatically Dubs Videos
In the AI era, Resona V2A technology has emerged like a magical wizard, letting videos speak and sing their own melodies by generating matching audio automatically. This is both a technical breakthrough and a boon for creators: Resona V2A generates audio with one click, quickly and efficiently, making it a powerful assistant. It is highly cost-effective, reducing costs by 99% while providing high-quality audio solutions.
【AiBase Summary:】
🔮 Videos automatically speak, singing their own melodies, a technical breakthrough and a blessing for creators.
⚙️ One-click audio generation, quickly and efficiently, accelerating the audio generation speed, allowing creators to invest more time and energy into video creative design.
💰 Cost reduction of 99%, high cost-effectiveness audio solutions, meeting the needs of different users.
Details link: https://top.aibase.com/tool/resona-v2a
12. Cutting-Edge AI Outfit Swapping: The MMTryon Virtual Try-On Framework Enables One-Click Mix and Match
The MMTryon virtual try-on framework, jointly developed by Sun Yat-sen University and ByteDance's Zhi Chuang Digital Human team, revolutionizes traditional outfit-swapping methods, generating model try-on results in one click with high quality and easy operation. Its garment encoder and multimodal multi-reference attention mechanism make clothing swaps more accurate and flexible, breaking free of traditional algorithms and setting a new SOTA. MMTryon supports not only single-garment try-ons but also segmentation-free combined try-ons, driven by text commands for high-quality virtual fitting.
【AiBase Summary:】
👗 One-click generation of model try-on effects, high quality and ease of operation
🔥 Breaking the shackles of traditional algorithms, achieving a new SOTA, supporting combined try-ons
💡 Uses a garment encoder and a multimodal multi-reference attention mechanism, making clothing swaps more accurate and flexible
Details link: https://arxiv.org/abs/2405.00448
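The "multi-reference attention" idea above can be illustrated with a toy example: tokens from the person image attend jointly over feature tokens from several garment reference images, so a single query can draw on multiple garments at once (e.g. a top and a pair of trousers). The sketch below is generic scaled dot-product attention in NumPy under assumed shapes; it is not MMTryon's actual architecture (see the linked paper for that).

```python
import numpy as np

def multi_reference_attention(person_feats, garment_feats_list):
    """Toy multi-reference attention: queries come from person-image
    tokens; keys/values from the concatenated tokens of several
    garment reference images.

    person_feats: (n_query, d) array of person-image tokens.
    garment_feats_list: list of (n_i, d) arrays, one per garment.
    Returns: (n_query, d) attended features blending all garments.
    """
    # Concatenate all garment tokens into one key/value sequence,
    # so attention weights are normalized across every reference.
    kv = np.concatenate(garment_feats_list, axis=0)   # (sum n_i, d)
    d = person_feats.shape[-1]
    scores = person_feats @ kv.T / np.sqrt(d)         # (n_query, sum n_i)
    # Numerically stable softmax over all reference tokens jointly.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ kv                               # (n_query, d)

rng = np.random.default_rng(0)
person = rng.normal(size=(16, 64))                    # 16 person tokens
top = rng.normal(size=(8, 64))                        # garment reference 1
trousers = rng.normal(size=(10, 64))                  # garment reference 2
out = multi_reference_attention(person, [top, trousers])
print(out.shape)  # (16, 64)
```

Normalizing attention jointly across all references, rather than per garment, is what lets the model decide how much each output region should borrow from each garment, which matches the segmentation-free combined try-on described above.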