Welcome to the AI Daily section! This is your daily guide to exploring the world of artificial intelligence. Every day, we bring you the hottest topics in the AI field, focusing on developers, helping you stay ahead of technological trends and understand innovative AI product applications.
Fresh AI Products Click to Learn More: https://top.aibase.com/
1. Photorealistic Video Surpasses Sora? Renowned University's Self-Developed Multimodal Large Model Awaker 1.0 Debuts with a Bang
ZiZi Engine's release of the Awaker 1.0 multimodal large model marks a significant step forward in the AI field, showcasing exceptional capabilities in visual generation, and is considered a viable path to achieving AGI. Its self-updating capabilities and the effectiveness of the multi-task MOE architecture have been validated, enhancing the adaptability and creativity of embodied intelligence. The launch of Awaker 1.0 is a critical step forward for the ZiZi Engine team towards the goal of achieving AGI, promising to accelerate the development of the multimodal large model industry and ultimately enable humanity to achieve AGI.
AiBase Highlights:
🚀 Awaker 1.0 signifies a major step towards general artificial intelligence, surpassing Sora with superior capabilities in visual generation.
💡 Awaker 1.0 features an innovative MOE architecture with self-updating capabilities, outperforming advanced models both domestically and internationally in visual question answering and business application tasks.
🔮 Awaker 1.0, combined with embodied intelligence, could be a viable path to achieving AGI, continuously improving through a self-updating mechanism, showcasing the potential of Transformer technology in video generation.
2. Apple's First AI Tablet Revealed, New iPad Pro with M4 Chip
Apple is set to release a new version of the iPad Pro equipped with the M4 chip, enhancing the performance of the neural network engine for smoother AI functionalities. The new iPad Pro will be the first to feature an OLED screen, along with a new generation of Apple Pencil and Magic Keyboard, boosting productivity and creativity. Apple positions each new product as an AI device, with the iPhone 16 series expected to be built around the AI-focused A18 chip, and the iOS 18 operating system set to offer new generative AI features. Apple's moves in the AIGC field are highly anticipated, with results to be revealed at WWDC.
AiBase Highlights:
🚀 The new iPad Pro with the M4 chip enhances the performance of the neural network engine for smoother AI functionalities.
💡 The new iPad Pro features an OLED screen and a new generation of Apple Pencil and Magic Keyboard, boosting productivity and creativity.
📱 The iPhone 16 series is expected to be built around the AI-focused A18 chip, with the iOS 18 operating system offering new generative AI features.
3. Open-Source Multimodal LLM InternVL 1.5: Equipped with OCR Capabilities, Can Interpret 4K Images
The InternVL family of open-source kits provides a viable open-source alternative for commercial multimodal models. The latest release, InternVL-Chat-V1.5, achieves performance close to GPT-4V and Gemini Pro on benchmark tests. The models cover visual perception and cross-modal retrieval, achieving multiple technological breakthroughs.
AiBase Highlights:
💡 Achieves performance close to GPT-4V and Gemini Pro on multiple benchmark tests, with strong multimodal dialogue and OCR capabilities.
💡 Can be used to interpret images and generate accompanying text (e.g., writing copy for Little Red Book based on images).
💡 Also suitable for solving problems, with English being manageable, but caution advised for math.
Product Entry: https://top.aibase.com/tool/internvl
Experience URL: https://huggingface.co/spaces/OpenGVLab/InternVL
4. Huawei PixArt-Σ Releases Model Files
Huawei's latest PixArt-Σ model has garnered widespread attention in the image generation field. The model employs advanced diffusion Transformer technology, focusing on generating high-quality 4K resolution images while maintaining a lightweight design and stylistic diversity, supporting use on platforms like Comfyui, providing users with a high-quality image generation tool.
Image courtesy of 归臧
AiBase Highlights:
🔍 The PixArt-Σ model uses advanced diffusion Transformer technology, focusing on generating high-quality 4K resolution images.
💡 The model size is only 2G, with a lightweight design that performs excellently while maintaining a small footprint.
🌟 Supports the Diffusers framework, allowing users to try it on different platforms, speeding up the generation process and enhancing user experience.
Workflow Address: https://civitai.com/models/420163
Project Address: https://github.com/PixArt-alpha/PixArt-sigma
5. Sora's Popular Short Video Accused of Involving Manual Post-Production Effects
The Sora viral short film "Balloon Man" reveals that the video is not entirely generated by AI and requires extensive human post-production for visual effects. Consistency issues: Sora cannot guarantee consistency of subjects between different shots and requires detailed descriptions of character images to resolve this. Video post-processing: Sora-generated video materials need human post-processing such as cropping, adjusting speed, and removing elements that do not fit the setting.
AiBase Highlights:
🔍 The video is not entirely AI-generated and requires human post-production for extensive visual effects.
🎭 Sora cannot guarantee consistency of subjects between different shots and requires detailed descriptions of character images to resolve this.
🎬 Sora-generated video materials need human post-processing such as cropping, adjusting speed, and removing elements that do not fit the setting.
6. LobeChat Supports Direct Invocation of Ollama Local Models via Web Version
LobeChat is an innovative web platform that provides users with a convenient way to directly utilize the capabilities of open-source large models through a web interface. Users can easily leverage local large models for various language processing tasks without leaving the web environment. LobeChat reduces reliance on external API services, offering a fast and convenient solution.
AiBase Highlights:
🔍 Local model support: Users can install Ollama locally and interact with open-source large models via LobeChat.
⚡ High-performance experience: LobeChat's conversation speed can rival commercial API calls, provided the user's device is powerful enough.
🎨 High-quality UI experience: LobeChat's web user interface is intuitive and easy to use, offering a user experience comparable to ChatGPT.
Details Link: https://top.aibase.com/tool/lobechat
7. Millions of Viewers Watch Blogger's "Relationship" with AI, How Addictive is ChatGPT's "DAN" Mode?
This article introduces the interaction between the blogger and the AI "DAN" mode, showcasing the趣味性和情感化表达 in voice chats. The article explores the possibility of human-machine emotional communication, sparking discussions among netizens about virtual relationships. Through dialogues, it demonstrates the multifaceted and personalized characteristics of AI, attracting a large number of netizens to watch and participate.
AiBase Highlights:
🤖 AI "DAN" mode showcases趣味性和情感化表达 in voice chats.
💬 The article explores the possibility of human-machine emotional communication, sparking discussions about virtual relationships.
🌐 AI's multifaceted and personalized characteristics attract a large number of viewers and participants.
8. AI Town Now Available to Run Locally via Llama3
AI Town is an innovative virtual town project that can run entirely locally via Llama3, providing developers with a powerful platform to build and customize their own virtual AI communities. Inspired by the Generative Agents research paper, it offers a deployable platform suitable for developers looking for interesting projects or developing scalable multiplayer games. The installation process is straightforward, supporting servers like Convex, Ollama, and Vite web servers.
AiBase Highlights:
🏗️ Supports local running: AI Town can run locally via Llama3, providing developers with a powerful platform for building and customization.
🤖 Offers a customizable AI community: Users can create and customize a virtual town where AI characters live, including characters, stories, background environments, and music.
🔗 Powerful functionality support: AI Town integrates Convex as the game engine, database, and vector search foundation, while also supporting integration with cloud AI providers, enhancing the functionality of the virtual community.
Details Link: https://top.aibase.com/tool/aitown
9. Harmonai: An Open-Source Generative Audio Tool
Harmonai, supported by the Stability AI Lab, is an open-source project aimed at making music production easier and more fun. By using advanced AI algorithms to generate a customized infinite music library, it provides users with high-quality, innovative music resources, promoting the development of the music industry and culture.
AiBase Highlights:
🎵 Utilizes advanced AI algorithms to generate a customized infinite music library, providing users with high-quality, innovative music resources.
🎶 Offers easy-to-use generative audio tools, empowering everyone to express their creativity and promote the development of the music industry and culture.
🎧 Technology is based on the Dance Diffusion model, implemented using the PyTorch framework, and trained and tested with large audio datasets to ensure powerful, adaptable, and reliable functionality.
Details Link: https://top.aibase.com/tool/harmonai
10. JPMorgan Launches FlowMind Tool to Automate Financial Workflows
JPMorgan recently introduced the advanced FlowMind tool, which uses GPT technology to automatically generate workflows, automating various tasks within the financial industry, enhancing work efficiency, and reducing the possibility of human error. FlowMind emphasizes data security and privacy protection, executing tasks through abstract API descriptions, allowing users to participate in workflow optimization and customization, improving the flexibility and effectiveness of automated solutions.
AiBase Highlights:
🤖 Automates financial tasks: Improves operational efficiency and handles routine tasks.
🔒 Data security and privacy protection: The architecture emphasizes security, safeguarding sensitive data.
🛠️ Abstract API descriptions: Executes tasks through abstract descriptions, protecting data privacy and providing necessary information for the model to understand.
Details Link: https://arxiv.org/pdf/2404.13050
11. Microsoft's LongRoPE Method Extends LLM Context Window to Over 2 Million, 8 Times Expansion While Maintaining Performance
Microsoft researchers' LongRoPE method successfully extends the LLM's context window to 2048k, achieving an 8-fold expansion while maintaining the performance of the original short context window, without architectural changes or complex fine-tuning. This breakthrough method brings new possibilities for improving the performance of language models, laying a solid foundation for future research and applications.
AiBase Highlights: