AI Daily: Alibaba Open Sources Document Model DocOwl 1.5; Midjourney Image Editor New Features Launching Next Week; Viggle AI Introduces Lip Sync Feature

Welcome to the AI Daily section! This is your daily guide to exploring the world of artificial intelligence. Each day, we bring you the hottest topics in the AI field with a focus on developers, helping you understand technological trends and discover innovative AI product applications.

Explore fresh AI products by clicking here: https://top.aibase.com/

1. Master tables and charts effortlessly! Alibaba DAMO Academy releases DocOwl 1.5, which understands documents without OCR!

Alibaba DAMO Academy and Renmin University of China have jointly open-sourced the mPLUG-DocOwl1.5 document processing model, which can understand document content without OCR and leads in multiple visual document understanding benchmark tests. The model emphasizes the importance of structural information and proposes "unified structure learning" to enhance MLLM performance.

AiBase Highlights:
🔍 mPLUG-DocOwl1.5 understands document content without OCR and leads in visual document understanding benchmarks.
📊 Emphasizes the importance of structural information for document understanding and proposes "unified structure learning" to enhance MLLM performance.
🔗 Provides open-source code, models, and datasets, achieving state-of-the-art performance in multiple downstream tasks.
Details link: https://github.com/X-PLUG/mPLUG-DocOwl/tree/main/DocOwl1.5

2. New features for Midjourney's image editor coming next week

Midjourney's founder, David Holz, announced that a new image editor is about to launch. This editor uses depth information from uploaded images to generate new pictures, completely changing textures, colors, and details while preserving the original composition and content. This innovation enhances user creative freedom and provides powerful tools for designers and artists. Midjourney continues to optimize image generation quality through AI technology, and the latest v6.1 model further improves image clarity and accuracy. The addition of the new image editor will broaden the application of AI in the creative field, signaling a significant advancement in image editing tools for Midjourney.

AiBase Highlights:
✨ Uses depth information from uploaded images to generate new pictures, preserving the original composition and content while completely changing textures, colors, and details.
🎨 Midjourney is committed to improving image generation quality through AI technology, with the latest v6.1 model further optimizing image clarity and accuracy.
💡 The addition of the new editor will broaden the application of AI in the creative field, providing designers and artists with more flexible image manipulation and modification methods.

3. Viggle AI introduces new features allowing characters to speak through recorded voices

Viggle AI has launched a new feature that lets users make characters speak with lip synchronization driven by voice recordings. The feature gives users full control over a character's performance, whether the character is speaking, singing, or dancing. The Viggle app has garnered widespread attention on social media and is built on the JST-1 video-3D foundation model, which allows users to easily create and mix video content.

AiBase Highlights:
🎤 Character speaking feature: Users can make characters speak through voice recordings, achieving lip synchronization.
🎭 Character replacement feature: Users can place any character into a video scene, creating a personalized immersive experience.
🔄 Animates static images: Users can transform static photos into dynamic video, increasing the fun and interactivity of the video.
Details link: https://viggle.ai/home

4. Even top AI models struggle with complex travel planning, OpenAI o1-preview also finds it challenging

Recent studies show that even advanced AI language models, such as OpenAI's latest o1-preview, face challenges in complex planning tasks. The research finds that the models perform poorly in integrating rules and conditions and gradually lose focus on the problem as planning time increases. Although some models perform reasonably well in BlocksWorld, they struggle in more complex TravelPlanner tasks.

AiBase Highlights:
🌍 OpenAI's o1-preview and other AI models perform poorly in complex travel planning, with GPT-4o success rate at only 7.8%.
📉 Most models perform reasonably well in BlocksWorld but struggle to achieve ideal results in TravelPlanner.
🧠 Models integrate rules and conditions poorly and lose focus on the problem over time.
Details link: https://github.com/hsaest/Agent-Planning-Analysis

5. Open-source tool Vulnhuntr discovers Python zero-day vulnerabilities, cleverly utilizing Claude AI

Protect AI's Vulnhuntr tool uses Claude AI to help developers find zero-day vulnerabilities in Python code. Unlike traditional static analysis, this tool can track the complete call chain from user input to server output, improving the accuracy of vulnerability detection. Vulnhuntr has already discovered zero-day vulnerabilities in several large open-source projects and will soon be released on GitHub for developers to use.

AiBase Highlights:
🌟 Vulnhuntr is an open-source tool that uses Claude AI to discover Python zero-day vulnerabilities.
🛠️ The tool's working method differs from static analysis, as it can track the complete call chain.
🚀 Vulnhuntr has discovered zero-day vulnerabilities in multiple large open-source projects and will soon be released on GitHub.
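
To make the "complete call chain" idea concrete, here is a minimal sketch of call-chain tracing over Python source using only the standard library `ast` module. It is not Vulnhuntr's implementation, which relies on Claude to reason about the code; the sample functions (`handle_request`, `render`, `run_query`, `execute`) and the helper names are hypothetical and exist only for illustration.

```python
import ast

# Hypothetical sample code: user input enters at handle_request and
# eventually reaches execute, a stand-in for a dangerous sink.
SOURCE = '''
def handle_request(params):
    name = params["name"]
    return render(name)

def render(name):
    return run_query(name)

def run_query(value):
    return execute("SELECT * FROM users WHERE name = '" + value + "'")

def execute(sql):
    return sql

def unrelated():
    return 1
'''

def build_call_graph(source):
    """Map each top-level function name to the plain names it calls."""
    tree = ast.parse(source)
    graph = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            calls = set()
            for child in ast.walk(node):
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    calls.add(child.func.id)
            graph[node.name] = calls
    return graph

def find_call_chain(graph, entry, sink, path=None):
    """Depth-first search for one call chain from entry to sink, or None."""
    path = (path or []) + [entry]
    if entry == sink:
        return path
    for callee in sorted(graph.get(entry, ())):
        if callee not in path:  # skip functions already on the path (cycles)
            chain = find_call_chain(graph, callee, sink, path)
            if chain:
                return chain
    return None

graph = build_call_graph(SOURCE)
chain = find_call_chain(graph, "handle_request", "execute")
print(" -> ".join(chain))
```

A purely syntactic graph like this one misses method calls, dynamic dispatch, and calls across files, which is the kind of gap an LLM-assisted tool aims to close.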

6. ByteDance responds to "intern sabotaging large model training": No impact on formal commercial projects

ByteDance has issued an official response to rumors that an intern sabotaged large model training. The company confirmed that the intern maliciously interfered with model training in a research project, but said that formal commercial projects and online services were not affected. It stated that the rumors are greatly exaggerated, that the intern has been dismissed, and that the matter has been reported to relevant institutions. The incident exposes security management issues, and the company plans to heavily invest in AI technology.

AiBase Highlights:
🔍 The intern maliciously interfered with large model training, but it did not affect commercial projects and online services.
🔒 The company confirms that the rumors are exaggerated, has dismissed the intern, and reported the matter to relevant institutions.
💡 The incident exposes security management issues, and the company plans to heavily invest in AI technology.

7. Meta's latest cutting-edge tech SPIRIT-LM: It can speak, write, and understand your emotions, and this AI language model is quite powerful!

SPIRIT-LM is a multimodal foundation language model that can freely mix text and speech and can both understand and express emotions. It combines the semantic capabilities of text models with the expressive capabilities of speech models to complete cross-modal tasks, and it can learn new tasks from just a few examples. SPIRIT-LM-EXPRESSIVE outperforms the base version in emotional expression, opening up new possibilities for multimodal language understanding and generation.

AiBase Highlights:
⚙️ SPIRIT-LM is a multimodal foundational language model that can mix text and speech and understand emotions.
🔑 SPIRIT-LM combines the semantic capabilities of text models and the expressive capabilities of speech models to complete cross-modal tasks.
💡 SPIRIT-LM-EXPRESSIVE outperforms the base version in emotional expression, opening up new possibilities for multimodal language understanding and generation.
Details link: https://arxiv.org/pdf/2402.05755

8. Overturning Stable Diffusion! BAAI's Emu3 is released, mastering images, text, and video!

The Emu3 team has released Emu3, a new family of multimodal models that dispenses with traditional diffusion and compositional architectures while achieving state-of-the-art performance in generation and perception tasks. The models are trained solely on next-token prediction, which unifies multimodal tasks in a single framework and lets Emu3 surpass task-specific models and even flagship models. The success of Emu3 points to a direction for future multimodal models and brings new hope for achieving AGI.

AiBase Highlights:
🚀 Emu3 is trained solely on next-token prediction, departing from traditional model architectures and achieving state-of-the-art performance.
💡 Emu3 unifies multimodal tasks without relying on diffusion or compositional architectures, surpassing task-specific models and flagship models.
🔗 The Emu3 team has open-sourced key technologies and models, providing support for further research in the field of multimodal intelligence.
Details link: https://github.com/baaivision/Emu3
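
As a rough illustration of what "trained solely on next-token prediction" means for mixed text and image data, the sketch below flattens text tokens and discrete image codes into one sequence and derives the (input, target) pairs a next-token objective would train on. This is a toy under stated assumptions, not Emu3's code: the vocabulary sizes, the `BOI`/`EOI` marker tokens, and the helper names are all hypothetical.

```python
# Hypothetical ID layout: text tokens first, then image codes, then markers.
TEXT_VOCAB = 1000                  # assumed number of text token IDs
IMAGE_VOCAB = 512                  # assumed number of discrete image codes
BOI = TEXT_VOCAB + IMAGE_VOCAB     # hypothetical begin-of-image marker
EOI = BOI + 1                      # hypothetical end-of-image marker

def image_token(code):
    """Shift an image code into the shared vocabulary, after the text IDs."""
    return TEXT_VOCAB + code

def build_sequence(text_ids, image_codes):
    """Flatten a caption and an image into a single token sequence."""
    return text_ids + [BOI] + [image_token(c) for c in image_codes] + [EOI]

def next_token_pairs(sequence):
    """Pair each token with the token that follows it (the training target)."""
    return list(zip(sequence[:-1], sequence[1:]))

sequence = build_sequence(text_ids=[5, 42], image_codes=[0, 7, 511])
pairs = next_token_pairs(sequence)
print(sequence)
print(pairs)
```

Because every modality ends up in the same sequence, a single model with a single loss can in principle cover both generation (predicting image tokens after text) and perception (predicting text tokens after an image).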

9. Perplexity AI seeks a $9 billion valuation

Perplexity AI hopes to raise its valuation to $9 billion in a new round of financing, up from its current valuation of $3 billion. The company faces plagiarism accusations but firmly denies them. Amid intense market competition, it is working to improve its technology and service levels.

AiBase Highlights:
🌟 Perplexity AI plans to raise its valuation to $9 billion, attracting significant investor attention.
💰 The company has conducted three rounds of financing since the beginning of the year, developing rapidly.
📰 The company faces plagiarism accusations, which it firmly denies.

10. Former OpenAI CTO launches new AI company with a fundraising target of over $100 million

Mira Murati is raising over $100 million in venture capital to start a new AI company. She left OpenAI to pursue personal exploration, and OpenAI raised a record $6.6 billion in venture capital after her departure. How her new company develops remains to be seen.

AiBase Highlights:
✨ Mira Murati is raising over $100 million in venture capital to build a new AI company.
🚀 Murati left OpenAI to pursue personal exploration, with no specific plans disclosed.
📈 OpenAI raised a record $6.6 billion in venture capital after Murati's departure.

11. Apple's AI development lags two years behind, plans to introduce Apple Intelligence across all devices in the next two years

At this year's WWDC, Apple showcased new AI features, but analysts claim that Apple lags about two years behind competitors in AI technology development. Apple plans to launch "Apple Intelligence" features on all screen-equipped devices within the next two years. Despite starting late, the company is confident it can catch up.

AiBase Highlights:
📅 Apple lags about two years behind competitors in AI development and is working to catch up with industry standards.
💡 Apple plans to introduce "Apple Intelligence" features on all screen-equipped devices within the next two years.
📱 New iPads and upcoming iPhones will both feature hardware supporting "Apple Intelligence".

12. Beijing adds 12 generative AI service registrations, totaling 94

Beijing has recently added 12 generative AI service registrations, bringing the total number of registrations to 94 and offering users more choices. AI applications that have gone live must publicize their registration details, including model names and registration numbers. The newly added registration list includes Kling AI from Kuaishou Technology and Tiangong Image from Kunlun Wanwei Technology Co., Ltd.

AiBase Highlights:
📈 Beijing adds 12 generative AI service registrations, totaling 94.
🔍 AI applications that have gone live must publicize their registration details, including model names and registration numbers.
📋 The newly added registration list includes Kling AI from Kuaishou Technology and Tiangong Image from Kunlun Wanwei Technology Co., Ltd.

站长之家

This article is from AIbase Daily