AI Daily: New Models from Fudan and Baidu Can Generate 1-Hour Long Videos; Brand New ChatGPT Windows Version Launched; NotebookLM Introduces 2 New Features

Welcome to the AI Daily section! This is your daily guide to exploring the world of artificial intelligence. Each day, we bring you the hottest topics in the AI field, focusing on developers to help you understand technology trends and discover innovative AI product applications.

Fresh AI Products Click to Learn More: https://top.aibase.com/

1. For Paid Users! New ChatGPT Windows Version Launched: Shortcut Keys to Summon AI Assistant

OpenAI has released an early version of the new ChatGPT Windows app, offering paid users a convenient AI assistant experience. Users can summon ChatGPT by pressing Alt + Space, eliminating the need to open a web page each time. The app is currently only available to paid users but plans to extend the opportunity to free users after improvements. Although the beta version lacks some features, OpenAI promises continuous updates to enhance user experience.

AiBase Highlights:

🌟 The ChatGPT Windows app is exclusively available to paid users, supporting various paid account types.

💡 Press Alt + Space to easily summon ChatGPT for conversation, convenient and fast.

🔧 The beta app currently lacks some complex features but will be continuously updated for a better experience.

2. OpenAI Releases GPT-4O-Audio-Preview

OpenAI's latest GPT-4O-Audio-Preview model showcases remarkable capabilities in voice processing, not only generating natural and fluent voice responses but also featuring emotional analysis and voice interaction, opening new possibilities for human-machine interaction. The model supports various mode combinations and reflects the complexity of audio processing in its pricing strategy. Its launch will bring innovation to customer service, education, entertainment, and assistive technology fields.

AiBase Highlights:

🔊 The model has the ability to generate natural and fluent voice responses, suitable for voice assistant and virtual customer service applications.

🎶 It has the capability to analyze audio emotions, tones, and pitches, applicable in emotional computing and user experience analysis fields.

🗣 Supports voice-to-voice interaction, laying the foundation for comprehensive voice interaction systems.

Details Link: https://platform.openai.com/docs/guides/audio/quickstart

3. Google Upgrades AI Note-Taking and Research Assistant NotebookLM

Google announces significant upgrades to NotebookLM, enhancing the audio summary feature to allow users to more accurately guide AI in generating conversation content. Updates include custom audio summaries and background listening capabilities, improving user experience. A commercial pilot program is launched, anticipating broader application scenarios.

AiBase Highlights:

🔊 Audio summary feature upgraded, users can customize AI conversation content guidance.

🎙️ Added background listening feature, users can work and listen to audio simultaneously.

💼 Commercial pilot program launched, allowing enterprises to experience new features and receive support.

4. Fudan and Baidu Jointly Develop New AI Model Hallo2 for Generating 4K Ultra HD + 1-Hour Long Videos!

Fudan University and Baidu's jointly developed Hallo2 AI model will revolutionize character animation generation, bringing revolutionary changes to film production, virtual assistants, game development, and other fields. The model combines latent diffusion models, Patch-drop data augmentation techniques, Gaussian noise enhancement techniques, VQGAN discrete codebook prediction technology, and text prompt control mechanisms, excelling in generating high-quality, long-sequence character animations.

AiBase Highlights:

⚙️ Hallo2 model combines multiple innovative technologies, including Patch-drop data augmentation, Gaussian noise enhancement, VQGAN discrete codebook prediction, and text prompt control mechanisms.

🌟 Hallo2 outperforms existing methods on multiple public datasets, excelling in generating high-quality, long-sequence character animations.

🚀 The release of the Hallo2 model marks a new level in AI character animation generation technology, with future optimizations in efficiency and exploration of more application areas.

Details Link: https://fudan-generative-vision.github.io/hallo2/#/

5. Tesla's Optimus Robot Evolves: Autonomous Navigation, Climbing Stairs, and Interacting with Humans Become Reality

Tesla's latest Optimus robot showcases impressive new features, from autonomous navigation to human interaction, highlighting the rapid progress in artificial intelligence and robotics. Optimus's autonomous navigation capabilities, energy management autonomy, load capacity improvements, and other aspects show great potential.

AiBase Highlights:

🤖 Autonomous navigation: Optimus can navigate complex environments freely, and multiple robots can collaborate to optimize navigation efficiency.

🔋 Energy management autonomy: Optimus can automatically locate charging stations for self-charging, enhancing work continuity and efficiency.

🏋️‍♂️ Load capacity improvement: Optimus can carry battery trays weighing up to 11 kg, opening new possibilities for industrial and logistics applications.

6. Google's Major Leadership and Team Restructuring: Gemini Team Merged into DeepMind, Significant Changes in Search Leadership

Google has recently undergone significant leadership changes and team restructuring, including the K&I team and the Gemini team. The appointment of new leaders and team integration will have a major impact on the company's technological development and AI project collaboration.

AiBase Highlights:

🌟 Nick Fox takes over as the new head of Google's K&I team, continuing to drive the development of search, advertising, geography, and commerce products.

🔧 Prabhakar Raghavan transitions to Google's Chief Technology Officer, focusing on providing direction and support for the company's technological development.

🤖 The Gemini team is integrated with Google DeepMind, aiming to strengthen collaboration between application teams and the Gemini model team.

7. AMT-APC Algorithm: Generate Master-Level Piano Performances with One Click

Researchers at Musashino University's School of Data Science have developed the AMT-APC algorithm, which combines the AMT model and fine-tuning techniques to generate more accurate piano performances close to the original compositions. This algorithm overcomes the limitations of existing automatic piano music generation technologies, enhancing sound fidelity and expressiveness.

AiBase Highlights:

⭐ The AMT-APC algorithm leverages the advantages of the AMT model, generating piano performances closer to the original compositions through fine-tuning.

🎵 The core strategy includes pre-training and fine-tuning, enabling the AMT model to handle longer music segments and generate piano performances in line with the original style.

🎹 Introducing the concept of style vectors, learning different performance styles, enhancing the expressiveness and sound fidelity of generated piano music.

Details Link: https://misya11p.github.io/amt-apc/

8. Apple Siri AI New Features: ChatGPT Integration and Image Generation

Apple is working on new Apple Intelligence features for iOS18, iPadOS18, and macOS15, including ChatGPT integration and image generation. ChatGPT will provide Siri with more advanced text and image generation capabilities, while Visual Intelligence will offer camera control button features for iPhone16 users. iOS18.1, iPadOS18.1, and macOS Sequoia15.1 are expected to be released on October 28, with beta versions of iOS18.2, iPadOS18.2, and macOS Sequoia15.2 coming soon.

AiBase Highlights:

🔍 Siri will integrate ChatGPT, providing more advanced text and image generation capabilities.

📸 iPhone16 will receive Visual Intelligence features, offering information about surrounding objects through the camera control button.

🚀 iOS18.2 will support Image Playground image generation, Genmoji, and Image Wand.

9. Just One Billion Parameters! AI Image Generation Model Meissonic

Meissonic is an open-source AI model that generates high-quality images using just one billion parameters. It employs parallel iterative optimization training methods, making it 99% faster than traditional models in image generation. Despite its small parameter size, Meissonic outperforms larger models in multiple tests and can perform untrainable image inpainting and expansion functions.

AiBase Highlights:

🌟 Compact design Meissonic is suitable for ordinary gaming PCs and future mobile devices.

⚡ Uses parallel iterative optimization training methods, Meissonic is 99% faster than traditional models in image generation.

🏆 Despite its small parameter size, Meissonic outperforms larger models in multiple tests and can perform untrainable image inpainting and expansion functions.

Details Link: https://huggingface.co/spaces/MeissonFlow/meissonic

10. Perplexity Launches Internal Knowledge Search Feature, Enabling Enterprises to Query Both Internal and External Data Simultaneously

Perplexity has launched a new feature, "Internal Knowledge Search," aimed at improving enterprise work efficiency and making it easier for users to obtain the information they need. Users can upload selected files, avoiding low-value information interference in searches and enhancing efficiency. The new "Spaces" feature supports team file sharing and AI assistant customization.

AiBase Highlights:

📁 Users can only upload selected files, avoiding low-value information interference in searches and enhancing efficiency.

🔍 Perplexity introduces the "Internal Knowledge Search" feature, allowing users to query both internal and external data simultaneously.

🤝 New "Spaces" feature supports team file sharing and AI assistant customization.

11. Autonomous Driving Company Pony.ai Plans to Go Public in the US with a Valuation Exceeding $8.5 Billion

Pony.ai plans to go public in the US with a valuation exceeding $8.5 billion. Founded in 2016, the company focuses on autonomous driving solutions and has completed 9 rounds of financing exceeding $1 billion. Revenue is mainly from the Robotaxi business, which grew by 86% in the first half of 2024.

AiBase Highlights:

🌍 Pony.ai plans to go public in the US with the stock code "PONY," valued at over $8.5 billion.

💰 Founded in 2016, the company has completed 9 rounds of financing exceeding $1 billion, with a valuation of $8.5 billion.

🚖 The Robotaxi business is the main source of revenue, growing by 86% in the first half of 2024.

Product Finder

Product Submit

AI Models Finder

MCP Servers

MCP Client

MCP Inspector

Case Tutorials

Latest AI News

AI Daily Brief

AI Daily: New Models from Fudan and Baidu Can Generate 1-Hour Long Videos; Brand New ChatGPT Windows Version Launched; NotebookLM Introduces 2 New Features

站长之家

This article is from AIbase Daily

AI News Recommendations

ChatGPT Mistake Triggers New Feature Development! Developers Helplessly Face a User Surge

ChatGPT Launches New 'Learn Together' Feature to Drive Transformation in the Education Sector

Microsoft Launches Deep Research: Integration of Bing and OpenAI to Revolutionize Automated Research

ChatGPT New Feature: Learn Together - The New Assistant for Future Education?

OpenAI Announces GPT-5 Will Integrate Multiple Models for a New Breakthrough

OpenAI Takes a Unique Approach with a Researcher Residency Program to Attract Emerging AI Talent

ChatGPT Helps Unlock the Decade-Old MTHFR Gene Mutation Mystery

Former OpenAI Researcher Reveals: Signing with Meta Did Not Bring $100 Million Bonus

New Developments in OpenAI Copyright Lawsuit: The New York Times Will Have Access to Deleted User Data

ChatGPT Helps News Websites Increase Traffic, But Struggles to Compensate for the Decline in Search Traffic