NVIDIA Releases 'Sound Magic Wand' Fugatto: Control Music with Text!

AIbase基地

Published in AI News · 6 min read · Nov 26, 2024


Do you remember the scenes in sci-fi movies where the protagonist waves a magic wand to control sounds at will? Now, this magical ability is no longer a fantasy! NVIDIA's latest AI model, Fugatto, acts like a "sound magic wand," allowing users to manipulate music, sounds, and voices using just text, creating a variety of amazing auditory effects.

Fugatto, short for "Foundational Generative Audio Transformer Opus 1," is an audio model built on generative AI. Unlike models that can only compose music or only modify speech, Fugatto can generate or transform any mix of music, speech, and sounds. It takes its instructions from text prompts, audio files, or a combination of the two.

Fugatto is aimed at users in a range of fields, including music producers, advertising agencies, language learning tool developers, and game developers. Music producers can quickly try out different styles, vocals, and instruments, add effects, or improve the quality of existing tracks. Advertising agencies can give a voiceover different accents and emotions to localize a campaign for different regions and audiences. Language learning tool developers can have course content spoken in any voice the learner chooses, such as that of a family member or friend, making lessons more personal. Game developers can modify existing audio assets in real time as gameplay progresses, or create entirely new sound effects from text instructions and audio inputs.

The magic of Fugatto lies in how flexibly it interprets and generates sound. It can follow specific user instructions and produce sounds that have never been heard before. For instance, it can make a trumpet sound like a dog barking or a saxophone imitate a cat meowing; according to NVIDIA, whatever the user can describe, Fugatto can create.

[Image: audio sound waves. Source note: AI-generated image, licensed through Midjourney.]

Another notable ability is that Fugatto can combine instructions it only saw separately during training to produce more complex effects. For example, a user can ask for speech delivered with a sad emotion in a French accent. Users can also fine-tune each instruction, such as how heavy the accent is or how strong the sadness is, giving them the kind of control an artist has over a medium.
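To make the idea of weighting combined instructions concrete, here is a minimal toy sketch. It is not NVIDIA's implementation and does not call Fugatto: the three-dimensional "instruction vectors", the names `INSTRUCTIONS` and `blend`, and all numeric values are assumptions invented for illustration. It only shows how per-instruction weights can scale each attribute independently before the results are summed.

```python
from typing import Dict, List

# Hypothetical instruction vectors; the values are made up for illustration
# and do not come from Fugatto.
INSTRUCTIONS: Dict[str, List[float]] = {
    "sad": [1.0, 0.0, 0.0],
    "french_accent": [0.0, 1.0, 0.0],
}


def blend(weights: Dict[str, float]) -> List[float]:
    """Return the weighted sum of the named instruction vectors."""
    dim = len(next(iter(INSTRUCTIONS.values())))
    out = [0.0] * dim
    for name, weight in weights.items():
        vec = INSTRUCTIONS[name]
        for i in range(dim):
            out[i] += weight * vec[i]
    return out


# Equal, moderate sadness and accent.
mild = blend({"sad": 0.5, "french_accent": 0.5})
# Same two instructions, but a heavier accent and lighter sadness.
strong_accent = blend({"sad": 0.25, "french_accent": 1.0})

print(mild)
print(strong_accent)
```

In this sketch, changing a single weight changes only the corresponding attribute, which mirrors the article's description of adjusting accent and emotion separately.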

Fugatto can also generate sounds that change over time, such as a storm approaching from afar, with thunder that gradually intensifies and then slowly fades away. Users can control how the sound evolves, which makes it possible to shape a soundscape rather than just a single static effect.
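The storm example can be pictured as an intensity curve over time. The sketch below is only an illustration of that idea under stated assumptions: the function name `storm_intensity`, the triangular shape, and the timing values are hypothetical and say nothing about how Fugatto represents time-varying instructions internally.

```python
def storm_intensity(t: float, peak_time: float, duration: float) -> float:
    """Triangular envelope: 0 at t=0, rising to 1 at peak_time, back to 0 at duration."""
    if t <= 0.0 or t >= duration:
        return 0.0
    if t <= peak_time:
        return t / peak_time
    return (duration - t) / (duration - peak_time)


# Sample the envelope every 2 seconds for a 10-second storm that peaks at 6 seconds.
envelope = [storm_intensity(float(t), peak_time=6.0, duration=10.0)
            for t in range(0, 11, 2)]

for second, level in zip(range(0, 11, 2), envelope):
    print(f"t={second:2d}s intensity={level:.2f}")
```

Moving `peak_time` earlier or later changes how quickly the thunder builds compared with how slowly it fades.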

Fugatto was developed by a team of researchers from around the world, including India, Brazil, China, Jordan, and South Korea. That collaboration strengthened Fugatto's ability to handle multiple accents and languages.

Fugatto builds on years of NVIDIA research in areas such as speech modeling, audio vocoding, and audio understanding. The full model has 2.5 billion parameters and was trained on a cluster of NVIDIA DGX systems with 32 NVIDIA H100 Tensor Core GPUs.
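For a sense of scale, the parameter count can be turned into a rough memory figure for the weights alone. The article does not say which numeric precision Fugatto uses, so the byte widths below are assumptions (the standard sizes of 32-bit and 16-bit floats), and the helper `weight_gigabytes` is hypothetical. The estimate excludes activations, optimizer state, and any other training overhead.

```python
PARAMS = 2_500_000_000  # parameter count stated in the article

# Assumption: standard storage widths; the article does not state the precision used.
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2}


def weight_gigabytes(params: int, bytes_per_param: int) -> float:
    """Size of the raw weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


sizes = {name: weight_gigabytes(PARAMS, width)
         for name, width in BYTES_PER_PARAM.items()}

for name, gigabytes in sizes.items():
    print(f"{name}: {gigabytes:.1f} GB of weights")
```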

The emergence of Fugatto points to a new phase in audio processing technology, with potential applications across music, film, gaming, and education. It will be worth watching what creators do with it.

Official Blog: https://blogs.nvidia.com/blog/fugatto-gen-ai-sound-model/


This article is from AIbase Daily



