Stability AI, known for its Stable Diffusion text-to-image model, has partnered with Arm, a global semiconductor leader, to bring AI-powered audio generation to mobile devices. This collaboration allows the Stable Audio Open model to run entirely on Arm CPUs, enabling users to quickly generate sound effects, audio samples, and production elements on their devices without an internet connection.
Stability AI says that as generative AI becomes increasingly prevalent among businesses and professional creators, making its models and workflows readily accessible across creative fields is crucial. This not only boosts creative efficiency but also helps these technologies integrate seamlessly into visual media production pipelines.
To meet this growing demand, the company set out to improve the model's efficiency on edge devices. In initial tests of Stable Audio Open on an Arm CPU-powered phone, generating an 11-second audio clip took 240 seconds. Through model distillation and Arm's software stack, in particular the KleidiAI int8 matrix-multiplication kernels integrated into XNNPack, the company cut that time to 8 seconds, a roughly 30x speedup.
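The article does not detail Stability AI's exact export pipeline, but the int8 path it describes is the kind enabled by full-integer post-training quantization in a framework such as LiteRT (TensorFlow Lite), whose XNNPack backend is where the KleidiAI kernels plug in on Arm CPUs. The sketch below is illustrative only; the model directory, input shape, and calibration data are placeholders, not details from the article.

```python
import numpy as np
import tensorflow as tf

# Placeholder paths and shapes for illustration; Stability AI's actual export
# pipeline and model interface are not described in the article.
SAVED_MODEL_DIR = "stable_audio_open_distilled"  # hypothetical distilled-model export
LATENT_SHAPE = (1, 1024, 64)                     # hypothetical input shape

def representative_dataset():
    # A small calibration set lets the converter estimate int8 quantization ranges.
    for _ in range(100):
        yield [np.random.randn(*LATENT_SHAPE).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model(SAVED_MODEL_DIR)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Restrict the model to full-integer ops so the heavy matrix multiplications
# can run on XNNPack's int8 kernels (KleidiAI-accelerated on Arm CPUs).
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

tflite_model = converter.convert()
with open("stable_audio_open_int8.tflite", "wb") as f:
    f.write(tflite_model)
```

On device, recent LiteRT builds typically apply the XNNPack delegate by default, so a model quantized this way can have its matrix multiplications dispatched to the int8 kernels without extra application code.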
Users will need a compatible mobile device to use this functionality, but since most smartphones today ship with Arm-based CPUs, the technology is within reach of a wide audience. Going forward, Stability AI plans to bring its image, video, and 3D models to edge devices as well, aiming to revolutionize visual media creation on mobile.
Key Highlights:
🌟 Stability AI and Arm have partnered to deliver offline audio generation on mobile devices.
⚡ Model distillation and software optimization cut audio generation time from 240 seconds to 8 seconds, a 30x speedup.
📱 This technology works on most smartphones with Arm CPUs and will expand to more media creation fields in the future.