Imagine typing “dramatic intro music” and hearing a soaring symphony, or writing “creepy footsteps” and getting high-quality sound effects. That’s the promise of Stable Audio, a text-to-audio AI model announced Wednesday by Stability AI that can synthesize music or sounds from written descriptions. Before long, similar technology may challenge musicians for their jobs.
If you’ll recall, Stability AI is the company that helped fund the creation of Stable Diffusion, a latent diffusion image synthesis model released in August 2022. Not content to limit itself to generating images, the company branched out into audio by backing Harmonai, an AI lab that launched music generator Dance Diffusion in September 2022.
Now Stability and Harmonai want to break into commercial AI audio production with Stable Audio. Judging by production samples, it seems like a significant upgrade in audio quality over previous AI audio generators we’ve seen.