2024-05-04
Generative AI is simply amazing:
Applies to:
At the highest level, Stable Audio is a Latent Diffusion Model consisting of:
Acronyms
These values are:
During inference, seconds_start and seconds_total serve as conditioning variables, enabling users to generate variable-length outputs.
During training, an audio file shorter than the training window is padded with silence up to the window length.
Suppose we take a 95-second chunk from a 180-second audio file, with the chunk starting 14 seconds in; then:
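Under the definitions used in the Stable Audio paper, `seconds_start` is the offset of the chunk within the original file and `seconds_total` is the full length of that file, so this example gives `seconds_start = 14` and `seconds_total = 180`. The following is a minimal sketch of how a training chunk and its timing values could be prepared, not the authors' implementation. The names `TimedChunk` and `take_chunk` are hypothetical, audio is modeled as a plain list of floats, and a toy sample rate of 10 Hz keeps the numbers small.

```python
from dataclasses import dataclass


@dataclass
class TimedChunk:
    samples: list        # exactly window_sec * sample_rate values
    seconds_start: int   # where the chunk begins in the original file
    seconds_total: int   # full duration of the original file


def take_chunk(audio, sample_rate, start_sec, window_sec):
    """Crop a training window and pad it with silence if the file runs out.

    Hypothetical helper: illustrates the timing conditioning only.
    """
    total_sec = len(audio) // sample_rate
    begin = start_sec * sample_rate
    window = window_sec * sample_rate
    chunk = list(audio[begin:begin + window])
    # Pad with zeros (silence) up to the training window length.
    chunk += [0.0] * (window - len(chunk))
    return TimedChunk(chunk, start_sec, total_sec)


SAMPLE_RATE = 10  # toy value, assumed for illustration

# The example from the text: a 95 s chunk starting 14 s into a 180 s file.
long_file = [0.5] * (180 * SAMPLE_RATE)
long_chunk = take_chunk(long_file, SAMPLE_RATE, start_sec=14, window_sec=95)

# A 30 s file is shorter than the 95 s window, so the tail is silence.
short_file = [0.5] * (30 * SAMPLE_RATE)
short_chunk = take_chunk(short_file, SAMPLE_RATE, start_sec=0, window_sec=95)
```

In this sketch both chunks have the same length (950 samples), which is what lets files of different durations share one fixed training window, while the two timing values record where the real audio sits inside it.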
806,284 audio samples totaling over 19,500 hours, consisting of:
With corresponding text metadata from AudioSparx.
Human Ratings Collected:
Stable Audio VAE Reconstruction Demo