Stable Audio Open Small Model

| Submodule | Description |
|---|---|
| Conditioners | Includes a T5-based text encoder for the input prompt and a numerical duration encoder. These components convert the inputs into embeddings passed to the DiT model. |
| Diffusion Transformer (DiT) | Denoises random noise over multiple steps to produce structured latent audio, guided by conditioner embeddings. |
| AutoEncoder | Compresses audio waveforms into a latent representation for processing by the DiT model, and decompresses the output back into audio. |

The submodules work together to form the pipeline shown below:

[Image: Model structure]

As part of this section, you will convert each of the three submodules into LiteRT format, using two separate conversion routes:

  1. Conditioners submodule - ONNX to LiteRT, using the onnx2tf tool.
  2. DiT and AutoEncoder submodules - PyTorch to LiteRT, using the Google AI Edge Torch tool.

Create virtual environment and install dependencies

The Conditioners submodule is built around the T5Encoder model. You will use the ONNX-to-LiteRT conversion route for this submodule.

To avoid dependency issues, create a virtual environment. In this guide, we will use virtualenv:

```bash
cd $WORKSPACE
python3.10 -m venv env
source env/bin/activate
```

Clone the examples repository and change into the example directory:

```bash
cd $WORKSPACE
git clone https://github.com/ARM-software/ML-examples.git
cd ML-examples/kleidiai-examples/audiogen/audio-stable-open-litert
```

Now install the required Python packages, including onnx2tf and ai_edge_litert:

```bash
bash install_requirements.sh
```
Tip

If you are using a GPU on your machine, you may see the following error:

```text
Traceback (most recent call last):
  File "$WORKSPACE/env/lib/python3.10/site-packages/torch/_inductor/runtime/hints.py", line 46, in <module>
    from triton.backends.compiler import AttrsDescriptor
ImportError: cannot import name 'AttrsDescriptor' from 'triton.backends.compiler'
($WORKSPACE/env/lib/python3.10/site-packages/triton/backends/compiler.py)
ImportError: cannot import name 'AttrsDescriptor' from 'triton.compiler.compiler'
($WORKSPACE/env/lib/python3.10/site-packages/triton/compiler/compiler.py)
```

Install the following dependency and rerun the script:

```bash
pip install triton==3.2.0
bash install_requirements.sh
```

Convert Conditioners Submodule

The Conditioners submodule is based on the T5Encoder model. We convert it first to ONNX, then to LiteRT.

The conversion involves the following steps:

  1. Load the Conditioners submodule from the Stable Audio Open model configuration and checkpoint.
  2. Export the Conditioners submodule to ONNX via torch.onnx.export().
  3. Convert the resulting ONNX file to LiteRT using onnx2tf.

You can use the provided script to convert the Conditioners submodule:

```bash
python3 ./scripts/export_conditioners.py --model_config "$WORKSPACE/model_config.json" --ckpt_path "$WORKSPACE/model.ckpt"
```

After successful conversion, you have a conditioners.onnx model in your current directory, together with a conditioners_tflite directory containing the converted conditioners_float32.tflite model.

Convert DiT and AutoEncoder

To convert the DiT and AutoEncoder submodules, use the Generative API provided by the ai-edge-torch tools. This enables you to export a generative PyTorch model directly to LiteRT (.tflite) format using three main steps:

  1. Model re-authoring.
  2. Quantization.
  3. Conversion.

Convert the DiT and AutoEncoder submodules using the provided Python script:

```bash
CUDA_VISIBLE_DEVICES="" python3 ./scripts/export_dit_autoencoder.py --model_config "$WORKSPACE/model_config.json" --ckpt_path "$WORKSPACE/model.ckpt"
```

After successful conversion, you now have dit_model.tflite and autoencoder_model.tflite models in your current directory and can deactivate the virtual environment:

```bash
deactivate
```

For easier access, copy all the converted models into a single directory:

```bash
export LITERT_MODELS_PATH=$WORKSPACE/litert-models
mkdir $LITERT_MODELS_PATH
cp conditioners_tflite/conditioners_float32.tflite $LITERT_MODELS_PATH
cp dit_model.tflite $LITERT_MODELS_PATH
cp autoencoder_model.tflite $LITERT_MODELS_PATH
```

With all three submodules converted to LiteRT format, you’re ready to build LiteRT and run the model on a mobile device in the next step.
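As a quick sanity check before moving on, a small helper (hypothetical, not part of the repository scripts) can confirm that all three converted models are in place:

```python
# Hypothetical helper to verify the copied model files exist.
from pathlib import Path

# The three files copied into $LITERT_MODELS_PATH above.
EXPECTED_MODELS = [
    "conditioners_float32.tflite",
    "dit_model.tflite",
    "autoencoder_model.tflite",
]


def missing_models(models_dir):
    """Return the names of expected model files missing from models_dir."""
    directory = Path(models_dir)
    return [name for name in EXPECTED_MODELS if not (directory / name).is_file()]
```

Calling `missing_models` on your `$LITERT_MODELS_PATH` directory should return an empty list if the copies above succeeded.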
