| Submodule | Description |
|---|---|
| Conditioners | Includes a T5-based text encoder for the input prompt and a numerical duration encoder. These components convert the inputs into embeddings that are passed to the DiT model. |
| Diffusion Transformer (DiT) | Denoises random noise over multiple steps to produce structured latent audio, guided by the conditioner embeddings. |
| AutoEncoder | Compresses audio waveforms into a latent representation for processing by the DiT model, and decompresses the output back into audio. |
The submodules work together to provide the pipeline as shown below:
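The hand-off between the three stages can be sketched in plain Python. The functions below are hypothetical stand-ins, not the real models; they only illustrate the shape of the data flow from prompt to waveform:

```python
# Structural sketch of the pipeline: Conditioners -> DiT -> AutoEncoder.
# All functions are illustrative placeholders, not the actual networks.
import random


def encode_conditions(prompt: str, seconds: float) -> list[float]:
    # Conditioners: T5 text embedding plus duration embedding (stand-in values).
    return [float(len(prompt)), seconds]


def denoise(latent: list[float], cond: list[float], steps: int) -> list[float]:
    # DiT: iteratively refine random noise toward structured latent audio,
    # guided by the conditioner embeddings.
    cond_tiled = cond * (len(latent) // len(cond))
    for _ in range(steps):
        latent = [0.9 * x + 0.1 * c for x, c in zip(latent, cond_tiled)]
    return latent


def decode(latent: list[float]) -> list[float]:
    # AutoEncoder decoder: latent -> waveform samples (stand-in upsampling).
    return [x for v in latent for x in (v, v)]


cond = encode_conditions("warm arpeggio on house beats", 10.0)
noise = [random.gauss(0.0, 1.0) for _ in range(4)]
latent = denoise(noise, cond, steps=8)
audio = decode(latent)
print(len(audio))
```

Each stage is exported as its own LiteRT model, so the runtime application wires them together in exactly this order.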
As part of this section, you will convert each of the three submodules into LiteRT format, using two separate conversion routes:
The Conditioners submodule is based on the T5Encoder model. You will use the ONNX-to-TFLite conversion route for this submodule.
To avoid dependency issues, create a virtual environment. In this guide, we will use virtualenv:
cd $WORKSPACE
python3.10 -m venv env
source env/bin/activate
Clone the examples repository:
cd $WORKSPACE
git clone https://github.com/ARM-software/ML-examples.git
cd ML-examples/kleidiai-examples/audiogen
Now install the required Python packages, including onnx2tf and ai_edge_litert:
bash install_requirements.sh
If you are using a GPU on your machine, you may see the following error:

Traceback (most recent call last):
  File "$WORKSPACE/env/lib/python3.10/site-packages/torch/_inductor/runtime/hints.py", line 46, in <module>
    from triton.backends.compiler import AttrsDescriptor
ImportError: cannot import name 'AttrsDescriptor' from 'triton.backends.compiler'
($WORKSPACE/env/lib/python3.10/site-packages/triton/backends/compiler.py)

ImportError: cannot import name 'AttrsDescriptor' from 'triton.compiler.compiler'
($WORKSPACE/env/lib/python3.10/site-packages/triton/compiler/compiler.py)
Install the following dependency and rerun the script:
pip install triton==3.2.0
bash install_requirements.sh
The Conditioners submodule is based on the T5Encoder model. We convert it first to ONNX, then to LiteRT.
This conversion involves two steps: first export the T5Encoder to ONNX, then convert the ONNX model to LiteRT with onnx2tf.
You can use the provided script to convert the Conditioners submodule:
python3 ./scripts/export_conditioners.py --model_config "$WORKSPACE/model_config.json" --ckpt_path "$WORKSPACE/model.ckpt"
After successful conversion, you have a conditioners.onnx model in your current directory.
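The ONNX-to-LiteRT step itself is handled by onnx2tf. A standalone invocation would look like the following sketch (the provided scripts may already run this for you); onnx2tf writes float32 and float16 .tflite files into the chosen output directory:

```shell
# Hypothetical standalone ONNX -> LiteRT conversion of the Conditioners model.
# The output directory will contain e.g. conditioners_float32.tflite.
if command -v onnx2tf >/dev/null 2>&1; then
  onnx2tf -i conditioners.onnx -o conditioners_tflite
else
  echo "onnx2tf not found; run install_requirements.sh first"
fi
```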
To convert the DiT and AutoEncoder submodules, use the Generative API provided by the ai-edge-torch tools. This enables you to export a generative PyTorch model directly to the TFLite format in three main steps:
Convert the DiT and AutoEncoder submodules using the provided python script:
CUDA_VISIBLE_DEVICES="" python3 ./scripts/export_dit_autoencoder.py --model_config "$WORKSPACE/model_config.json" --ckpt_path "$WORKSPACE/model.ckpt"
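Under the hood, the script relies on the ai-edge-torch `convert()`/`export()` entry points. The sketch below shows the pattern on a toy module (`TinyNet` is hypothetical; the provided script performs the real DiT and AutoEncoder export), and skips gracefully if the packages are not installed:

```python
# Hedged sketch of the ai-edge-torch conversion flow on a toy module.
try:
    import torch
    import ai_edge_torch

    class TinyNet(torch.nn.Module):
        def forward(self, x):
            return x * 2.0

    sample_inputs = (torch.zeros(1, 4),)  # example input signature for tracing
    edge_model = ai_edge_torch.convert(TinyNet().eval(), sample_inputs)
    edge_model.export("tiny_model.tflite")  # writes a .tflite flatbuffer
    status = "exported tiny_model.tflite"
except ImportError as exc:
    status = f"skipped: {exc.name} not installed"

print(status)
```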
After successful conversion, you have dit_model.tflite and autoencoder_model.tflite models in your current directory, and you can deactivate the virtual environment:
deactivate
For easier access, copy all the required models into a single directory:
export LITERT_MODELS_PATH=$WORKSPACE/litert-models
mkdir $LITERT_MODELS_PATH
cp conditioners_tflite/conditioners_float32.tflite $LITERT_MODELS_PATH
cp dit_model.tflite $LITERT_MODELS_PATH
cp autoencoder_model.tflite $LITERT_MODELS_PATH
With all three submodules converted to LiteRT format, you’re ready to build LiteRT and run the model on a mobile device in the next step.