Before building the voice assistant, create a project workspace and set up an isolated UV environment. This keeps project dependencies separate from your system installation and makes it easier to reproduce the steps in the rest of the Learning Path.
These instructions support Ubuntu, macOS, and Windows, with Python 3.9 or later and a working microphone.
Install the required system tools first: Whisper depends on ffmpeg for audio decoding, and git and cmake are needed later in this section to build llama.cpp.
# Ubuntu
sudo apt update
sudo apt install -y ffmpeg git cmake
# macOS
brew install ffmpeg git cmake
# Windows (WinGet)
winget install -e --id Gyan.FFmpeg
winget install -e --id Git.Git
winget install -e --id Kitware.CMake
Check your Python version before continuing:
# Ubuntu and macOS
python3 --version
# Windows
py -3 --version
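If you prefer a programmatic check, the short Python snippet below (an optional convenience sketch, not a required step) enforces the same 3.9 minimum and fails with a clear message otherwise:

```python
# Exit with an error if the interpreter is older than Python 3.9.
import sys

MIN_VERSION = (3, 9)
if sys.version_info < MIN_VERSION:
    raise SystemExit(f"Python 3.9+ required, found {sys.version.split()[0]}")
print("Python", sys.version.split()[0], "meets the minimum requirement")
```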
Next, install UV so the uv command is available in your terminal. UV is a fast Python package and environment manager that you’ll use throughout this Learning Path to create the project environment and install dependencies.
# Ubuntu and macOS
curl -LsSf https://astral.sh/uv/install.sh | sh
# Start a new shell, or source your shell rc file so `uv` is on PATH.
uv --version
# Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
# Open a new PowerShell window so `uv` is on PATH.
uv --version
Create and activate the project virtual environment:
# Ubuntu and macOS
mkdir -p ~/voice-sentiment-assistant
cd ~/voice-sentiment-assistant
uv venv .venv
source .venv/bin/activate
# Windows
mkdir $HOME\voice-sentiment-assistant -Force
cd $HOME\voice-sentiment-assistant
uv venv .venv
.\.venv\Scripts\Activate.ps1
Keep this virtual environment activated while you complete the rest of the Learning Path.
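If you're ever unsure whether the environment is still active, you can check from Python itself: inside an active virtual environment, sys.prefix points at the venv while sys.base_prefix points at the base installation. This is a quick diagnostic sketch, not a required step:

```python
# Inside an active virtual environment, sys.prefix differs from
# sys.base_prefix, which still points at the base Python installation.
import sys

venv_active = sys.prefix != sys.base_prefix
print("virtual environment active:", venv_active)
print("interpreter prefix:", sys.prefix)
```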
Create a requirements.txt file for the packages used across the rest of the Learning Path:
gradio
openai-whisper
requests
torch
transformers
pandas
numpy
librosa
scikit-learn
onnx
onnxscript
onnxruntime
Install the dependencies into your active UV virtual environment:
uv pip install -r requirements.txt
This installs the libraries needed for the Gradio interface, Whisper transcription, model training, and ONNX Runtime inference. Some packages in this list are used later in the Learning Path when you optimize and export the sentiment model.
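As an optional sanity check (my own sketch, not part of the Learning Path's required steps), you can confirm that the key packages resolve in the active environment. Note that the openai-whisper package imports as whisper:

```python
# Report which of the key packages import successfully in this environment.
import importlib

def check_imports(names):
    """Map each module name to True if it imports, False otherwise."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status

results = check_imports(["gradio", "whisper", "torch", "transformers", "onnxruntime"])
for name, ok in results.items():
    print(f"{name}: {'OK' if ok else 'MISSING'}")
```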
Next, clone the llama.cpp GitHub repository, build the local inference server, and start it. This server exposes an OpenAI-compatible API that the Python application will call later in the Learning Path.
If you prefer not to build from source, you can use pre-built binaries from the llama.cpp releases page. Download the package for your platform, extract it, and use the llama-server executable from that package in the run commands later in this section.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
When the build completes, the llama-server executable should be available in the build output directory.
ls ./build/bin/llama-server
The file verification commands in this Learning Path use syntax for Ubuntu and macOS. If you’re on Windows, where the Release build places the executable under build\bin\Release\, adjust the commands to use PowerShell equivalents like Test-Path .\build\bin\Release\llama-server.exe or dir .\build\bin\Release\.
This Learning Path uses a quantized Gemma 3 1B instruction-tuned model served locally through llama.cpp.
The first time you run this command, llama.cpp will download the model from Hugging Face. This can take several minutes depending on your network connection.
Run the following command from the llama.cpp directory:
# Ubuntu and macOS
./build/bin/llama-server -hf ggml-org/gemma-3-1b-it-GGUF
# Windows
.\build\bin\Release\llama-server.exe -hf ggml-org/gemma-3-1b-it-GGUF
Leave this terminal running while you test the application in later steps. The server listens on a local OpenAI-compatible endpoint that your app will call to generate responses.
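From a second terminal, you can smoke-test the endpoint before wiring up the app. The sketch below assumes llama-server's default address of http://localhost:8080 (check the server's startup log if yours differs), uses only the Python standard library, and skips the request entirely when nothing is listening:

```python
# Send one chat completion to the local OpenAI-compatible endpoint,
# but only if something is listening on the assumed default port.
import json
import socket
import urllib.request

HOST, PORT = "localhost", 8080  # assumed llama-server default
payload = {
    "messages": [{"role": "user", "content": "Say hello in one short sentence."}]
}

def server_up(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if server_up(HOST, PORT):
    req = urllib.request.Request(
        f"http://{HOST}:{PORT}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=60) as resp:
            reply = json.load(resp)
        print(reply["choices"][0]["message"]["content"])
    except OSError as exc:
        print("Request failed:", exc)
else:
    print(f"No server at {HOST}:{PORT}; start llama-server first.")
```

If the server is running, this prints the model's reply; otherwise it tells you to start llama-server first.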
In this section, you:
- Created a project workspace with an isolated UV virtual environment
- Installed the Python dependencies used across the Learning Path
- Built llama.cpp and started a local llama-server running the Gemma 3 1B model
Your development environment is now ready with all tools needed for voice transcription, model training, and local LLM inference. In the next section, you’ll build the baseline voice-to-LLM pipeline using Gradio, Whisper, and llama.cpp.