First, install Python 3.11 and create a virtual environment on the Google Axion virtual machine (VM) running SUSE Linux.
Verify that the VM is running on Arm64 architecture:
uname -m
The output is similar to:
aarch64
Check CPU details:
lscpu
The output is similar to:
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: ARM
Model name: Neoverse-V2
The Neoverse-V2 model name confirms you’re running on a Google Axion processor. The aarch64 architecture confirms the 64-bit Arm environment that PyTorch and DeepSpeed will target.
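The same architecture check can be done from Python, which is useful if you want a training script to fail early on the wrong machine type. This is a minimal sketch using the standard library; the helper name `is_arm64` and the accepted names in `ARM64_NAMES` are illustrative assumptions, not part of any tool used in this section.

```python
import platform

# Machine strings treated as 64-bit Arm. Linux reports "aarch64";
# some other platforms report "arm64" for the same architecture.
ARM64_NAMES = {"aarch64", "arm64"}


def is_arm64(machine: str) -> bool:
    """Return True if the machine string names a 64-bit Arm architecture."""
    return machine.strip().lower() in ARM64_NAMES


# platform.machine() returns the same value that `uname -m` prints.
machine = platform.machine()
print(f"machine={machine} arm64={is_arm64(machine)}")
```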
The default Python version on SUSE Linux might conflict with PyTorch and DeepSpeed dependencies. Install Python 3.11, which provides stable support for both frameworks and avoids compatibility issues commonly seen with older or newer releases:
sudo zypper install -y python311 python311-pip python311-devel
Create an isolated Python environment to prevent dependency conflicts with system packages:
python3.11 -m venv ~/deepspeed-env
Activate the virtual environment:
source ~/deepspeed-env/bin/activate
Verify the Python version in the environment:
python --version
The output is similar to:
Python 3.11.10
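If you want to confirm from inside Python that the interpreter is both version 3.11 and running from a virtual environment, the following sketch shows one way to do it. The function names are hypothetical; the check relies on the standard behavior that `sys.prefix` differs from `sys.base_prefix` inside a venv.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


def is_python_311(version_info=sys.version_info) -> bool:
    """Return True when the major.minor version is exactly 3.11."""
    return tuple(version_info[:2]) == (3, 11)


print(f"python 3.11: {is_python_311()}  virtualenv active: {in_virtualenv()}")
```

Run it with `python` after activating the environment; both values should be `True` if the steps above succeeded.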
Upgrade pip, setuptools, and wheel before installing packages. Outdated packaging tools can cause installation failures or wheel compatibility issues, particularly on Arm64:
pip install --upgrade pip setuptools wheel
Ninja is a lightweight build system used by PyTorch and DeepSpeed to compile native extensions at runtime.
To avoid SUSE repository dependency issues sometimes seen on cloud Arm64 images, install Ninja using pip rather than zypper:
pip install ninja
Verify the installation:
ninja --version
The output is similar to:
1.13.0.git.kitware.jobserver-pipe-1
After setting up the Python environment, install PyTorch and DeepSpeed on the VM.
Google Axion VMs are CPU-only systems and don’t contain NVIDIA GPUs. To avoid unnecessary CUDA dependencies and reduce package size, install the CPU-only PyTorch build:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
Verify that you installed PyTorch successfully:
python -c "import torch; print(torch.__version__)"
The output is similar to:
2.12.0+cpu
Check CUDA availability:
python -c "import torch; print(torch.cuda.is_available())"
The output is similar to:
False
This is expected because Google Axion VMs are CPU-only systems.
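The two one-line checks above can be combined into a single script. The sketch below is an illustration only: `torch_summary` is a hypothetical helper, and it also reports the CPU thread count through `torch.get_num_threads()`, which is not part of the original steps. It returns `None` instead of raising an error if PyTorch is not importable.

```python
import importlib.util


def torch_summary():
    """Return basic PyTorch facts, or None if torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    return {
        "version": torch.__version__,
        # Expected to be False on a CPU-only Axion VM.
        "cuda_available": torch.cuda.is_available(),
        # Number of threads PyTorch uses for intra-op CPU parallelism.
        "cpu_threads": torch.get_num_threads(),
    }


summary = torch_summary()
if summary is None:
    print("torch is not installed in this environment")
else:
    print(summary)
```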
DeepSpeed’s distributed CPU extensions require GCC 9 or later to compile. The default SUSE Linux image on Google Axion ships with GCC 7.5.0. When DeepSpeed initializes its launcher, it attempts to compile the deepspeed_shm_comm shared memory communication extension. This compilation fails on GCC 7.5.0.
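To confirm which compiler your image provides before installing DeepSpeed, you can compare the GCC major version against the stated minimum. This is a rough sketch: the helper names and the `MIN_GCC_MAJOR` constant are assumptions for illustration, and it inspects only the first `gcc` found on `PATH`.

```python
import re
import shutil
import subprocess

MIN_GCC_MAJOR = 9  # minimum version stated in this section


def parse_gcc_major(version_text: str):
    """Extract the major version from text such as '7.5.0' or '13'."""
    match = re.match(r"\s*(\d+)", version_text)
    return int(match.group(1)) if match else None


def detect_gcc_major():
    """Return the major version of the gcc on PATH, or None if unknown."""
    gcc = shutil.which("gcc")
    if gcc is None:
        return None
    # -dumpversion prints either the major or the full version,
    # depending on how GCC was built; the major parses the same way.
    result = subprocess.run([gcc, "-dumpversion"], capture_output=True, text=True)
    return parse_gcc_major(result.stdout)


major = detect_gcc_major()
if major is None:
    print("could not determine the gcc version")
elif major < MIN_GCC_MAJOR:
    print(f"gcc {major} is older than {MIN_GCC_MAJOR}; skip native extension builds")
else:
    print(f"gcc {major} meets the minimum of {MIN_GCC_MAJOR}")
```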
To work around the GCC 7.5.0 limitation, install DeepSpeed with native extension compilation disabled. DS_BUILD_OPS=0 turns off pre-building of native ops as a whole, and the other variables explicitly skip individual extensions that would otherwise need GCC 9 or later:
| Variable | Purpose |
|---|---|
| DS_BUILD_OPS=0 | Disables native op compilation |
| DS_BUILD_SHM_COMM=0 | Disables the shared memory communication extension |
| DS_BUILD_CPU_ADAM=0 | Disables the CPU Adam optimizer extension |
| DS_BUILD_AIO=0 | Disables async I/O extensions |
Install DeepSpeed with these variables set:
DS_BUILD_OPS=0 DS_BUILD_SHM_COMM=0 DS_BUILD_CPU_ADAM=0 DS_BUILD_AIO=0 pip install deepspeed
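If you script your environment setup in Python rather than the shell, the same variables can be passed to pip through the process environment. The following is a sketch under that assumption; `build_install_env` and `pip_install_command` are hypothetical helper names, and the block only prints the command. The line that would perform the install is left commented out.

```python
import os
import sys

# The same build switches used in the shell command above.
DS_BUILD_FLAGS = {
    "DS_BUILD_OPS": "0",
    "DS_BUILD_SHM_COMM": "0",
    "DS_BUILD_CPU_ADAM": "0",
    "DS_BUILD_AIO": "0",
}


def build_install_env(base_env=None):
    """Return a copy of the environment with the DS_BUILD_* flags set."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(DS_BUILD_FLAGS)
    return env


def pip_install_command(package="deepspeed"):
    """Build a pip command that uses the currently running interpreter."""
    return [sys.executable, "-m", "pip", "install", package]


env = build_install_env()
cmd = pip_install_command()
flags = " ".join(f"{key}={value}" for key, value in DS_BUILD_FLAGS.items())
print(flags, " ".join(cmd))

# To perform the install, uncomment the following lines:
# import subprocess
# subprocess.run(cmd, env=env, check=True)
```

Using `sys.executable -m pip` keeps the install inside the active virtual environment, regardless of which `pip` is first on `PATH`.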
Verify that DeepSpeed was installed successfully:
ds_report
The output is similar to:
[WARNING] Setting accelerator to CPU. If you have GPU or other accelerator, we were unable to detect it.
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
deepspeed_not_implemented [NO] ....... [OKAY]
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
async_io ............... [NO] ....... [NO]
deepspeed_ccl_comm ..... [NO] ....... [OKAY]
deepspeed_shm_comm ..... [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/user/deepspeed-env/lib64/python3.11/site-packages/torch']
torch version .................... 2.12.0+cpu
deepspeed install path ........... ['/home/user/deepspeed-env/lib64/python3.11/site-packages/deepspeed']
deepspeed info ................... 0.19.0, unknown, unknown
deepspeed wheel compiled w. ...... torch 0.0
shared memory (/dev/shm) size .... 7.80 GB
The CPU accelerator warning is expected because Google Axion VMs have no GPU. Most ops show [NO] ... [OKAY], meaning they are not pre-installed but are compatible for just-in-time compilation with Ninja if needed at runtime. The one exception is async_io, which shows [NO] ... [NO] because it requires the libaio-devel system package. Because async I/O isn’t needed for the training workloads in this Learning Path, and it was disabled with DS_BUILD_AIO=0, you can ignore this warning.
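The installed and compatible columns can also be read programmatically, for example to flag incompatible ops in a setup script. The sketch below assumes the op lines look like the sample output in this section; the regular expression and the `parse_op_report` helper are illustrative, and the ds_report format is not guaranteed to stay the same across DeepSpeed releases.

```python
import re

# Matches lines such as "cpu_adam ............... [NO] ....... [OKAY]".
# The dots after the op name are optional because long names omit them.
OP_LINE = re.compile(r"^(\w+)\s+(?:\.+\s+)?\[(YES|NO)\]\s+\.+\s+\[(OKAY|NO)\]\s*$")


def parse_op_report(text: str) -> dict:
    """Map each op name to an (installed, compatible) pair of booleans."""
    ops = {}
    for line in text.splitlines():
        match = OP_LINE.match(line.strip())
        if match:
            name, installed, compatible = match.groups()
            ops[name] = (installed == "YES", compatible == "OKAY")
    return ops


# A few lines copied from the sample output above.
sample = """\
op name ................ installed .. compatible
async_io ............... [NO] ....... [NO]
deepspeed_shm_comm ..... [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
"""

ops = parse_op_report(sample)
incompatible = sorted(name for name, (_, ok) in ops.items() if not ok)
print(incompatible)  # prints ['async_io']
```

To apply this to a real report, you could capture the output of `ds_report` with `subprocess.run` and pass its text to `parse_op_report`.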
Create a working directory for your DeepSpeed training scripts:
mkdir -p ~/deepspeed-demo
cd ~/deepspeed-demo
Don’t run deepspeed train.py directly on this VM. DeepSpeed’s launcher attempts to compile the deepspeed_shm_comm native extension during initialization, which requires GCC 9 or later. Use python train.py instead, as shown in the next section.
Use the following guidance to troubleshoot issues with setting up the Python environment for the project.
You might see the following error during zypper commands:
Receive: script died unexpectedly
If Python 3.11 is already installed when this occurs, you can continue. Install all remaining packages using pip inside the virtual environment and avoid relying on SUSE development repositories.
You’ve now installed Python 3.11, PyTorch, and DeepSpeed on a Google Axion C4A VM running SUSE Linux, verified the environment with ds_report, and created the project directory for training scripts.
Next, you’ll create and run neural network training and benchmarking workloads on the VM.