Set up your SME2 development environment
To build or run SME2-accelerated code, first set up your development environment. This section walks you through the required tools and two supported setup options:
Native SME2 hardware - build and run directly on a system with SME2 support (see Devices with native SME2 support)
Docker-based emulation - use a container to emulate SME2 in bare-metal mode (without an OS)
To get started, download the code examples.
Then extract the archive and change directory to:
code-examples/learning-paths/cross-platform/multiplying-matrices-with-sme2.
tar xfz code-examples-main-learning-paths-cross-platform-multiplying-matrices-with-sme2.tar.gz -s /code-examples-main-learning-paths-cross-platform-multiplying-matrices-with-sme2/code-examples/
cd code-examples/learning-paths/cross-platform/multiplying-matrices-with-sme2
The directory structure should look like this:
code-examples/learning-paths/cross-platform/multiplying-matrices-with-sme2/
├── .clang-format
├── .devcontainer/
│ └── devcontainer.json
├── .git/
├── .gitignore
├── Makefile
├── README.rst
├── docker/
│ ├── assets.source_me
│ ├── build-all-containers.sh
│ ├── build-my-container.sh
│ └── sme2-environment.docker
├── hello.c
├── main.c
├── matmul.h
├── matmul_asm.c
├── matmul_asm_impl.S
├── matmul_intr.c
├── matmul_vanilla.c
├── misc.c
├── misc.h
├── preprocess_l_asm.S
├── preprocess_vanilla.c
├── run-fvp.sh
└── sme2_check.c
Among other files, it includes:
Makefile to build the code.
run-fvp.sh to run the FVP model.
docker directory containing:
  assets.source_me to provide toolchain paths.
  build-my-container.sh, a script that automates building the Docker image from the sme2-environment.docker file. It runs the Docker build command with the correct arguments so you don’t have to remember them.
  sme2-environment.docker, a custom Docker file that defines the steps to build the SME2 container image. It installs all the necessary dependencies, including the SME2-compatible compiler and Arm FVP emulator.
  build-all-containers.sh, a script to build multi-architecture images.
.devcontainer/devcontainer.json for VS Code container support.
From this point, all instructions assume that your current directory is code-examples/learning-paths/cross-platform/multiplying-matrices-with-sme2, so ensure that you are in the correct directory before proceeding.
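As a quick way to confirm that you are in the right place, the sketch below checks for a few of the files listed above. The function name in_sme2_dir and the choice of which files to test are illustrative assumptions, not part of the code examples.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the given directory (default: the
# current one) contains a few files expected in the code examples.
in_sme2_dir() {
    for f in Makefile run-fvp.sh matmul.h; do
        [ -f "${1:-.}/$f" ] || return 1
    done
}

if in_sme2_dir; then
    echo "Found the code examples in $PWD"
else
    echo "The code examples were not found in $PWD"
fi
```

If the second message appears, change to the directory shown earlier before running any of the later commands.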
To run SME2 code natively, ensure your system includes SME2 hardware and uses a compiler version that supports SME2.
For the compiler, you can use Clang version 18 or later, or GCC version 14 or later. This Learning Path uses clang.
At the time of writing, macOS ships with clang version 17.0.0, which doesn’t support SME2. Use a newer version, such as 20.1.7, available through Homebrew.
You can check your compiler version using the command:
clang --version
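If you prefer an automated check, the following sketch extracts the major version from the first line of the clang --version output and compares it with the minimum of 18 stated above. The function names clang_major and clang_supports_sme2 are hypothetical. Apple's clang uses its own version numbering, so on macOS treat the result as a hint, not a guarantee.

```shell
#!/bin/sh
# Print the major version found in a `clang --version` banner line.
clang_major() {
    printf '%s\n' "$1" |
        sed -n 's/.*clang version \([0-9][0-9]*\)\..*/\1/p' |
        head -n 1
}

# Succeed when the banner reports at least the required major version.
# $1 = banner line, $2 = minimum major version (assumed default: 18).
clang_supports_sme2() {
    major=$(clang_major "$1")
    [ -n "$major" ] && [ "$major" -ge "${2:-18}" ]
}

# Example use; skipped when clang is not on the PATH.
if command -v clang >/dev/null 2>&1; then
    banner=$(clang --version | head -n 1)
    if clang_supports_sme2 "$banner"; then
        echo "Version looks recent enough: $banner"
    else
        echo "Version looks too old for SME2: $banner"
    fi
fi
```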
Install Clang using the command for your platform.
Linux/Ubuntu:
sudo apt install clang
macOS:
brew install llvm
You are now all set to start hacking with SME2.
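To confirm that the hardware side is in place, the sketch below asks the operating system whether it reports SME2. It assumes that macOS exposes the feature through the hw.optional.arm.FEAT_SME2 sysctl key and that Linux lists sme2 in the Features line of /proc/cpuinfo. The function names has_cpu_flag and sme2_native are illustrative.

```shell
#!/bin/sh
# Succeed when flag $1 appears as a whole word in the feature text $2.
has_cpu_flag() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeed when the operating system reports SME2 (assumed interfaces).
sme2_native() {
    case "$(uname -s)" in
        Darwin)
            [ "$(sysctl -n hw.optional.arm.FEAT_SME2 2>/dev/null)" = "1" ]
            ;;
        Linux)
            features=$(grep -m1 -i '^Features' /proc/cpuinfo 2>/dev/null |
                tr '\t\n' '  ')
            has_cpu_flag sme2 "$features"
            ;;
        *)
            return 1
            ;;
    esac
}

if sme2_native; then
    echo "SME2 reported by the operating system"
else
    echo "SME2 not detected; consider the Docker-based environment"
fi
```

Matching whole words matters here because a related flag such as sme2p1 would otherwise be mistaken for sme2.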
If your machine doesn’t support SME2, or you want to emulate it, you can use the Docker-based environment.
The Docker container includes both a compiler and Arm’s Fixed Virtual Platform (FVP) model for emulating code that uses SME2 instructions. You can either run the prebuilt container image provided in this Learning Path or build it yourself using the Docker file that is included.
If building manually, follow the instructions in the sme2-environment.docker file to install the required tools on your machine.
Docker is optional, but if you don’t use it, you must manually install the compiler and FVP, and ensure they’re in your PATH.
Start by checking that Docker is installed on your machine:
docker --version
Docker version 27.3.1, build ce12230
If the above command fails with an error message similar to “docker: command not found”, then follow the steps from the Docker install guide to install Docker.
You might need to log out and back in again or restart your machine for the changes to take effect.
Once you have confirmed that Docker is installed on your machine, you can check that it is working with the following:
docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
c9c5fd25a1bd: Pull complete
Digest: sha256:940c619fbd418f9b2b1b63e25d8861f9cc1b46e3fc8b018ccfe8b78f19b8cc4f
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
 (arm64v8)
 3. The Docker daemon created a new container from that image which runs the
 executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
 to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/
You can use Docker in the following ways:
Directly from the command line - for example, when you are working from a terminal on your local machine.
Within a containerized environment - by configuring VS Code to execute all the commands inside a Docker container, allowing you to work seamlessly within the Docker environment.
To execute a command in the Docker container environment, you must prefix it with a docker run invocation so that your shell executes it within the container.
For example, to execute COMMAND ARGUMENTS in the SME2 Docker container, the command line looks like this:
docker run --rm -v "$PWD:/work" -w /work armswdev/sme2-learning-path:sme2-environment-v2 COMMAND ARGUMENTS
This invokes Docker using the armswdev/sme2-learning-path:sme2-environment-v2 container image, mounts the current working directory (code-examples/learning-paths/cross-platform/multiplying-matrices-with-sme2) inside the container at /work, sets /work as the working directory, and runs COMMAND ARGUMENTS in this environment.
For example, to run make, you need to enter:
docker run --rm -v "$PWD:/work" -w /work armswdev/sme2-learning-path:sme2-environment-v2 make
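If you run many commands this way, a small shell function can carry the fixed part of the invocation. The following is a minimal sketch; the function name sme2 and the SME2_IMAGE and DOCKER variables are hypothetical conveniences and are not defined anywhere in the code examples.

```shell
#!/bin/sh
# Container image used throughout this Learning Path.
SME2_IMAGE="armswdev/sme2-learning-path:sme2-environment-v2"

# Run the given command inside the SME2 container, with the current
# directory mounted at /work. DOCKER can be overridden, for example
# with `echo`, to preview the command line without starting a container.
sme2() {
    "${DOCKER:-docker}" run --rm -v "$PWD:/work" -w /work "$SME2_IMAGE" "$@"
}
```

With this function defined in your shell, sme2 make is equivalent to the full command shown above, and DOCKER=echo sme2 make prints the arguments that would be passed to Docker.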
The standard docker run commands can be long and repetitive. To streamline your workflow, you can start an interactive Docker session that allows you to run commands directly, without having to prepend docker run each time.
To launch an interactive shell inside the container, use the -it flag:
docker run --rm -it -v "$PWD:/work" -w /work armswdev/sme2-learning-path:sme2-environment-v2
You are now in the Docker container, and you can execute all commands directly. For example, the make command can now be invoked with:
make
To exit the container, press Ctrl+D. Note that the container is not persistent (it was invoked with --rm), so each invocation starts a fresh container created from the image. All the files reside outside the container, so changes you make to them are persistent.
If you are using Visual Studio Code as your IDE, the container setup is already configured with .devcontainer/devcontainer.json.
Make sure you have the Microsoft Dev Containers extension installed.
Then select the Reopen in Container menu entry as shown below. It automatically finds and uses .devcontainer/devcontainer.json:
Figure 1: Setting up the Docker container.
All your commands now run within the container, so there is no need to prepend them with a Docker invocation, as VS Code handles all this seamlessly for you.
For the rest of this Learning Path, shell commands include the full Docker invocation so that if you are not using VS Code you can copy the complete command line.
However, if you are using VS Code, you only need to use the COMMAND ARGUMENTS part.
These Apple devices support SME2 natively.
Device | Release Date | Chip Options |
---|---|---|
iPhone 16 | 2024 | A18 |
iPad Pro (7th generation) | 2024 | M4 |
iMac (2024) | 2024 | M4 |
Mac Mini (2024) | 2024 | M4, M4 Pro, M4 Max |
MacBook Pro (14-inch, 16-inch, 2024) | 2024 | M4 Pro, M4 Max |
MacBook Air (2025) | 2025 | M4 |