In this section, you’ll verify that your environment is ready for SME2 development. This is your first hands-on task and confirms that the toolchain, hardware (or emulator), and compiler are set up correctly.
Make sure your current working directory is code-examples/learning-paths/cross-platform/multiplying-matrices-with-sme2.
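Use the cmake command to configure the project. The examples below cover three configurations: a native build, an Android build using the Android NDK, and a baremetal build for the emulated SME2 environment, which runs inside Docker. For native builds, you may need to tell cmake which clang to use, as shown in the example. Otherwise cmake picks up the system's default clang, which might not be suitable. If your system clang is recent enough, omit the CC=... part of the cmake invocation.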
CC=/opt/homebrew/Cellar/llvm/21.1.4/bin/clang cmake -G Ninja -S . -B build-native -DCMAKE_BUILD_TYPE:STRING=Release
__output__-- The C compiler identification is Clang 21.1.4
__output__-- The ASM compiler identification is Clang with GNU-like command-line
__output__-- Found assembler: /opt/homebrew/Cellar/llvm/21.1.4/bin/clang
__output__-- Detecting C compiler ABI info
__output__-- Detecting C compiler ABI info - done
__output__-- Check for working C compiler: /opt/homebrew/Cellar/llvm/21.1.4/bin/clang - skipped
__output__-- Detecting C compile features
__output__-- Detecting C compile features - done
__output__-- Configuring done (0.8s)
__output__-- Generating done (0.0s)
__output__-- Build files have been written to: .../multiplying-matrices-with-sme2/build-native
For an Android build, point cmake at the NDK's toolchain file (here, the NDK environment variable holds the path to your Android NDK):
cmake -G Ninja -S . -B build-android -DCMAKE_BUILD_TYPE:STRING=Release -DCMAKE_TOOLCHAIN_FILE:STRING="$NDK/build/cmake/android.toolchain.cmake" -DANDROID_ABI:STRING=arm64-v8a -DANDROID_PLATFORM:STRING=android-24 -DANDROID_STL:STRING=c++_static
__output__-- The C compiler identification is Clang 21.0.0
__output__-- The ASM compiler identification is Clang with GNU-like command-line
__output__-- Found assembler: .../Library/Android/sdk/ndk/29.0.14206865/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang
__output__-- Detecting C compiler ABI info
__output__-- Detecting C compiler ABI info - done
__output__-- Check for working C compiler: .../Library/Android/sdk/ndk/29.0.14206865/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang - skipped
__output__-- Detecting C compile features
__output__-- Detecting C compile features - done
__output__-- Configuring done (1.1s)
__output__-- Generating done (0.0s)
__output__-- Build files have been written to: .../multiplying-matrices-with-sme2/build-android
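For the emulated SME2 environment, run cmake inside the armswdev/sme2-learning-path Docker image with the baremetal toolchain file:
docker run --rm -v "$PWD:/work" armswdev/sme2-learning-path:sme2-environment-v3 cmake -G Ninja -S . -B build-baremetal -DCMAKE_TOOLCHAIN_FILE:STRING=cmake/baremetal-toolchain.cmake -DCMAKE_BUILD_TYPE:STRING=Release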
__output__-- Using ATfE from: /tools/ATfE-21.1.1-Linux-AArch64
__output__-- Using ATfE from: /tools/ATfE-21.1.1-Linux-AArch64
__output__-- The C compiler identification is Clang 21.1.1
__output__-- The ASM compiler identification is Clang with GNU-like command-line
__output__-- Found assembler: /tools/ATfE-21.1.1-Linux-AArch64/bin/clang
__output__-- Detecting C compiler ABI info
__output__-- Detecting C compiler ABI info - done
__output__-- Check for working C compiler: /tools/ATfE-21.1.1-Linux-AArch64/bin/clang - skipped
__output__-- Detecting C compile features
__output__-- Detecting C compile features - done
__output__-- Configuring done (0.3s)
__output__-- Generating done (0.0s)
__output__-- Build files have been written to: /work/build-baremetal
__output__
Then build all the examples with ninja:
ninja -C build-native/
__output__ninja: Entering directory `build-native/'
__output__[19/19] Linking C executable sme2_matmul_intr
ninja -C build-android/
__output__ninja: Entering directory `build-android/'
__output__[19/19] Linking C executable sme2_matmul_asm
docker run --rm -v "$PWD:/work" armswdev/sme2-learning-path:sme2-environment-v3 ninja -C build-baremetal/
__output__ninja: Entering directory `build-baremetal/'
__output__[1/19] Building ASM object CMakeFiles/sme2_matmul_asm.dir/preprocess_l_asm.S.obj
__output__[2/19] Building ASM object CMakeFiles/sme2_matmul_asm.dir/matmul_asm_impl.S.obj
__output__[3/19] Building C object CMakeFiles/hello.dir/hello.c.obj
__output__[4/19] Building C object CMakeFiles/sme2_matmul_asm.dir/matmul_vanilla.c.obj
__output__[5/19] Building C object CMakeFiles/sme2_matmul_asm.dir/preprocess_vanilla.c.obj
__output__[6/19] Building C object CMakeFiles/sme2_matmul_intr.dir/matmul_vanilla.c.obj
__output__[7/19] Building C object CMakeFiles/sme2_matmul_intr.dir/preprocess_vanilla.c.obj
__output__[8/19] Linking C executable hello
__output__[9/19] Building C object CMakeFiles/sme2_matmul_asm.dir/matmul_asm.c.obj
__output__[10/19] Building C object CMakeFiles/sme2_check.dir/sme2_check.c.obj
__output__[11/19] Building C object CMakeFiles/sme2_matmul_intr.dir/main.c.obj
__output__[12/19] Building C object CMakeFiles/sme2_matmul_asm.dir/main.c.obj
__output__[13/19] Building C object CMakeFiles/sme2_check.dir/misc.c.obj
__output__[14/19] Building C object CMakeFiles/sme2_matmul_asm.dir/misc.c.obj
__output__[15/19] Building C object CMakeFiles/sme2_matmul_intr.dir/misc.c.obj
__output__[16/19] Building C object CMakeFiles/sme2_matmul_intr.dir/matmul_intr.c.obj
__output__[17/19] Linking C executable sme2_check
__output__[18/19] Linking C executable sme2_matmul_asm
__output__[19/19] Linking C executable sme2_matmul_intr
The ninja command performs the following tasks:
It builds four executables: hello, sme2_check, sme2_matmul_asm, and sme2_matmul_intr.
It creates a listing file for each executable: hello.lst, sme2_check.lst, sme2_matmul_asm.lst, and sme2_matmul_intr.lst.
At any point, you can clean the directory of all the files that have been built by invoking ninja with the clean target:
ninja -C build-native/ clean
__output__ninja: Entering directory `build-native'
__output__[1/1] Cleaning all built files...
__output__Cleaning... 19 files.
ninja -C build-android/ clean
__output__ninja: Entering directory `build-android/'
__output__[1/1] Cleaning all built files...
__output__Cleaning... 19 files.
docker run --rm -v "$PWD:/work" armswdev/sme2-learning-path:sme2-environment-v3 ninja -C build-baremetal/ clean
__output__ninja: Entering directory `build-baremetal/'
__output__[1/1] Cleaning all built files...
__output__Cleaning... 19 files.
The very first program that you should run is the famous “Hello, world!” example that will tell you if your environment is set up correctly.
The source code is contained in hello.c and looks like this:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
    printf("Hello, world !\n");
    return EXIT_SUCCESS;
}
Run the hello program with:
./build-native/hello
__output__Hello, world !
adb push build-android/hello /data/local/tmp
__output__build-android/hello: 1 file pushed, 0 skipped. 14.6 MB/s (7544 bytes in 0.000s)
adb shell chmod 755 /data/local/tmp/hello
adb shell /data/local/tmp/hello
__output__Hello, world !
docker run --rm -v "$PWD:/work" armswdev/sme2-learning-path:sme2-environment-v3 ./run-fvp.sh build-baremetal/hello
__output__Hello, world !
__output__
__output__Info: /OSCI/SystemC: Simulation stopped by user.
In the emulated case, the FVP prints some extra lines, such as the final simulation-stopped message. The key confirmation is the presence of “Hello, world!” in the output: it demonstrates that generic C code can be compiled and executed in your environment.
Next, run the sme2_check program, which verifies that SME2 is available and working. It checks that both the compiler and the CPU (or the emulated CPU) properly support SME2. It confirms that:
The compiler supports SME2 (via __ARM_FEATURE_SME2)
The system or emulator reports SME2 capability
Streaming mode works as expected
The source code is found in sme2_check.c:
#include "misc.h"
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#ifdef __ARM_FEATURE_SME2
#include <arm_sme.h>
#else
#error __ARM_FEATURE_SME2 is not defined
#endif
__arm_locally_streaming void function_in_streaming_mode() {
    printf("In streaming_mode: %d, SVL: %" PRIu64 " bits\n",
           __arm_in_streaming_mode(), svcntb() * 8);
}
int main(int argc, char *argv[]) {
#if BAREMETAL == 1
    setup_sme_baremetal();
#endif
    if (!display_cpu_features()) {
        printf("SME2 is not supported on this CPU.\n");
        exit(EXIT_FAILURE);
    }
    printf("Checking initial in_streaming_mode: %d\n",
           __arm_in_streaming_mode());
    printf("Switching to streaming mode...\n");
    function_in_streaming_mode();
    printf("Switching back from streaming mode...\n");
    printf("Checking in_streaming_mode: %d\n", __arm_in_streaming_mode());
    return EXIT_SUCCESS;
}
The __ARM_FEATURE_SME2 macro, tested by the #ifdef near the top of sme2_check.c, is defined by the compiler when it targets an SME2-capable architecture. You specify this target with the +sme2 architectural feature. It appears in the -march=armv9.4-a+sme2 option (emulated environment) or the -march=native+sme2 option, which is passed to clang in CMakeLists.txt (or in cmake/baremetal-toolchain.cmake for the emulated SME2 case). If the macro is not defined, the #error directive stops the build.
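sme2_check uses #error so that the build fails outright when __ARM_FEATURE_SME2 is missing. Some projects instead need a file that still compiles for targets without SME2, for example to keep a portable fallback path. In that case, the same macro can drive a compile-time flag instead. The following is a minimal sketch of that alternative. The function name sme2_enabled_at_compile_time is hypothetical and is not part of the Learning Path sources.

```c
#include <stdbool.h>

/* Hypothetical helper: reports whether this translation unit was compiled
 * with SME2 enabled (for example with -march=...+sme2). It only reflects the
 * compiler options. It says nothing about whether the CPU running the code
 * actually implements SME2, which still requires a runtime check. */
bool sme2_enabled_at_compile_time(void) {
#if defined(__ARM_FEATURE_SME2)
    return true;
#else
    return false;
#endif
}
```

You would still combine such a flag with a runtime check, such as display_cpu_features(), before executing any SME2 instructions.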
The arm_sme.h header, included when __ARM_FEATURE_SME2 is defined, is part of the Arm C Language Extensions (ACLE). ACLE provides types and function declarations that enable C/C++ programmers to make the best possible use of the Arm architecture. Here you use the SME-related part of ACLE, but ACLE also provides support for Neon and other Arm architectural extensions.
In a baremetal environment, such as the one used for emulated SME2 support, no operating system has set up the processor for user-land programs, so an additional step is required to enable SME2. This is the purpose of the setup_sme_baremetal() call at the start of main. Where SME2 is natively supported, nothing needs to be done, which is why the call is guarded by the BAREMETAL macro. BAREMETAL is set to 1 in cmake/baremetal-toolchain.cmake when the FVP is targeted, and to 0 otherwise. The setup_sme_baremetal function is defined in misc.c.
sme2_check then calls display_cpu_features() to report whether SVE, SME and SME2 are supported. How this is checked depends on BAREMETAL, and display_cpu_features() abstracts away this platform-specific behavior:
In baremetal mode, it reads the system registers directly and prints the SVE field of the ID_AA64PFR0_EL1 system register and the SME field of the ID_AA64PFR1_EL1 system register.
Otherwise, it prints HAS_SVE, HAS_SME and HAS_SME2 flags, as shown in the native and Android outputs below.
The body of the display_cpu_features function is defined in misc.c.
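In baremetal mode, each reported value is a 4-bit field inside a 64-bit ID register. SVE occupies bits [35:32] of ID_AA64PFR0_EL1, and SME occupies bits [27:24] of ID_AA64PFR1_EL1. An SME field value of 1 means SME, and 2 means SME2.

The following minimal sketch shows only the bit-field arithmetic. It does not show how misc.c actually reads the registers; on the FVP, that requires an MRS instruction at a suitable exception level, which is out of scope here. The helper names are hypothetical. Applied to the raw register values printed in the FVP run below, the sketch yields SVE = 1 and SME = 2.

```c
#include <stdint.h>

/* Bit positions of the 4-bit ID register fields reported by sme2_check. */
#define ID_AA64PFR0_EL1_SVE_SHIFT 32u /* SVE field: bits [35:32] */
#define ID_AA64PFR1_EL1_SME_SHIFT 24u /* SME field: bits [27:24] */

/* Hypothetical helper: extract the 4-bit field starting at bit `shift`. */
static inline unsigned id_reg_field(uint64_t reg, unsigned shift) {
    return (unsigned)((reg >> shift) & 0xFu);
}

/* SVE field: 0 = not implemented, 1 = SVE implemented. */
unsigned sve_field(uint64_t id_aa64pfr0_el1) {
    return id_reg_field(id_aa64pfr0_el1, ID_AA64PFR0_EL1_SVE_SHIFT);
}

/* SME field: 0 = not implemented, 1 = SME, 2 = SME2. */
unsigned sme_field(uint64_t id_aa64pfr1_el1) {
    return id_reg_field(id_aa64pfr1_el1, ID_AA64PFR1_EL1_SME_SHIFT);
}

/* Treat SME2 as available when the SME field reports 2 or higher. */
int has_sme2(uint64_t id_aa64pfr1_el1) {
    return sme_field(id_aa64pfr1_el1) >= 2;
}
```

For example, sme_field(0x0000101002000001) evaluates to 2, which matches the "- SME : 0x00000002" line printed by the FVP.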
If SME2 is not available, sme2_check prints a diagnostic message and exits with EXIT_FAILURE.
sme2_check then prints the initial streaming mode state, which is expected to be 0. Next, it calls function_in_streaming_mode. This switches to streaming mode and prints the Streaming Vector Length (SVL). When the function returns, the program switches back to non-streaming mode. function_in_streaming_mode is defined just before main and is annotated with the __arm_locally_streaming attribute. This attribute instructs the compiler to switch to streaming mode on entry to the function and back to non-streaming mode on return. Streaming mode is discussed in more depth in the next section.
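function_in_streaming_mode computes the SVL in bits as svcntb() * 8, because svcntb() returns the number of bytes in a vector. Because it is called in streaming mode, that number is the streaming vector length. The same arithmetic tells you how many elements of a given size fit in one streaming vector, which matters once you start working with matrices. (On the target itself, ACLE's svcntw() returns the count of 32-bit elements directly.)

The following portable sketch reproduces this arithmetic without any SME2 intrinsics, so you can check it on any machine. The helper names are hypothetical. With the 512-bit SVL reported in the outputs below, svcntb() returns 64, and one vector holds 16 32-bit floats.

```c
#include <stdint.h>

/* Hypothetical helper mirroring `svcntb() * 8` in sme2_check.c:
 * convert a vector length in bytes to a vector length in bits. */
uint64_t svl_bits_from_bytes(uint64_t vector_bytes) {
    return vector_bytes * 8;
}

/* Hypothetical helper: number of elements of `element_bits` width
 * (for example 32 for float) that fit in one vector of `svl_bits` bits. */
uint64_t lanes_per_vector(uint64_t svl_bits, uint64_t element_bits) {
    return svl_bits / element_bits;
}
```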
Run sme2_check and look for the following confirmation messages in the output:
./build-native/sme2_check
__output__HAS_SVE: 0
__output__HAS_SME: 1
__output__HAS_SME2: 1
__output__Checking initial in_streaming_mode: 0
__output__Switching to streaming mode...
__output__In streaming_mode: 1, SVL: 512 bits
__output__Switching back from streaming mode...
__output__Checking in_streaming_mode: 0
adb push build-android/sme2_check /data/local/tmp
__output__build-android/sme2_check: 1 file pushed, 0 skipped. 29.7 MB/s (19456 bytes in 0.001s)
adb shell chmod 755 /data/local/tmp/sme2_check
adb shell /data/local/tmp/sme2_check
__output__HAS_SVE: 1
__output__HAS_SME: 1
__output__HAS_SME2: 1
__output__Checking initial in_streaming_mode: 0
__output__Switching to streaming mode...
__output__In streaming_mode: 1, SVL: 512 bits
__output__Switching back from streaming mode...
__output__Checking in_streaming_mode: 0
docker run --rm -v "$PWD:/work" armswdev/sme2-learning-path:sme2-environment-v3 ./run-fvp.sh build-baremetal/sme2_check
__output__ID_AA64PFR0_EL1 : 0x1101101131111112
__output__ - SVE : 0x00000001
__output__ID_AA64PFR1_EL1 : 0x0000101002000001
__output__ - SME : 0x00000002
__output__Checking has_sme: 1
__output__Checking initial in_streaming_mode: 0
__output__Switching to streaming mode...
__output__In streaming_mode: 1, SVL: 512 bits
__output__Switching back from streaming mode...
__output__Checking in_streaming_mode: 0
__output__
__output__Info: /OSCI/SystemC: Simulation stopped by user.
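To automate this check, for example in a harness that runs sme2_check on several targets, you can parse the streaming-mode line from the output. The following sketch shows one hypothetical way to do it in C with sscanf. It assumes the exact line format printed by function_in_streaming_mode, and the helper name is not part of the Learning Path sources.

```c
#include <stdio.h>

/* Hypothetical helper: parse a line such as
 *   "In streaming_mode: 1, SVL: 512 bits"
 * as printed by function_in_streaming_mode in sme2_check.c.
 * Returns 1 and stores the streaming flag and the SVL in bits on success,
 * or returns 0 if the line does not start with that format. */
int parse_streaming_line(const char *line, int *in_streaming,
                         unsigned long long *svl_bits) {
    return sscanf(line, "In streaming_mode: %d, SVL: %llu bits",
                  in_streaming, svl_bits) == 2;
}
```

A caller can then check that the streaming flag is 1 and record the reported SVL for each target.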
You’ve now confirmed that your environment can compile and run SME2 code, and that SME2 features like streaming mode are working correctly. You’re ready to continue to the next section and start working with SME2 in practice.