There are two ways to run SVE instructions if you don’t have SVE-capable hardware: QEMU and the Arm Instruction Emulator (ArmIE). Each is covered below.

The steps shown are for an Armv8-A system running Ubuntu 22.04 without SVE support.

Example code

The example code adds two arrays of 127 double-precision elements.

Use a text editor to copy the code below and save it in a file named sve_add.c.


        #include <stdlib.h>
        #include <stdio.h>

        #ifndef SIZE
        #define SIZE 127
        #endif

        /* Add each element of a to the matching element of b. */
        void fun(double * restrict a, double * restrict b, int size)
        {
          for (int i=0; i < size; ++i)
            b[i] += a[i];
        }

        int main()
        {
          int i;

          double *a=(double *)malloc(sizeof(double)*SIZE);
          double *b=(double *)malloc(sizeof(double)*SIZE);

          fun(a, b, SIZE);
          printf("Done.\n");
          return 0;
        }



Compile the application with either GCC or armclang, using the matching command:


  gcc -march=armv8-a+sve -O3 -fopt-info-vec sve_add.c -o sve_add.exe

  armclang -march=armv8-a+sve -O3 -Rpass=vector sve_add.c -o sve_add.exe


Run the application on the Arm Linux host:


        ./sve_add.exe
        Illegal instruction (core dumped)


An illegal instruction message confirms the host does not support SVE.

QEMU


You can run applications containing SVE instructions without SVE-capable hardware using QEMU, a generic and open source machine emulator and virtualizer.

Install qemu-user to run the example on processors which do not support SVE:


        sudo apt install qemu-user -y


Run the example application with a vector length of 256 bits. The sve-default-vector-length property takes a value in bytes, so 256 bits is written as 32:


        qemu-aarch64 -cpu max,sve-default-vector-length=32 ./sve_add.exe


The application now runs and prints the expected message.

Arm Instruction Emulator

You can also run the application containing SVE instructions using the Arm Instruction Emulator.

Download and install the Arm Instruction Emulator (see installation instructions) on any Armv8-A system. The Arm Instruction Emulator intercepts and emulates unsupported SVE instructions. It also supports plugins for application analysis.


The Arm Instruction Emulator has been deprecated. It is still available for download, but there is no active development.

Arm Instruction Emulator Usage

Now run the application with ArmIE as shown:


        armie -msve-vector-bits=256 -- ./sve_add.exe


ArmIE requires the -msve-vector-bits parameter to specify the SVE vector length.


ArmIE has plugins you can use to analyze your application.

Count SVE instructions

The instruction count plugin reports the number of instructions executed and how many of them were emulated. Run the command below and check the output:


        armie -msve-vector-bits=256 -i  -- ./sve_add.exe
        Client inscount is running
        146163 instructions executed of which 193 were emulated instructions


Doubling the vector length from 256 to 512 bits roughly halves the number of emulated SVE instructions, from 193 to 97, as shown:


        armie -msve-vector-bits=512 -i  -- ./sve_add.exe
        Client inscount is running
        146051 instructions executed of which 97 were emulated instructions


SVE instruction breakdown

To get more information on which instructions are executed, use the plugin as shown:


        armie -msve-vector-bits=512 -i -- ./sve_add.exe


Undecoded instructions are stored in a file named undecoded.APP.PID.log. To decode them, use the decoding script provided with ArmIE. This script requires llvm-mc and Python 2.7. Install them using the command shown:


        sudo apt install llvm python


This example command processes the results:


        awk -F" : " '{print $2}' undecoded.sve_add.exe.175166.log | LLVM_MC=$(which llvm-mc) ./ | awk -F" : " '{print $2}' > decoded.log && paste undecoded.sve_add.exe.175166.log decoded.log


This gives the following output:


        16 : 0xe5e14000  st1d { z0.d }, p0, [x0, x1, lsl #3]
        16 : 0xa5e14260  ld1d { z0.d }, p0/z, [x19, x1, lsl #3]
        16 : 0xa5e14001  ld1d { z1.d }, p0/z, [x0, x1, lsl #3]
        16 : 0x65c10000  fadd z0.d, z0.d, z1.d
        16 : 0x25e31c20  whilelo p0.d, x1, x3
        16 : 0x04f0e3e1  incd x1
         1 : 0x25e21fe0  whilelo p0.d, xzr, x2


In this list, you can see the SVE instructions identified in the previous tutorial, Compile for SVE. Each instruction in the main loop is executed 16 times, because a 512-bit vector holds 8 doubles and adding 127 array elements therefore takes 16 iterations.

Trace SVE memory accesses on specific code sections

Specify region of interest (RoI)

A RoI limits the amount of data generated by tracing. Define the following macros and call them around the code to trace, as shown in the code snippet below:


        #define __START_TRACE() { asm volatile (".inst 0x2520e020"); }
        #define __STOP_TRACE() { asm volatile (".inst 0x2520e040"); }

        void fun(double * restrict a, double * restrict b, int size)
        {
          __START_TRACE();
          for (int i=0; i < size; ++i)
            b[i] += a[i];
          __STOP_TRACE();
        }

Rebuild application and trace memory accesses

Rebuild the application and add the options -a -roi to ArmIE to filter data for the RoI:


        armie -e -i -a -roi -- ./sve_add.exe


Running ArmIE with these options generates two data files, instrace.APP.PID.log and sve-memtrace.APP.PID.log. The first traces all instructions executed. The second captures only information about SVE memory accesses.

To filter data of interest, run the following commands:


        sed -i "/\(,0$\|;\)/d" instrace.sve_add.exe.235921.0000.log
        sed -i '1d;$d' sve-memtrace.sve_add.exe.235921.log
        awk -F"," '{print $2}' instrace.sve_add.exe.235921.0000.log | LLVM_MC=$(which llvm-mc) ./ | awk -F" : " '{print $2}' > mem.log && paste sve-memtrace.sve_add.exe.235921.log mem.log


The output will look like this:


        1, -1042929472, 0, 0, 64, 0xffff81d096a0, 0xffff81cf66d8        ld1d { z1.d }, p0/z, [x0, x1, lsl #3]
        2, -1042929472, 0, 0, 64, 0xffff81d092a0, 0xffff81cf66dc        ld1d { z0.d }, p0/z, [x19, x1, lsl #3]
        3, -1042929472, 0, 1, 64, 0xffff81d096a0, 0xffff81cf66e4        st1d { z0.d }, p0, [x0, x1, lsl #3]
        43, -1042929472, 0, 0, 64, 0xffff81d09a20, 0xffff81cf66d8       ld1d { z1.d }, p0/z, [x0, x1, lsl #3]
        44, -1042929472, 0, 0, 64, 0xffff81d09620, 0xffff81cf66dc       ld1d { z0.d }, p0/z, [x19, x1, lsl #3]
        45, -1042929472, 0, 1, 64, 0xffff81d09a20, 0xffff81cf66e4       st1d { z0.d }, p0, [x0, x1, lsl #3]
        46, -1042929472, 0, 0, 56, 0xffff81d09a60, 0xffff81cf66d8       ld1d { z1.d }, p0/z, [x0, x1, lsl #3]
        47, -1042929472, 0, 0, 56, 0xffff81d09660, 0xffff81cf66dc       ld1d { z0.d }, p0/z, [x19, x1, lsl #3]
        48, -1042929472, 0, 1, 56, 0xffff81d09a60, 0xffff81cf66e4       st1d { z0.d }, p0, [x0, x1, lsl #3]


You can identify 16 batches of 512-bit SVE loads and stores. In the first 15 iterations the predicate has all lanes active and each access handles 64 bytes. In the last iteration the predicate disables one lane, so each access handles 56 bytes, covering element indexes 120 to 126.