You can use Process Watch to determine the presence or absence of certain instructions. In this section, you will use Process Watch to detect the use of NEON and SVE instructions by an example workload. Start by saving the simple workload shown below in a file named workload.c:
#include <stdint.h>
#define LEN 1024
uint64_t a[LEN];
uint64_t b[LEN];
uint64_t c[LEN];
void doLoop(void) {
    for (int i = 0; i < LEN; i++)
        c[i] = a[i] + b[i];
}
int main(void) {
    /* Loop forever so Process Watch has a long-running process to sample. */
    while (1)
        doLoop();
}
Compile the workload without applying any optimizations:
aarch64-linux-gnu-gcc workload.c -o workload_none -O0
Now run the workload in the background and launch processwatch on it to detect the use of NEON and SVE instructions:
./workload_none &
[1] 126958
sudo ./processwatch -p 126958 -f HasNEON -f HasSVEorSME
You will need to replace the PID in the processwatch command with the PID of the workload running in the background, which the shell prints when the job starts.
The output should look like:
PID NAME NEON SVEorSME %TOTAL TOTAL
ALL ALL 0.00 0.00 100.00 24726
126958 workload_none 0.00 0.00 100.00 24726
PID NAME NEON SVEorSME %TOTAL TOTAL
ALL ALL 0.00 0.00 100.00 26006
126958 workload_none 0.00 0.00 100.00 26006
^C
You can see that in this case, the workload is not making use of NEON or SVE instructions.
Now recompile the same workload to make use of NEON instructions:
aarch64-linux-gnu-gcc workload.c -o workload_neon -O2 -ftree-vectorize -march=armv8.6-a
Run the workload in the background and launch processwatch on it to detect the use of NEON and SVE instructions:
./workload_neon &
[1] 126987
sudo ./processwatch -p 126987 -f HasNEON -f HasSVEorSME
You will need to replace the PID in the processwatch command with the PID of the workload running in the background.
The output should look like:
PID NAME NEON SVEorSME %TOTAL TOTAL
ALL ALL 31.75 0.00 100.00 24828
126987 workload_neon 31.75 0.00 100.00 24828
PID NAME NEON SVEorSME %TOTAL TOTAL
ALL ALL 32.45 0.00 100.00 26143
126987 workload_neon 32.45 0.00 100.00 26143
^C
You can now see the workload is retiring NEON instructions as you would expect.
You can run objdump on the binary to view the disassembled NEON instructions:
objdump -S workload_neon
The output should look like:
788: 4ee18400 add v0.2d, v0.2d, v1.2d
78c: 3ca06860 str q0, [x3, x0]
790: 91004000 add x0, x0, #0x10
794: f140081f cmp x0, #0x2, lsl #12
798: 54ffff41 b.ne 780 <doLoop+0x20> // b.any
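To make the disassembly easier to read, here is a scalar C model of what the vectorized loop appears to be doing. It is only a sketch: the function name neon_style_add and the LANES constant are illustrative and not part of the original workload. The register v0.2d holds two 64-bit lanes, the byte offset in x0 advances by 0x10 (16 bytes) per iteration, and the loop ends when the offset reaches 0x2 shifted left by 12, which is 8192 bytes, or LEN * sizeof(uint64_t).

```c
#include <stddef.h>
#include <stdint.h>

#define LEN 1024
#define LANES 2 /* v0.2d: two 64-bit lanes in one 128-bit NEON register */

/* Scalar model of the NEON loop (illustrative name, not from the workload).
 * Returns the number of loop iterations executed. */
size_t neon_style_add(uint64_t *c, const uint64_t *a, const uint64_t *b) {
    size_t iterations = 0;
    /* 'off' plays the role of x0: a byte offset that steps by 16. */
    for (size_t off = 0; off != LEN * sizeof(uint64_t);
         off += LANES * sizeof(uint64_t)) {
        size_t i = off / sizeof(uint64_t);
        /* One "add v0.2d, v0.2d, v1.2d" adds both lanes at once. */
        for (size_t lane = 0; lane < LANES; lane++)
            c[i + lane] = a[i + lane] + b[i + lane];
        iterations++;
    }
    return iterations;
}
```

With LEN set to 1024, this model runs 512 iterations, half as many as the scalar loop in workload.c.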
Before you run the next part, make sure the Arm machine you are running on supports SVE.
To check which features are available on your platform, use:
cat /proc/cpuinfo
Look at the Features line and check for the presence of sve.
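If you would rather make the same check from a program, the sketch below queries the hardware capability bits that Linux exposes to user space. It assumes Linux with glibc, where getauxval and AT_HWCAP are declared in sys/auxv.h. The function name cpu_has_sve is illustrative, and the fallback definition of HWCAP_SVE uses the arm64 Linux value in case your headers do not provide it.

```c
#include <stdbool.h>
#include <sys/auxv.h> /* getauxval, AT_HWCAP (Linux with glibc assumed) */

#ifndef HWCAP_SVE
#define HWCAP_SVE (1UL << 22) /* arm64 Linux hwcap bit for SVE */
#endif

/* Illustrative helper: returns true when the kernel reports SVE support.
 * On non-AArch64 builds it always returns false. */
bool cpu_has_sve(void) {
#if defined(__aarch64__)
    return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
    return false;
#endif
}
```

A program could call cpu_has_sve() at startup and fall back to a NEON or scalar code path when it returns false.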
Recompile the workload again. This time include support for SVE instructions:
aarch64-linux-gnu-gcc workload.c -o workload_sve -O2 -ftree-vectorize -march=armv8.5-a+sve
Run the workload in the background and launch processwatch on it to detect the use of NEON and SVE instructions:
./workload_sve &
[1] 126997
sudo ./processwatch -p 126997 -f HasNEON -f HasSVEorSME
You will need to replace the PID in the processwatch command with the PID of the workload running in the background.
The output should look like:
PID NAME NEON SVEorSME %TOTAL TOTAL
ALL ALL 0.00 96.68 100.00 24914
126997 workload_sve 0.00 96.68 100.00 24914
PID NAME NEON SVEorSME %TOTAL TOTAL
ALL ALL 0.00 96.74 100.00 26137
126997 workload_sve 0.00 96.74 100.00 26137
^C
You can now see that the workload is retiring SVE instructions.
Verify the SVE instructions by using objdump on the binary:
objdump -S workload_sve
The output should look similar to:
7c4: 25e20fe0 whilelo p0.d, wzr, w2
7c8: a5e04080 ld1d {z0.d}, p0/z, [x4, x0, lsl #3]
7cc: a5e04061 ld1d {z1.d}, p0/z, [x3, x0, lsl #3]
7d0: 04e10000 add z0.d, z0.d, z1.d
7d4: e5e04020 st1d {z0.d}, p0, [x1, x0, lsl #3]
7d8: 8b050000 add x0, x0, x5
7dc: 25e20c00 whilelo p0.d, w0, w2
7e0: 54ffff41 b.ne 7c8 <doLoop+0x28> // b.any
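The SVE loop is vector-length agnostic: whilelo builds a predicate that is active for each lane whose index is still below the loop bound, and b.any keeps looping while at least one lane is active. The scalar C model below sketches that behavior. The function name sve_style_add and the vl parameter (the number of 64-bit lanes per vector, presumably the value held in x5) are assumptions made for illustration, since the real vector length depends on the hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the predicated SVE loop (illustrative, not the
 * compiler's output). 'vl' is the number of 64-bit lanes per vector
 * and must be greater than zero. Returns the iteration count. */
size_t sve_style_add(uint64_t *c, const uint64_t *a, const uint64_t *b,
                     size_t n, size_t vl) {
    size_t iterations = 0;
    size_t i = 0;
    /* whilelo leaves at least one lane active as long as i < n. */
    while (i < n) {
        for (size_t lane = 0; lane < vl; lane++) {
            /* Predicate p0: a lane is active only while i + lane < n,
             * so the final partial vector needs no scalar tail loop. */
            if (i + lane < n)
                c[i + lane] = a[i + lane] + b[i + lane];
        }
        i += vl; /* add x0, x0, x5 */
        iterations++;
    }
    return iterations;
}
```

For example, with n = 10 and vl = 4 the model runs three iterations, and the last one has only two active lanes. The same binary therefore adapts to whatever vector length the hardware implements.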
As you can see, Process Watch gives a good indication of which instructions, or which groups and features of the Arm architecture, your workload actually uses at run time.
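As a complement to the run-time view, you can check at compile time which SIMD features the compiler was told it may target. The sketch below relies on the ACLE feature macros __ARM_FEATURE_SVE and __ARM_NEON. The function name build_simd_target is illustrative. Note that these macros describe what the compiler is allowed to emit, not what it did emit: an -O0 build for AArch64 still defines __ARM_NEON even though Process Watch reported no NEON instructions for it.

```c
/* Illustrative helper: reports the widest SIMD feature enabled for
 * this build, based on ACLE predefined macros. */
const char *build_simd_target(void) {
#if defined(__ARM_FEATURE_SVE)
    return "SVE";  /* e.g. built with -march=...+sve */
#elif defined(__ARM_NEON)
    return "NEON"; /* baseline for AArch64 builds */
#else
    return "none"; /* e.g. a non-Arm build */
#endif
}
```

Comparing this build-time answer with the Process Watch output shows the difference between what a binary is permitted to use and what it actually executes.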