It is sometimes useful to look at the code generated by the compiler. In this Learning Path, the assembly listings have already been produced, so you can inspect them directly.
For example, in the listing file sme2_matmul_intr.lst, the inner loop of the matrix multiplication with intrinsics, which performs the outer product and the accumulation, looks like this:
...
80001854: a540a1c0 ld1w { z0.s }, p0/z, [x14]
80001858: a540a661 ld1w { z1.s }, p1/z, [x19]
8000185c: f10006b5 subs x21, x21, #0x1
80001860: 8b0d0273 add x19, x19, x13
80001864: 8b0a01ce add x14, x14, x10
80001868: 80812000 fmopa za0.s, p0/m, p1/m, z0.s, z1.s
8000186c: 54ffff41 b.ne 0x80001854 <matmul_intr+0x2e8>
...
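To make the role of the fmopa instruction in this loop concrete, here is a minimal scalar sketch in C of a predicated outer-product accumulate. It is a reference model for reading the listing, not code from this Learning Path. The function name fmopa_model, the LANES constant (16 single-precision lanes, that is, a 512-bit vector length), and the use of bool arrays to stand in for predicate registers are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 16 single-precision lanes per vector (512-bit vector length). */
#define LANES 16

/*
 * Scalar model of FMOPA ZA0.S, Pn/M, Pm/M, Zn.S, Zm.S.
 * For every active row lane (pn) and active column lane (pm), the product
 * zn[row] * zm[col] is accumulated into the tile. Elements whose row or
 * column lane is inactive are left unchanged (merging predication).
 * Note: the rounding of the hardware instruction may differ from this
 * plain C expression.
 */
static void fmopa_model(float za[LANES][LANES],
                        const bool pn[LANES], const bool pm[LANES],
                        const float zn[LANES], const float zm[LANES])
{
    assert(za && pn && pm && zn && zm);

    for (size_t row = 0; row < LANES; row++) {
        if (!pn[row])
            continue; /* inactive row: the whole tile row is untouched */
        for (size_t col = 0; col < LANES; col++) {
            if (pm[col])
                za[row][col] += zn[row] * zm[col];
        }
    }
}
```

In the loop above, the two ld1w instructions fill z0 and z1, and each iteration adds one such outer product to the tile, so after the loop the tile holds a block of the result matrix.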
Both of the main debuggers, gdb and lldb, have some support for debugging SME2 code. Their usage is not shown in this Learning Path, however, mainly because this Learning Path focuses on the CPU in baremetal mode.
This is a minimalistic environment with no operating system, and debugging in it requires a debug monitor to interface between the debugger, the program, and the CPU.
The FVP can emit an instruction trace file in text format, known as the Tarmac trace. This provides a convenient way for you to understand what the program is doing.
In the excerpt shown below, you can see that the SVE register z0 has been loaded with 16 values by an LD1W instruction, as predicate p0 was true for all 16 lanes, whereas z1 was loaded with only two values, as p1 was true for only two lanes. z0 and z1 are later used by the fmopa instruction to compute the outer product, and the trace displays the resulting content of the ZA storage.
923530000 ps IT (92353) 80001b08 a540a1a0 O EL3h_s : LD1W {z0.S},p0/Z,[x13]
923530000 ps MR4 81000868:000081000868 40000000
923530000 ps MR4 8100086c:00008100086c 40800000
923530000 ps MR4 81000870:000081000870 40c00000
923530000 ps MR4 81000874:000081000874 41000000
923530000 ps MR4 81000878:000081000878 41200000
923530000 ps MR4 8100087c:00008100087c 41400000
923530000 ps MR4 81000880:000081000880 41600000
923530000 ps MR4 81000884:000081000884 41800000
923530000 ps MR4 81000888:000081000888 41900000
923530000 ps MR4 8100088c:00008100088c 41a00000
923530000 ps MR4 81000890:000081000890 41b00000
923530000 ps MR4 81000894:000081000894 41c00000
923530000 ps MR4 81000898:000081000898 41d00000
923530000 ps MR4 8100089c:00008100089c 41e00000
923530000 ps MR4 810008a0:0000810008a0 41f00000
923530000 ps MR4 810008a4:0000810008a4 42000000
923530000 ps R Z0 42000000_41f00000_41e00000_41d00000_41c00000_41b00000_41a00000_41900000_41800000_41600000_41400000_41200000_41000000_40c00000_40800000_40000000
923540000 ps IT (92354) 80001b0c a540a441 O EL3h_s : LD1W {z1.S},p1/Z,[x2]
923540000 ps MR4 81000780:000081000780 42027ae1
923540000 ps MR4 81000784:000081000784 c16b5c29
923540000 ps R Z1 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_c16b5c29_42027ae1
923550000 ps IT (92355) 80001b10 f1000484 O EL3h_s : SUBS x4,x4,#1
923550000 ps R cpsr 600003cd
923550000 ps R X4 0000000000000000
923560000 ps IT (92356) 80001b14 8b0a0042 O EL3h_s : ADD x2,x2,x10
923560000 ps R X2 0000000081000788
923570000 ps IT (92357) 80001b18 8b1701ad O EL3h_s : ADD x13,x13,x23
923570000 ps R X13 00000000810008A8
923580000 ps IT (92358) 80001b1c 80812000 O EL3h_s : FMOPA ZA0.S,p0/M,p1/M,z0.S,z1.S
923580000 ps R ZA0H_S_0 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_4190147b_42bd23d7
923580000 ps R ZA0H_S_1 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_42a6e668_435a7852
923580000 ps R ZA0H_S_2 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_4314e3d7_43ab2f5c
923580000 ps R ZA0H_S_3 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_4356547c_43e92290
923580000 ps R ZA0H_S_4 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_438be28f_44138ae2
923580000 ps R ZA0H_S_5 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_43ac9ae1_4432847b
923580000 ps R ZA0H_S_6 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_43cd5334_44517e15
923580000 ps R ZA0H_S_7 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_43ee0b86_447077ae
923580000 ps R ZA0H_S_8 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_440761eb_4487b8a4
923580000 ps R ZA0H_S_9 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_4417be14_44973571
923580000 ps R ZA0H_S_10 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_44281a3e_44a6b23e
923580000 ps R ZA0H_S_11 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_44387667_44b62f0a
923580000 ps R ZA0H_S_12 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_4448d28f_44c5abd7
923580000 ps R ZA0H_S_13 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_44592eb8_44d528a4
923580000 ps R ZA0H_S_14 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_44698ae1_44e4a571
923580000 ps R ZA0H_S_15 00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_00000000_4479e70a_44f4223e
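The register values in the trace are printed as raw 32-bit hexadecimal words, which are easier to follow once converted to floating-point. The sketch below shows one way to do that in C. The helper names f32_from_bits and decode_lanes are hypothetical, and the code assumes that the words of a register dump are separated by underscores with the highest lane printed first, as the R Z0 line and its preceding MR4 reads suggest.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* The bit copy below is only valid if float is a 32-bit type. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Reinterpret a raw 32-bit word from the trace as a single-precision float. */
static float f32_from_bits(uint32_t bits)
{
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/*
 * Split a register dump such as "41000000_40c00000_40800000_40000000"
 * into floats. The words are stored in reverse so that lanes[0] is the
 * lowest lane. Returns the number of lanes decoded, or 0 if the text is
 * malformed or holds more than max_lanes words.
 */
static size_t decode_lanes(const char *text, float *lanes, size_t max_lanes)
{
    assert(text != NULL && lanes != NULL);

    size_t count = 0;
    const char *p = text;
    while (*p != '\0') {
        char *end;
        unsigned long word = strtoul(p, &end, 16);
        if (end - p != 8 || count == max_lanes)
            return 0; /* each word must be exactly 8 hex digits */
        lanes[count++] = f32_from_bits((uint32_t)word);
        if (*end == '_')
            end++;
        else if (*end != '\0')
            return 0;
        p = end;
    }

    /* The trace lists the highest lane first: reverse into lane order. */
    for (size_t i = 0; i < count / 2; i++) {
        float tmp = lanes[i];
        lanes[i] = lanes[count - 1 - i];
        lanes[count - 1 - i] = tmp;
    }
    return count;
}
```

Applied to the value on the R Z0 line of the excerpt, decode_lanes yields the even numbers 2.0, 4.0, and so on up to 32.0.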
You can get a Tarmac trace when invoking run-fvp.sh by adding the --trace option as the first argument, for example:
docker run --rm -v "$PWD:/work" -w /work armswdev/sme2-learning-path:sme2-environment-v1 ./run-fvp.sh --trace sme2_matmul_asm
Tracing is not enabled by default. It slows down the simulation significantly and the trace file can become very large for programs with large matrices.
When debugging, it can be helpful to understand where an element in the tile is coming from. The current code base allows you to do that in debug mode, when -DDEBUG is passed to the compiler in the Makefile. If you look into main.c, you will notice that, in this mode, the matrix initialization is no longer random, but instead initializes each element with its linear index. This makes it easier to find where the matrix elements are loaded in the tile, in the Tarmac trace for example.
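The idea behind this debug initialization can be sketched as follows. This is an illustration only, not the code from main.c: the function names init_matrix and index_to_row_col are hypothetical, and the sketch assumes a row-major matrix of single-precision values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical initializer, not the code from main.c.
 * When compiled with -DDEBUG, each element is set to its linear index,
 * so a value seen in the trace identifies its position in the matrix.
 * Otherwise the elements are filled with pseudo-random values in [0, 1].
 */
static void init_matrix(float *m, size_t rows, size_t cols)
{
    assert(m != NULL);

    for (size_t i = 0; i < rows * cols; i++) {
#ifdef DEBUG
        m[i] = (float)i;
#else
        m[i] = (float)rand() / (float)RAND_MAX;
#endif
    }
}

/* Recover (row, column) from a linear index, assuming row-major storage. */
static void index_to_row_col(size_t index, size_t cols,
                             size_t *row, size_t *col)
{
    assert(cols > 0 && row != NULL && col != NULL);

    *row = index / cols;
    *col = index % cols;
}
```

With such an initialization, a value read from a register dump in the trace maps straight back to a matrix position, which is what makes it practical to follow an element from memory into the tile.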