Run and evaluate

Compiling and running the application is straightforward. Run the commands below:


    cmake -S src -B build
    cd build/

A successful run prints the execution time, in microseconds, of each of the three implementations to the terminal. It also opens four windows (as seen below) showing the original, non-SIMD, SIMD and OpenCV images of a butterfly.

Image Alt Text: Sobel filter

In the results presented below, a value greater than 1 is faster than the baseline the results are normalized to, and a value less than 1 is slower. In short, higher values are better.

Compiler: GCC 11.3.0

The results in the table above have been normalized to the QEMU Non-SIMD value, giving the relative speed-up.
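The normalization described above can be sketched in a few lines of Python. This is only an illustration, assuming that relative speed-up is computed as the baseline execution time divided by the measured execution time. The function names `relative_speedup` and `normalize`, the labels, and the timings are hypothetical placeholders, not part of the application and not measurements from this guide.

```python
def relative_speedup(baseline_us: float, measured_us: float) -> float:
    """Return baseline time divided by measured time.

    A result above 1 means the measured run was faster than the baseline;
    a result below 1 means it was slower.
    """
    if baseline_us <= 0 or measured_us <= 0:
        raise ValueError("execution times must be positive")
    return baseline_us / measured_us


def normalize(timings_us: dict[str, float], baseline: str) -> dict[str, float]:
    """Normalize every timing (in microseconds) to the chosen baseline entry."""
    base = timings_us[baseline]
    return {name: relative_speedup(base, t) for name, t in timings_us.items()}


if __name__ == "__main__":
    # Placeholder timings in microseconds (assumed values, NOT real results).
    example_us = {"baseline": 1000.0, "variant_a": 2000.0, "variant_b": 500.0}
    for name, value in normalize(example_us, "baseline").items():
        print(f"{name}: {value:.2f}")
```

To normalize your own results, replace the placeholder dictionary with the microsecond values printed in your terminal and pass the key of the run you want to use as the baseline.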

You observe the following:

  • the non-SIMD implementation is the fastest under emulation, but this does not reflect performance on a real aarch64 system

Emulation does not give a representative view of how efficiently the algorithms run on Arm. It is useful only for functional testing, not for measuring performance.

Closing notes

You have now ported an x86_64 application to aarch64, then built and run the ported application on aarch64 using emulation. Well done!

If you have access to Arm hardware, continue to the next section, Evaluating real hardware. If you don’t have access to Arm hardware, you can jump straight to Review and test your knowledge.