In this section, you will build the development environment natively on Arm hardware and compile the application with a different compiler.
The changes you made in `main.cpp` and `CMakeLists.txt` in Application porting stay the same when running on actual Arm hardware.
The following Arm hardware has been selected because it is widely available.
AWS EC2 instances with Graviton processors use the `aarch64` architecture. Graviton2 and Graviton3 have different vector engine technologies.
For more information on Graviton, refer to Getting Started with AWS and the AWS Graviton Technical Guide.
Create a Graviton instance and connect using `ssh` with the `-X` option to allow for display forwarding.
Replace `INSERT_KEY_PEM_FILE` and `INSERT_GRAVITON_INSTANCE_IP_ADDRESS` with your SSH key and public IP address:
ssh -X -i "INSERT_KEY_PEM_FILE" ubuntu@INSERT_GRAVITON_INSTANCE_IP_ADDRESS
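Once connected, it can help to confirm that the machine reports a 64-bit Arm architecture before building natively. The following is a minimal sketch; the helper name `is_arm64` is hypothetical and not part of the original project.

```shell
# is_arm64 is a hypothetical helper: it succeeds when the given machine
# string (default: the output of `uname -m`) names a 64-bit Arm architecture.
is_arm64() {
    machine="${1:-$(uname -m)}"
    case "$machine" in
        aarch64|arm64) return 0 ;;
        *)             return 1 ;;
    esac
}

if is_arm64; then
    echo "Running on 64-bit Arm: $(uname -m)"
else
    echo "Not a 64-bit Arm machine: $(uname -m)"
fi
```

A machine that reports something other than `aarch64` or `arm64`, for example a 32-bit operating system image, is treated as not matching by this sketch.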
Install Docker Engine on the EC2 instance.
The Raspberry Pi is set up just like a normal desktop computer and the following is assumed:
Refer to Get started with the Raspberry Pi 4 for more information about using the Raspberry Pi 4 for software development, including how to install Docker engine.
Use the same Dockerfile as before (see Development environment).
You can build the development environment natively on Arm using `docker build` instead of `docker buildx`.
To build the GCC development environment, run the following command:
docker build -t sobel_gcc_example .
Run the container:
docker run --rm -ti --net=host -e DISPLAY=$DISPLAY -v /tmp/.X11-unix/:/tmp/.X11-unix/ -v $HOME/.Xauthority:/home/ubuntu/.Xauthority sobel_gcc_example
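The `docker run` command above forwards `DISPLAY` and mounts `$HOME/.Xauthority` into the container, so both need to exist on the host. The following is a minimal pre-flight sketch to run on the host before starting the container; the helper name `check_x11` is hypothetical.

```shell
# check_x11 is a hypothetical helper: it reports whether the X11 settings
# that the docker run command passes into the container exist on the host.
# Arguments: the DISPLAY value and the path to the Xauthority file.
check_x11() {
    display_value="$1"
    xauth_file="$2"
    if [ -z "$display_value" ]; then
        echo "DISPLAY is not set; reconnect using ssh with the -X option"
        return 1
    fi
    if [ ! -f "$xauth_file" ]; then
        echo "Xauthority file not found: $xauth_file"
        return 1
    fi
    echo "X11 settings found for display $display_value"
}

# Report the state of the current session without aborting the shell.
check_x11 "${DISPLAY:-}" "${HOME:-}/.Xauthority" || true
```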
Follow the same steps to port the application as described in Application porting to build and run the application.
In addition to the GCC development environment, you can also use Arm Compiler for Linux (ACfL).
Pull the development container from the `armswdev` repository and rename it `sobel_acfl_example`:
docker pull armswdev/arm-compiler-for-linux
docker tag armswdev/arm-compiler-for-linux sobel_acfl_example
Run the container:
docker run --rm -ti --net=host -e DISPLAY=$DISPLAY -v /tmp/.X11-unix/:/tmp/.X11-unix/ -v $HOME/.Xauthority:/home/ubuntu/.Xauthority sobel_acfl_example
The container doesn’t have OpenCV.
Install OpenCV by running the command:
sudo apt-get update && sudo apt-get install -y libopencv-dev
Follow the same steps to port the application as described in Application porting.
To use Arm Compiler for Linux, set the compiler in `CMakeLists.txt` as shown below. The paths must match the ACfL installation directory inside the container, which you can list with `ls /opt/arm`:
set(CMAKE_C_COMPILER "/opt/arm/arm-linux-compiler-23.04_Ubuntu-22.04/bin/armclang")
set(CMAKE_CXX_COMPILER "/opt/arm/arm-linux-compiler-23.04_Ubuntu-22.04/bin/armclang++")
Make the changes by running the following commands:
sed -i "6i set(CMAKE_C_COMPILER\ \"/opt/arm/arm-linux-compiler-23.04_Ubuntu-22.04/bin/armclang\")" src/CMakeLists.txt
sed -i "7i set(CMAKE_CXX_COMPILER\ \"/opt/arm/arm-linux-compiler-23.04_Ubuntu-22.04/bin/armclang++\")\n" src/CMakeLists.txt
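To see what the `sed` insert commands do before applying them to the project, you can try them on a scratch file. The following is a minimal sketch, assuming GNU `sed` (as shipped with Ubuntu); the scratch file and its eight placeholder lines are hypothetical and stand in for the real `src/CMakeLists.txt`.

```shell
# Work on a scratch file so the real src/CMakeLists.txt is left untouched.
scratch_dir="$(mktemp -d)"
scratch_file="$scratch_dir/CMakeLists.txt"

# Hypothetical placeholder content: eight numbered comment lines.
for n in 1 2 3 4 5 6 7 8; do
    echo "# original line $n"
done > "$scratch_file"

# "6i" inserts the text before the current line 6; "7i" then inserts
# before the line that has become line 7 after the first insertion.
sed -i "6i set(CMAKE_C_COMPILER\ \"/opt/arm/arm-linux-compiler-23.04_Ubuntu-22.04/bin/armclang\")" "$scratch_file"
sed -i "7i set(CMAKE_CXX_COMPILER\ \"/opt/arm/arm-linux-compiler-23.04_Ubuntu-22.04/bin/armclang++\")" "$scratch_file"

# Print the two lines that now hold the compiler settings.
sed -n '6,7p' "$scratch_file"
```

The original lines 6 to 8 move down by two, so the compiler settings end up ahead of whatever previously started at line 6.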
Compile and run the application:
cmake -S src -B build
cd build/
make
./sobel_simd_opencv
The output is the same as when running under QEMU. The noticeable difference is that the SIMD implementation runs faster on hardware, which is expected: QEMU emulates the instructions and should not be used for performance measurements.
In the results presented below, a value >1 is faster and a value <1 is slower in comparison to the normalized value. In short, higher values are better.
Processor | Graviton2 | Graviton2 | Graviton3 | Graviton3 |
---|---|---|---|---|
Compiler | GCC 12.2.0 | ACfL 22.1 | GCC 12.2.0 | ACfL 22.1 |
Non-SIMD | 1.0 | 1.0 | 1.7 | 1.8 |
SIMD | 3.4 | 3.8 | 5.8 | 6.7 |
OpenCV | 0.3 | 0.3 | 0.4 | 0.5 |
The results in the table above have been normalized to the Graviton2 Non-SIMD value, giving the relative speed-up.
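The normalization can be sketched as dividing the baseline measurement by each other measurement. The example below assumes the underlying measurements are run times, which the text does not state; the helper name `relative_speedup` and the timings are hypothetical and are not the values behind the table.

```shell
# relative_speedup is a hypothetical helper: it divides the baseline run time
# by the measured run time, so a result above 1 means faster than the baseline.
relative_speedup() {
    awk -v baseline="$1" -v measured="$2" \
        'BEGIN { printf "%.1f\n", baseline / measured }'
}

# Made-up timings in milliseconds, for illustration only.
baseline_ms=100
relative_speedup "$baseline_ms" 100   # the baseline compared with itself
relative_speedup "$baseline_ms" 25    # a run that takes a quarter of the time
relative_speedup "$baseline_ms" 200   # a run that takes twice as long
```

With this convention the baseline always maps to 1.0, which is why the Non-SIMD baseline entry in the table reads 1.0.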
You observe the following:
Processor | Raspberry Pi 4 | Raspberry Pi 4 |
---|---|---|
Compiler | GCC 11.3.0 | ACfL 22.1 |
Non-SIMD | 1.0 | 0.9 |
SIMD | 2.7 | 3.0 |
OpenCV | 0.3 | 0.3 |
The results in the table above have been normalized to the Raspberry Pi 4 Non-SIMD value, giving the relative speed-up.
You observe the following: