The C++ language gives you the freedom to be expressive in your code, enabling low-level manipulation of memory and data structures. However, this flexibility comes at a cost - compared to managed languages, such as Java, C++ source code is generally less portable and requires recompilation for the target Arm architecture.
Since the compiler plays a critical role in the process, deploying it correctly is essential. When optimizing C++ workloads on Arm, you can achieve significant performance gains simply by using the compiler efficiently - without modifying the source code.
Writing performant C++ code is a complex topic. This Learning Path focuses on how to effectively use the compiler to target the Arm architecture for Linux applications.
The g++ compiler is part of the GNU Compiler Collection (GCC), a set of compilers for various programming languages, including C++.
The primary objective of the g++ compiler is to translate C++ source code into machine code that a computer can execute. This process involves several high-level stages:
Preprocessing: in this initial stage, the preprocessor handles directives that start with a #
symbol, such as #include
, #define
, and #if
. It expands included header files, replaces macros, and processes conditional compilation statements.
Compilation: the compiler translates the preprocessed source code into an intermediate representation specific to the target processor architecture. This step includes syntax checking, semantic analysis, and generating error messages for any issues.
Assembly: the intermediate representation is converted into assembly language, using mnemonics and syntax specific to the target processor architecture. Assemblers then convert this assembly code into object code (machine code).
Linking: the final stage involves linking the object code with necessary libraries and other object files. The linker merges multiple object files and libraries, resolves external references, allocates memory addresses for functions and variables, and generates an executable file that can be run on the target platform.
An interesting fact about the GNU compiler is that it is designed to optimize both the performance and the size of the generated code. It performs various optimizations based on the knowledge it has of the program, and it can be configured to prioritize reducing the size of the executable.
Two popular compilers for C++ are the GNU Compiler Collection (GCC) and LLVM - both open-source compilers that include contributions from Arm engineers to support the latest architectures. Proprietary or vendor-specific compilers, such as nvcc
for compiling code for NVIDIA GPUs, are often based on these open-source compilers.
Alternatively, some proprietary compilers are often designed for specific use cases. For example, safety-critical applications might need to comply with various ISO standards, that govern both the application and the compiler. An example is the functional safety Arm Compiler for Embedded , a C/C++ compiler designed for such environments.
If you are an application developer not working in the functional safety domain, you can use the open-source GCC compiler.
Multiple Linux distributions are available, each with a default compiler.
Print the version information for your compiler:
g++ --version
For example, after installing g++
on Ubuntu 24.04, the default compiler as of January 2025, output appears as follows:
g++ (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
The table below shows the default (*) and available compilers for a variety of Linux distributions. This is taken from the AWS-Graviton Performance Runbook .
Distribution | GCC | Clang/LLVM |
---|---|---|
Amazon Linux 2023 | 11* | 15* |
Amazon Linux 2 | 7*, 10 | 7, 11* |
Ubuntu 24.04 | 9, 10, 11, 12, 13*, 14 | 14, 15, 16, 17, 18* |
Ubuntu 22.04 | 9, 10, 11*, 12 | 11, 12, 13, 14* |
Ubuntu 20.04 | 7, 8, 9*, 10 | 6, 7, 8, 9, 10, 11, 12 |
Ubuntu 18.04 | 4.8, 5, 6, 7*, 8 | 3.9, 4, 5, 6, 7, 8, 9, 10 |
Debian10 | 7, 8* | 6, 7, 8 |
Red Hat EL8 | 8*, 9, 10 | 10 |
SUSE Linux ES15 | 7*, 9, 10 | 7 |
The easiest way to achieve a performance gain is by using the most recent compiler available, as it includes the latest optimizations and support.
For example, according to the g++ documentation, the most recent version of GCC, version 14.2, lists the following support and optimizations in the release notes :
A number of new CPUs are supported through the -mcpu and -mtune options (GCC identifiers in parentheses).
- Ampere-1B (ampere1b).
- Arm Cortex-A520 (cortex-a520).
- Arm Cortex-A720 (cortex-a720).
- Arm Cortex-X4 (cortex-x4).
- Microsoft Cobalt-100 (cobalt-100).
Exercise due diligence when updating your C++ compiler, as the process might reveal bugs in your source code. These bugs are often the result of undefined behavior caused by not adhering to the C++ standard. It is rare for the compiler itself to introduce a bug. Typically, if such bugs occur, they are publicly documented.
Using the g++ compiler as an example, the most coarse-grained setting you can adjust is the optimization level, denoted with -O<x>
. This adjusts a variety of lower-level optimization flags, trading off computation time, memory use, and debuggability. When aggressive optimization is used, the optimized binary might not show expected behavior when hooked up to a debugger such as gdb
. This is because the generated code might not match the original source code or program order, for example from loop unrolling and vectorization.
Below are some common optimization levels:
Optimization Level | Description |
---|---|
-O0 | No optimization; useful for debugging. |
-O1 | Basic optimizations that improve performance without significantly increasing compilation time. |
-O2 | More aggressive optimizations that further enhance performance. |
-O3 | Enables aggressive optimizations that can significantly improve execution speed. |
-Os | Optimizes code size, reducing the overall binary size. |
-Ofast | Enables optimizations that may not strictly adhere to standard compliance. |
For full details on the optimization level, see your compiler documentation, such as the g++ Options That Control Optimization .