Deploying the Compiler: Unlocking C++ Performance on Arm

The C++ language gives you the freedom to be expressive in your code, enabling low-level manipulation of memory and data structures. However, this flexibility comes at a cost: unlike managed languages such as Java, which run on a virtual machine, C++ compiles to native machine code, so a program is generally less portable and must be recompiled for the target Arm architecture.

Because the compiler plays a critical role in this process, deploying it correctly is essential. When optimizing C++ workloads on Arm, you can often achieve significant performance gains simply by using the compiler effectively, without modifying the source code.

Writing performant C++ code is a complex topic. This Learning Path focuses on how to effectively use the compiler to target the Arm architecture for Linux applications.

What is the g++ compiler, and what is its purpose?

The g++ compiler is part of the GNU Compiler Collection (GCC), a set of compilers for various programming languages, including C++.

The primary objective of the g++ compiler is to translate C++ source code into machine code that a computer can execute. This process involves several high-level stages:

  • Preprocessing: in this initial stage, the preprocessor handles directives that start with a # symbol, such as #include, #define, and #if. It expands included header files, replaces macros, and processes conditional compilation statements.

  • Compilation: the compiler parses the preprocessed source code, performs syntax checking and semantic analysis, and reports errors for any issues. It then optimizes the code through an intermediate representation and generates assembly language, using mnemonics and syntax specific to the target processor architecture.

  • Assembly: the assembler converts the assembly code into object code (machine code).

  • Linking: the final stage involves linking the object code with necessary libraries and other object files. The linker merges multiple object files and libraries, resolves external references, allocates memory addresses for functions and variables, and generates an executable file that can be run on the target platform.

GCC can optimize for both the performance and the size of the generated code. It performs various optimizations based on what it knows about the program, and you can configure it to prioritize reducing the size of the executable rather than maximizing speed.

Other Compilers for C++

Two popular compilers for C++ are the GNU Compiler Collection (GCC) and LLVM, whose C++ front end is Clang. Both are open source and include contributions from Arm engineers to support the latest architectures. Proprietary or vendor-specific compilers, such as nvcc for compiling code for NVIDIA GPUs, are often based on these open-source compilers.

Some proprietary compilers are designed for specific use cases. For example, safety-critical applications might need to comply with various ISO standards that govern both the application and the compiler. An example is the functional safety Arm Compiler for Embedded, a C/C++ compiler designed for such environments.

If you are an application developer not working in the functional safety domain, you can use the open-source GCC compiler.

Multiple Linux distributions are available, each with a default compiler.

Print the version information for your compiler:

    

        
        
    g++ --version

    

For example, on Ubuntu 24.04 with the distribution's default g++ package installed (as of January 2025), the output appears as follows:

    

        
    g++ (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
    Copyright (C) 2023 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

        
    

The table below shows the default (*) and available compilers for a variety of Linux distributions. It is taken from the AWS-Graviton Performance Runbook.

    Distribution         GCC                        Clang/LLVM
    Amazon Linux 2023    11*                        15*
    Amazon Linux 2       7*, 10                     7, 11*
    Ubuntu 24.04         9, 10, 11, 12, 13*, 14     14, 15, 16, 17, 18*
    Ubuntu 22.04         9, 10, 11*, 12             11, 12, 13, 14*
    Ubuntu 20.04         7, 8, 9*, 10               6, 7, 8, 9, 10, 11, 12
    Ubuntu 18.04         4.8, 5, 6, 7*, 8           3.9, 4, 5, 6, 7, 8, 9, 10
    Debian 10            7, 8*                      6, 7, 8
    Red Hat EL8          8*, 9, 10                  10
    SUSE Linux ES15      7*, 9, 10                  7

Often the easiest way to achieve a performance gain is to use the most recent compiler available, because it includes the latest optimizations and support for newer processors.

For example, the release notes for GCC 14 (version 14.2 being the most recent release at the time of writing) list the following newly supported CPUs:

    

        
    A number of new CPUs are supported through the -mcpu and -mtune options (GCC identifiers in parentheses).
        - Ampere-1B (ampere1b).
        - Arm Cortex-A520 (cortex-a520).
        - Arm Cortex-A720 (cortex-a720).
        - Arm Cortex-X4 (cortex-x4).
        - Microsoft Cobalt-100 (cobalt-100).

        
    

Exercise due diligence when updating your C++ compiler, as the process might reveal bugs in your source code. These bugs are often the result of undefined behavior caused by not adhering to the C++ standard. It is rare for the compiler itself to introduce a bug. Typically, if such bugs occur, they are publicly documented.

Basic g++ Optimization Levels

Using the g++ compiler as an example, the most coarse-grained setting you can adjust is the optimization level, denoted with -O<x>. This adjusts a variety of lower-level optimization flags, trading off compilation time, memory use, and debuggability against the speed or size of the generated code. When aggressive optimization is used, the optimized binary might not show expected behavior when hooked up to a debugger such as gdb. This is because the generated code might not match the original source code or program order, for example because of loop unrolling and vectorization.

Below are some common optimization levels:

    Optimization Level    Description
    -O0                   No optimization; useful for debugging.
    -O1                   Basic optimizations that improve performance without significantly increasing compilation time.
    -O2                   More aggressive optimizations that further enhance performance.
    -O3                   Enables aggressive optimizations that can significantly improve execution speed.
    -Os                   Optimizes for code size, reducing the overall binary size.
    -Ofast                Enables all -O3 optimizations plus others, such as -ffast-math, that disregard strict standards compliance.

For full details on the optimization levels, see your compiler documentation, such as the g++ Options That Control Optimization.
