Learn about optimization techniques using the g++ compiler: Find specific Neoverse features

Versions of Arm-based instances on Neoverse

Most cloud service providers offer a variety of Arm-based instances based on Neoverse.

For example, Graviton3 was announced in 2021, and instance types include M7g, C7g, and R7g. Graviton3 offers up to 2x better floating-point performance, up to 2x faster crypto performance, and up to 3x better ML performance compared to Graviton2.

Graviton3 uses Arm Neoverse V1 cores.

In 2023, AWS announced Graviton4, based on Neoverse V2 cores. Graviton4 increases the core count to 96 and was first available in the R8g instance type. It provides 30% better compute performance, 50% more cores, and 75% more memory bandwidth than Graviton3.

Using the same AWS instance that you created in the previous section, run the lscpu command and look for the underlying Neoverse core in the Model name field.

For example, on the r8g.xlarge instance, run:

    

        
        
lscpu | grep -i model

The output is:

    

        
Model name:                           Neoverse-V2
Model:                                1

The output confirms that the AWS r8g.xlarge instance is based on the Neoverse V2 processor. Use this instance for the remainder of the Learning Path.

Identify supported CPU features

To identify Arm architecture features at runtime in a C program, you can use the Linux hardware capabilities (HWCAP) vector. The program below reads this vector with getauxval(AT_HWCAP) and tests the bits for individual features.

Use a text editor to copy and paste the C program below into a file named hw_cap.c:

    

        
        
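#include <stdio.h>
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP */
#include <asm/hwcap.h>  /* HWCAP_* feature bits */

int main(void)
{
    /* getauxval() returns an unsigned long bitmask of CPU features */
    unsigned long hwcaps = getauxval(AT_HWCAP);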

    if (hwcaps & HWCAP_AES) {
        printf("AES instructions are available\n");
    } else {
        printf("AES instructions are not available\n");
    }
    if (hwcaps & HWCAP_CRC32) {
        printf("CRC32 instructions are available\n");
    } else {
        printf("CRC32 instructions are not available\n");
    }
    if (hwcaps & HWCAP_PMULL) {
        printf("PMULL/PMULL2 instructions that operate on 64-bit data are available\n");
    } else {
        printf("PMULL/PMULL2 instructions are not available\n");
    }
    if (hwcaps & HWCAP_SHA1) {
        printf("SHA1 instructions are available\n");
    } else {
        printf("SHA1 instructions are not available\n");
    }
    if (hwcaps & HWCAP_SHA2) {
        printf("SHA2 instructions are available\n");
    } else {
        printf("SHA2 instructions are not available\n");
    }
    if (hwcaps & HWCAP_SVE) {
        printf("Scalable Vector Extension (SVE) instructions are available\n");
    } else {
        printf("Scalable Vector Extension (SVE) instructions are not available\n");
    }

    return 0;
}

Compile and run the program with the commands:

    

        
        
gcc hw_cap.c -o hw_cap
./hw_cap

The output below confirms that all six features, including the Scalable Vector Extension (SVE), are available:

    

        
AES instructions are available
CRC32 instructions are available
PMULL/PMULL2 instructions that operate on 64-bit data are available
SHA1 instructions are available
SHA2 instructions are available
Scalable Vector Extension (SVE) instructions are available

For the latest list of all hardware capabilities available for a specific Linux kernel version, see the arch/arm64/include/uapi/asm/hwcap.h header file in the Linux kernel source code.

Knowing the SVE vector width is also useful for optimizing software performance, because SVE lets each CPU implementation choose a vector length between 128 and 2048 bits.

Use a text editor to copy and paste the following C code into a file named sve_width.c:

    

        
        
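#include <arm_sve.h>
#include <inttypes.h>
#include <stdio.h>

int main(void) {
    /* svcntb() returns the number of bytes in one SVE vector */
    uint64_t sve_width = svcntb();
    printf("SVE vector length: %" PRIu64 " bytes\n", sve_width);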
    return 0;
}

Compile and run the program with the following commands:

    

        
        
g++ sve_width.c -o sve_width -mcpu=neoverse-v2
./sve_width

The output shows that the Neoverse V2-based Graviton4 instance has an SVE width of 16 bytes (128 bits):

    

        
        SVE vector length: 16 bytes

Supported compiler features

The g++ compiler can identify the host system's capabilities automatically when you pass the -mcpu=native option. To see the full set of options used when compiling, add the -### argument, which prints the commands the compiler driver would run without executing them.

To list the processors that g++ can target when compiling your code, run:

    

        
        
g++ -E -mcpu=help -xc /dev/null

The output is:

    

        
cc1: note: valid arguments are: cortex-a34 cortex-a35 cortex-a53 cortex-a57 cortex-a72 cortex-a73 thunderx thunderxt88p1 thunderxt88 octeontx octeontx81 octeontx83 thunderxt81 thunderxt83 ampere1 ampere1a emag xgene1 falkor qdf24xx exynos-m1 phecda thunderx2t99p1 vulcan thunderx2t99 cortex-a55 cortex-a75 cortex-a76 cortex-a76ae cortex-a77 cortex-a78 cortex-a78ae cortex-a78c cortex-a65 cortex-a65ae cortex-x1 cortex-x1c neoverse-n1 ares neoverse-e1 octeontx2 octeontx2t98 octeontx2t96 octeontx2t93 octeontx2f95 octeontx2f95n octeontx2f95mm a64fx tsv110 thunderx3t110 neoverse-v1 zeus neoverse-512tvb saphira cortex-a57.cortex-a53 cortex-a72.cortex-a53 cortex-a73.cortex-a35 cortex-a73.cortex-a53 cortex-a75.cortex-a55 cortex-a76.cortex-a55 cortex-r82 cortex-a510 cortex-a710 cortex-a715 cortex-x2 cortex-x3 neoverse-n2 cobalt-100 neoverse-v2 grace demeter generic

Running the same command with g++ version 9 shows fewer CPU targets available for optimization, because recent CPUs such as the Neoverse V2 had not yet been added:

    

        
        
g++-9 -E -mcpu=help -xc /dev/null

The output from version 9 is:

    

        
        cc1: note: valid arguments are: cortex-a35 cortex-a53 cortex-a57 cortex-a72 cortex-a73 thunderx thunderxt88p1 thunderxt88 octeontx octeontx81 octeontx83 thunderxt81 thunderxt83 emag xgene1 falkor qdf24xx exynos-m1 phecda thunderx2t99p1 vulcan thunderx2t99 cortex-a55 cortex-a75 cortex-a76 ares neoverse-n1 neoverse-e1 a64fx tsv110 zeus neoverse-v1 neoverse-512tvb saphira neoverse-n2 cortex-a57.cortex-a53 cortex-a72.cortex-a53 cortex-a73.cortex-a35 cortex-a73.cortex-a53 cortex-a75.cortex-a55 cortex-a76.cortex-a55 generic
