Libamath

The libamath library from Arm is an optimized subset of the standard library math functions for Arm-based CPUs, providing both scalar and vector functions at different levels of precision. It includes vectorized versions (NEON and SVE) of common math functions found in the standard library, such as those in the <cmath> header.

The trivial snippet below uses the <cmath> standard cmath header to calculate the base exponential of a scalar value.

Use a text editor to copy and paste the example code below into a file named basic_math.cpp:

    

        
        
#include <iostream>
#include <ctime>
#include <cmath>  // Include the standard library

int main() {
    std::srand(std::time(0));
    double random_number = std::rand() / static_cast<double>(RAND_MAX);
    double result = exp(random_number); // Use the standard exponential function
    std::cout << "Exponential of " << random_number << " is " << result << std::endl;
    return 0;
}

    

Compile the code using the g++ command. You can use the ldd command to print the shared objects for dynamic linking:

    

        
        
g++ basic_math.cpp -o basic_math
ldd basic_math

    

Notice that the superset libm.so is linked. The output is:

    

        
                linux-vdso.so.1 (0x0000f55218587000)
        libstdc++.so.6 => /lib/aarch64-linux-gnu/libstdc++.so.6 (0x0000f55218200000)
        libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000f55218490000)
        libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000f55218050000)
        /lib/ld-linux-aarch64.so.1 (0x0000f5521854e000)
        libgcc_s.so.1 => /lib/aarch64-linux-gnu/libgcc_s.so.1 (0x0000f55218460000)

        
    

Update to use Libamath

Using the optimized math library libamath requires only minimal source code changes for the scalar example. Simply modify the include statements to reference the correct header file and add the necessary compiler flags.

Libamath routines have maximum errors below 4 ULPs, where ULP stands for Unit in the Last Place, which is the smallest difference between two consecutive floating-point numbers at any given precision. These routines support only the default rounding mode, which is Round-to-Nearest, Ties to Even. As a result, switching from libm to libamath might cause a slight loss in accuracy on a range of routines, similar to other vectorized implementations.

Use a text editor to copy and paste the following C++ code into a file named optimized_math.cpp:

    

        
        
#include <iostream>
#include <ctime>
#include <amath.h> // Include the Arm Performance Library header

int main() {
    std::srand(std::time(0));
    double random_number = std::rand() / static_cast<double>(RAND_MAX);
    double result = exp(random_number); // Use the optimized exp function from libamath
    std::cout << "Exponential of " << random_number << " is " << result << std::endl;
    return 0;
}

    

Compile the code using the following g++ command. Again, use the ldd command to print the shared objects for dynamic linking.

    

        
        
g++ optimized_math.cpp -o optimized_math -lamath -lm
ldd optimized_math

    

Now you can see that the libamath.so shared object is linked:

    

        
                linux-vdso.so.1 (0x0000eb1eb379b000)
        libamath.so => /opt/arm/armpl_24.10_gcc/lib/libamath.so (0x0000eb1eb35c0000)
        libstdc++.so.6 => /lib/aarch64-linux-gnu/libstdc++.so.6 (0x0000eb1eb3200000)
        libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000eb1eb3050000)
        libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000eb1eb3520000)
        /lib/ld-linux-aarch64.so.1 (0x0000eb1eb3762000)
        libgcc_s.so.1 => /lib/aarch64-linux-gnu/libgcc_s.so.1 (0x0000eb1eb34f0000)

        
    

What about Vector Operations?

In the Arm Performance Library, scalar operations follow the same naming convention of libm. This means that you can simply update the header file and recompile your code with minimal changes.

For vector operations, you have two options:

  1. You can rely on the compiler autovectorization, where the compiler automatically generates the vectorized code. The Arm Compiler for Linux (ACfL) uses autovectorization.

  2. You can use vector routines, which uses name mangling. Name mangling is a technique used in computer programming to modify the names of vector functions to ensure uniqueness and avoid conflicts. This is particularly important in compiled languages like C++ and in environments where multiple libraries or modules might be used together.

In the context of Arm’s AArch64 architecture, vector name mangling follows the specific convention below to differentiate between scalar and vector versions of functions:

    

        
        '_ZGV' <isa> <mask> <vlen> <signature> '_' <original_name>

        
    

Where the values are given below:

  • original_name - the name of scalar libm function.
  • isa - ’n’ for Neon, ’s’ for SVE.
  • mask - ‘M’ for masked/predicated version, ‘N’ for unmasked. Only masked routines are defined for SVE, and only unmasked for NEON.
  • vlen - the integer number representing vector length expressed as number of lanes. For NEON, =‘2’ in double-precision and =‘4’ in single-precision. For SVE, =‘x’.
  • signature - ‘v’ for 1 input floating point or integer argument, ‘vv’ for 2. For further information, see AArch64’s vector function ABI.
Back
Next