Port an example application that uses Intel Vector Statistics Library to Arm

Background

OpenRNG is an open-source Random Number Generator (RNG) library, initially released with Arm Performance Libraries 24.04. It is designed to simplify porting by serving as a drop-in replacement for Intel’s Vector Statistics Library (VSL).

OpenRNG supports various RNG types, including pseudorandom, quasirandom, and non-deterministic generators, and provides tools for efficient multithreading and converting random sequences into common probability distributions. A vector of random numbers is a sequence of numbers that appear random and are used in various applications, such as simulating unpredictable natural processes, modeling financial markets, and creating unpredictable AI behaviors in gaming.

Run on an x86 Instance

To demonstrate porting, start with an application running on an x86_64, AWS t3.2xlarge instance with 32GB of storage.

See Getting started with Servers and Cloud computing to create an x86 instance type.

Install the OneAPI toolkit using the Intel oneAPI Toolkits Installation Guide for Linux .

The following source code uses a classic algorithm to calculate pi.

Use a text editor to copy and paste the source code below into a file named pi_x86.c:

    

        
        
/*
 * SPDX-FileCopyrightText: <text>Copyright 2024 Arm Limited and/or its
 * affiliates <open-source-office@arm.com></text>
 *
 * SPDX-License-Identifier: MIT OR Apache-2.0 WITH LLVM-exception
 */

#include <mkl.h> // Using Vector Statistics Library
#include <stdio.h>
#include <stdlib.h>

void assert_message(int condition, const char *message) {
  if (!condition) {
    printf("Error: %s\n", message);
    exit(EXIT_FAILURE);
  }
}

int main() {

  const size_t nIterations = 1000 * 1000;
  const size_t nRandomNumbers = 2 * nIterations;

  //
  // Declare and initialise the stream.
  //
  // In this example, we've selected the PHILOX4X32X10 generator and seeded it
  // with 42. We can then check that the method executed successfully by checking
  // the return value for VSL_ERROR_OK. Most methods return VSL_ERROR_OK on
  // success.
  //
  VSLStreamStatePtr stream;
  int errcode = vslNewStream(&stream, VSL_BRNG_PHILOX4X32X10, 42);
  assert_message(errcode == VSL_ERROR_OK, "vslNewStream failed");

  //
  // Allocate a buffer for storing random numbers.
  //
  float *randomNumbers = malloc(nRandomNumbers * sizeof(float));
  assert_message(randomNumbers != NULL, "malloc failed");

  //
  // Generate a uniform distribution between 0 and 1.
  //
  // First, we select the method used to generate the uniform distribution; in
  // this example, we use the standard method. We pass in a pointer to an
  // initialised stream, the amount of random numbers we want, followed by a
  // pointer to a buffer big enough to hold all the random numbers requested.
  // Finally, we pass in parameters specific to the distribution, in this case,
  // 0 and 1, meaning we want the range [0, 1).
  //
  errcode = vsRngUniform(VSL_RNG_METHOD_UNIFORM_STD, stream, nRandomNumbers,
                         randomNumbers, 0, 1);
  assert_message(errcode == VSL_ERROR_OK, "vsRngUniform failed");

  //
  // Use the random numbers.
  //
  // This is a classic algorithm used for estimating the value of pi. You can imagine
  // a unit square overlapping a quarter of a circle with unit radius. You then
  // treat pairs of successive random numbers as points on the unit square. You
  // can check if the point is inside the quarter circle by measuring the
  // distance between the point and the center of the circle; if the distance is
  // less than 1, the point is inside the circle. The proportion of points
  // inside the circle should be
  //
  //  (area of quarter circle) / (area of square) := pi / 4.
  //
  // so
  //
  //  pi = 4 * (proportion of points inside circle)
  //
  int count = 0;
  for (size_t i = 0; i < nIterations; i++) {
    float x = randomNumbers[2 * i + 0];
    float y = randomNumbers[2 * i + 1];

    if (x * x + y * y < 1) {
      count++;
    }
  }
  float estimateOfPi = 4.0f * count / nIterations;

  printf("Estimate of pi:        %f\n", estimateOfPi);
  printf("Number of iterations:  %zu\n", nIterations);

  //
  // The buffer passed into vsRngUniform is still owned by the user.
  //
  free(randomNumbers);

  //
  // Release any resources held by the stream.
  //
  errcode = vslDeleteStream(&stream);
  assert_message(errcode == VSL_ERROR_OK, "vslDeleteStream failed");

  return EXIT_SUCCESS;
}

    

Compile the source code by running the following commands.

Note

You might need to adjust the oneapi version from 2025.0 to the version installed on your system.

    

        
        
export LD_LIBRARY_PATH=/opt/intel/oneapi/2025.0/lib:$LD_LIBRARY_PATH
gcc -o pi_x86 pi_x86.c -lmkl_rt -I/opt/intel/oneapi/2025.0/include -L/opt/intel/oneapi/2025.0/lib

    

Use the ldd command to print the shared objects and look at the path to libmkl.

    

        
        ldd ./pi_x86
        linux-vdso.so.1 (0x00007fff9ddc7000)
        libmkl_rt.so.2 => /opt/intel/oneapi/2025.0/lib/libmkl_rt.so.2 (0x0000748c46400000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x0000748c46000000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x0000748c4711a000)
        /lib64/ld-linux-x86-64.so.2 (0x0000748c4712c000)

        
    

Port the application to Arm

OpenRNG in most cases is a drop-in replacement for the Vector Statistics Library. See the reference guide for information on supported functions are supported. To enable this source code to run on Arm you need to modify the header file used.

Copy the file pi_x86.c to pi.c so you can make the modification:

    

        
        
// from 
#include <mkl.h> 
// to
#include "openrng.h"

    

Compile and run the modified example on the previously created Arm-based instance with the commands below.

Notice that -lamath is placed before -lm to ensure the optimized library is prioritized over the standard library implementation during compilation linking.

    

        
        
gcc -c -mcpu=native -I/opt/arm/armpl_24.10_gcc/include -std=c99 pi.c -o pi.o
gcc -mcpu=native pi.o -L/opt/arm/armpl_24.10_gcc/lib -larmpl -lamath -lm -o pi.exe
./pi.exe

    

The output from the program is:

    

        
        Running program openrng.exe:
Estimate of pi:        3.142112
Number of iterations:  1000000

        
    
Back
Next