Run the test

Once the build steps are complete, you can run the KleidiCV and OpenCV tests. The KleidiCV API test checks the public C++ API and confirms that the build is working as expected. To run the test, use the following command:

    

        
        
./build-kleidicv-benchmark-SME/test/api/kleidicv-api-test

    

The output shows the number of tests run and their results. The full test log is omitted here for clarity, but the beginning is similar to:

    

        
        Vector length is set to 16 bytes.
Seed is set to 2542467924.
[==========] Running 3703 tests from 141 test suites.
[----------] Global test environment set-up.
[----------] 9 tests from SaturatingAddAbsWithThresholdTest/0, where TypeParam = short
[ RUN      ] SaturatingAddAbsWithThresholdTest/0.TestPositive
[       OK ] SaturatingAddAbsWithThresholdTest/0.TestPositive (0 ms)
[ RUN      ] SaturatingAddAbsWithThresholdTest/0.TestNegative
[       OK ] SaturatingAddAbsWithThresholdTest/0.TestNegative (0 ms)
[ RUN      ] SaturatingAddAbsWithThresholdTest/0.TestMin
[       OK ] SaturatingAddAbsWithThresholdTest/0.TestMin (0 ms)
[ RUN      ] SaturatingAddAbsWithThresholdTest/0.TestZero
[       OK ] SaturatingAddAbsWithThresholdTest/0.TestZero (0 ms)
[ RUN      ] SaturatingAddAbsWithThresholdTest/0.TestMax
[       OK ] SaturatingAddAbsWithThresholdTest/0.TestMax (0 ms)
[ RUN      ] SaturatingAddAbsWithThresholdTest/0.NullPointer
[       OK ] SaturatingAddAbsWithThresholdTest/0.NullPointer (0 ms)
[ RUN      ] SaturatingAddAbsWithThresholdTest/0.Misalignment
[       OK ] SaturatingAddAbsWithThresholdTest/0.Misalignment (0 ms)
[ RUN      ] SaturatingAddAbsWithThresholdTest/0.ZeroImageSize
[       OK ] SaturatingAddAbsWithThresholdTest/0.ZeroImageSize (0 ms)
[ RUN      ] SaturatingAddAbsWithThresholdTest/0.OversizeImage
[       OK ] SaturatingAddAbsWithThresholdTest/0.OversizeImage (0 ms)
[----------] 9 tests from SaturatingAddAbsWithThresholdTest/0 (0 ms total)

[----------] 4 tests from BitwiseAnd/0, where TypeParam = unsigned char
[ RUN      ] BitwiseAnd/0.API
[       OK ] BitwiseAnd/0.API (0 ms)
[ RUN      ] BitwiseAnd/0.Misalignment
[       OK ] BitwiseAnd/0.Misalignment (0 ms)
[ RUN      ] BitwiseAnd/0.ZeroImageSize
[       OK ] BitwiseAnd/0.ZeroImageSize (0 ms)
[ RUN      ] BitwiseAnd/0.OversizeImage
[       OK ] BitwiseAnd/0.OversizeImage (0 ms)
[----------] 4 tests from BitwiseAnd/0 (0 ms total)

        
    
Note

Currently, the compiler that ships with Apple Xcode is based on Clang 17. Version clang-1700.3.19.1 has an SME-related code generation bug that causes the float ResizeLinear API tests to fail.

Run the OpenCV test

After building OpenCV with KleidiCV, you will find the test binaries in the build-opencv-kleidicv-sme/bin/ directory. The main tool for benchmarking image processing performance is opencv_perf_imgproc. This utility measures both execution speed and throughput for the OpenCV imgproc module, including KleidiCV-accelerated operations.

To focus your testing, use the --gtest_filter option to select specific tests and --gtest_param_filter to restrict the run to specific parameter combinations. For example, you can run the Gaussian blur 5x5 performance test three times with the following parameters:

  • Image size: 1920x1080 (Full HD)
  • Image type: 8UC1 (8-bit unsigned, single channel)
  • Border type: BORDER_REPLICATE

You can explore additional test cases and parameter combinations in the benchmarks.txt file in the KleidiCV repository.

The command for running the test is as follows:

    

        
        
./build-opencv-kleidicv-sme/bin/opencv_perf_imgproc \
  --gtest_filter='*gaussianBlur5x5/*' \
  --gtest_param_filter='(1920x1080, 8UC1, BORDER_REPLICATE)' \
  --gtest_repeat=3

    

The output is similar to the following. Timings vary between runs and machines:

    

        
        [ERROR:0@0.001] global persistence.cpp:566 open Can't open file: 'imgproc.xml' in read mode
TEST: Skip tests with tags: 'mem_6gb', 'verylong'
CTEST_FULL_OUTPUT
OpenCV version: 4.12.0
OpenCV VCS version: 4.12.0-2-g2eea907534
Build type: Release
Compiler: /usr/bin/c++  (ver 17.0.0.17000013)
Algorithm hint: ALGO_HINT_ACCURATE
HAL: YES (carotene (ver 0.0.1) KleidiCV (ver 0.6.0))
Parallel framework: gcd (nthreads=12)
CPU features: NEON FP16 NEON_DOTPROD NEON_FP16 *NEON_BF16
OpenCL Platforms:
    Apple
        iGPU: Apple M4 Pro (OpenCL 1.2 )
Current OpenCL device:
    Type = iGPU
    Name = Apple M4 Pro
    Version = OpenCL 1.2
    Driver version = 1.2 1.0
    Address bits = 64
    Compute units = 16
    Max work group size = 256
    Local memory size = 32 KB
    Max memory allocation size = 3 GB
    Double support = No
    Half support = No
    Host unified memory = Yes
    Device extensions:
        cl_APPLE_SetMemObjectDestructor
        cl_APPLE_ContextLoggingFunctions
        cl_APPLE_clut
        cl_APPLE_query_kernel_names
        cl_APPLE_gl_sharing
        cl_khr_gl_event
        cl_khr_byte_addressable_store
        cl_khr_global_int32_base_atomics
        cl_khr_global_int32_extended_atomics
        cl_khr_local_int32_base_atomics
        cl_khr_local_int32_extended_atomics
        cl_khr_3d_image_writes
        cl_khr_image2d_from_buffer
        cl_khr_depth_images
    Has AMD Blas = No
    Has AMD Fft = No
    Preferred vector width char = 1
    Preferred vector width short = 1
    Preferred vector width int = 1
    Preferred vector width long = 1
    Preferred vector width float = 1
    Preferred vector width double = 1
    Preferred vector width half = 0

Repeating all tests (iteration 1) . . .

Note: Google Test filter = *gaussianBlur5x5/*
Note: Google Test parameter filter = (1920x1080, 8UC1, BORDER_REPLICATE)
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Size_MatType_BorderType_gaussianBlur5x5
[ RUN      ] Size_MatType_BorderType_gaussianBlur5x5.gaussianBlur5x5/80, where GetParam() = (1920x1080, 8UC1, BORDER_REPLICATE)
[ PERFSTAT ]    (samples=100   mean=0.18   median=0.18   min=0.16   stddev=0.02 (12.7%))
[       OK ] Size_MatType_BorderType_gaussianBlur5x5.gaussianBlur5x5/80 (22 ms)
[----------] 1 test from Size_MatType_BorderType_gaussianBlur5x5 (22 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (22 ms total)
[  PASSED  ] 1 test.

Repeating all tests (iteration 2) . . .

Note: Google Test filter = *gaussianBlur5x5/*
Note: Google Test parameter filter = (1920x1080, 8UC1, BORDER_REPLICATE)
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Size_MatType_BorderType_gaussianBlur5x5
[ RUN      ] Size_MatType_BorderType_gaussianBlur5x5.gaussianBlur5x5/80, where GetParam() = (1920x1080, 8UC1, BORDER_REPLICATE)
[ PERFSTAT ]    (samples=100   mean=0.18   median=0.17   min=0.16   stddev=0.04 (23.7%))
[       OK ] Size_MatType_BorderType_gaussianBlur5x5.gaussianBlur5x5/80 (22 ms)
[----------] 1 test from Size_MatType_BorderType_gaussianBlur5x5 (22 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (22 ms total)
[  PASSED  ] 1 test.

Repeating all tests (iteration 3) . . .

Note: Google Test filter = *gaussianBlur5x5/*
Note: Google Test parameter filter = (1920x1080, 8UC1, BORDER_REPLICATE)
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Size_MatType_BorderType_gaussianBlur5x5
[ RUN      ] Size_MatType_BorderType_gaussianBlur5x5.gaussianBlur5x5/80, where GetParam() = (1920x1080, 8UC1, BORDER_REPLICATE)
[ PERFSTAT ]    (samples=100   mean=0.19   median=0.17   min=0.15   stddev=0.07 (36.1%))
[       OK ] Size_MatType_BorderType_gaussianBlur5x5.gaussianBlur5x5/80 (23 ms)
[----------] 1 test from Size_MatType_BorderType_gaussianBlur5x5 (23 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (23 ms total)
[  PASSED  ] 1 test.
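
The test name reported in the output, Size_MatType_BorderType_gaussianBlur5x5.gaussianBlur5x5/80, is selected because it matches the pattern *gaussianBlur5x5/* passed to --gtest_filter. In Google Test filter patterns, * matches any run of characters and ? matches a single character. The following is a minimal sketch of that wildcard matching, not Google Test's own implementation. The function name wildcard_match is hypothetical, and the sketch ignores the ':' pattern separator and '-' negative patterns that real filters also accept.

```cpp
#include <cassert>

// Match `text` against `pattern`, where '*' matches any run of characters
// (including none) and '?' matches exactly one character.
// Illustrative sketch only; the name and structure are assumptions.
bool wildcard_match(const char* pattern, const char* text) {
  if (*pattern == '\0') return *text == '\0';
  if (*pattern == '*') {
    // Either '*' matches nothing, or it absorbs one more character of text.
    return wildcard_match(pattern + 1, text) ||
           (*text != '\0' && wildcard_match(pattern, text + 1));
  }
  if (*text == '\0') return false;
  return (*pattern == '?' || *pattern == *text) &&
         wildcard_match(pattern + 1, text + 1);
}

// Usage example with the test name from the log above.
void wildcard_example() {
  const char* name =
      "Size_MatType_BorderType_gaussianBlur5x5.gaussianBlur5x5/80";
  assert(wildcard_match("*gaussianBlur5x5/*", name));
  assert(!wildcard_match("*gaussianBlur3x3/*", name));
}
```

Changing the pattern, for example to *gaussianBlur*, would widen the selection to every test whose full name contains that text.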

        
    

Understand KleidiCV multiversion backend support

The KleidiCV library detects the platform hardware at runtime and selects the backend implementation in the following priority order, highest first:

  • SME2 backend implementation
  • SME backend implementation
  • SVE2 backend implementation
  • NEON backend implementation

The following code shows how the library resolves which implementation to use:

    

        
        
#define KLEIDICV_MULTIVERSION_C_API(api_name, neon_impl, sve2_impl, sme_impl, \
                                    sme2_impl)                                \
  static decltype(neon_impl) api_name##_resolver() {                          \
    [[maybe_unused]] KLEIDICV_TARGET_NAMESPACE::HwCaps hwcaps =               \
        KLEIDICV_TARGET_NAMESPACE::get_hwcaps();                              \
    KLEIDICV_SME2_RESOLVE(sme2_impl);                                         \
    KLEIDICV_SME_RESOLVE(sme_impl);                                           \
    KLEIDICV_SVE2_RESOLVE(sve2_impl);                                         \
    return neon_impl;                                                         \
  }                                                                           \
  extern "C" {                                                                \
  decltype(neon_impl) api_name = api_name##_resolver();                       \
  }

    

The library checks for SME support by querying hw.optional.arm.FEAT_SME as follows:

    

        
        
#define KLEIDICV_SME_RESOLVE(sme_impl)                                       \
  if (!std::is_null_pointer_v<decltype(sme_impl)> &&                         \
      KLEIDICV_TARGET_NAMESPACE::query_sysctl("hw.optional.arm.FEAT_SME")) { \
    return sme_impl;                                                         \
  }

    

The library checks for SME2 support by querying hw.optional.arm.FEAT_SME2 as follows:

    

        
        
#define KLEIDICV_SME2_RESOLVE(sme2_impl)                                      \
  if (!std::is_null_pointer_v<decltype(sme2_impl)> &&                         \
      KLEIDICV_TARGET_NAMESPACE::query_sysctl("hw.optional.arm.FEAT_SME2")) { \
    return sme2_impl;                                                         \
  }
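
The priority order implemented by the macros above can be sketched as an ordinary function. This is a simplified illustration, not KleidiCV code. The FakeHwCaps struct, the Impl alias, and the resolve function are hypothetical names, the hardware flags are passed in rather than queried from the operating system, and a runtime null check stands in for the compile-time std::is_null_pointer_v test used by the real macros.

```cpp
#include <cassert>
#include <string>

// Hypothetical capability flags. The real library queries the OS at runtime.
struct FakeHwCaps {
  bool sme2;
  bool sme;
  bool sve2;
};

// Each backend is modelled as a function that returns its own name.
using Impl = const char* (*)();

const char* neon_impl() { return "NEON"; }
const char* sve2_impl() { return "SVE2"; }
const char* sme_impl() { return "SME"; }
const char* sme2_impl() { return "SME2"; }

// Pick the highest-priority backend that is both provided (non-null) and
// supported by the hardware. NEON is the unconditional fallback.
Impl resolve(const FakeHwCaps& caps, Impl neon, Impl sve2, Impl sme,
             Impl sme2) {
  if (sme2 != nullptr && caps.sme2) return sme2;
  if (sme != nullptr && caps.sme) return sme;
  if (sve2 != nullptr && caps.sve2) return sve2;
  return neon;
}

// Usage example: hardware with SME but not SME2 resolves to SME, and an API
// with no SME or SME2 implementation falls back to NEON.
void resolver_example() {
  const FakeHwCaps sme_only{false, true, false};
  assert(std::string(
             resolve(sme_only, neon_impl, sve2_impl, sme_impl, sme2_impl)()) ==
         "SME");
  assert(std::string(
             resolve(sme_only, neon_impl, nullptr, nullptr, nullptr)()) ==
         "NEON");
}
```

The second case in the example shows why some APIs can report the NEON backend even on SME-capable hardware: if no higher-priority implementation is supplied for an API, the resolver falls through to NEON.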

    

Enable debug information for backend implementation at runtime

To print which backend each API resolves to at runtime, add print statements to kleidicv/include/kleidicv/dispatch.h.

To patch dispatch.h, copy the entire command below and paste it into your terminal. It runs the patch command, which inserts print statements that identify the selected backend.

    

        
        
patch -p1 -d "$HOME/kleidi" << 'EOF'
diff --git a/kleidicv/kleidicv/include/kleidicv/dispatch.h b/kleidicv/kleidicv/include/kleidicv/dispatch.h
index cc6ee01..44c98a5 100644
--- a/kleidicv/kleidicv/include/kleidicv/dispatch.h
+++ b/kleidicv/kleidicv/include/kleidicv/dispatch.h
@@ -1,10 +1,11 @@
-// SPDX-FileCopyrightText: 2023 - 2025 Arm Limited and/or its affiliates <open-source-office@arm.com>
+// SPDX-FileCopyrightText: 2024 - 2025 Arm Limited and/or its affiliates <open-source-office@arm.com>
 //
 // SPDX-License-Identifier: Apache-2.0

 #ifndef KLEIDICV_DISPATCH_H
 #define KLEIDICV_DISPATCH_H

+#include <stdio.h>
 #include "kleidicv/config.h"

 #if KLEIDICV_ENABLE_SME2 || KLEIDICV_ENABLE_SME || KLEIDICV_ENABLE_SVE2
@@ -33,6 +34,7 @@ static bool query_sysctl(const char* attribute_name) {
 #define KLEIDICV_SVE2_RESOLVE(sve2_impl)                                      \
   if (!std::is_null_pointer_v<decltype(sve2_impl)> &&                         \
       KLEIDICV_TARGET_NAMESPACE::query_sysctl("hw.optional.arm.FEAT_SVE2")) { \
+    printf("kleidicv API:: %s,SVE2 backend. \n", __func__);                    \
     return sve2_impl;                                                         \
   }
 #else
@@ -43,6 +45,7 @@ static bool query_sysctl(const char* attribute_name) {
 #define KLEIDICV_SME_RESOLVE(sme_impl)                                       \
   if (!std::is_null_pointer_v<decltype(sme_impl)> &&                         \
       KLEIDICV_TARGET_NAMESPACE::query_sysctl("hw.optional.arm.FEAT_SME")) { \
+    printf("kleidicv API:: %s,SME backend. \n", __func__);                  \
     return sme_impl;                                                         \
   }
 #else
@@ -53,6 +56,7 @@ static bool query_sysctl(const char* attribute_name) {
 #define KLEIDICV_SME2_RESOLVE(sme2_impl)                                      \
   if (!std::is_null_pointer_v<decltype(sme2_impl)> &&                         \
       KLEIDICV_TARGET_NAMESPACE::query_sysctl("hw.optional.arm.FEAT_SME2")) { \
+    printf("kleidicv API:: %s,SME2 backend. \n", __func__);                   \
     return sme2_impl;                                                         \
   }
 #else
@@ -67,6 +71,7 @@ static bool query_sysctl(const char* attribute_name) {
     KLEIDICV_SME2_RESOLVE(sme2_impl);                                         \
     KLEIDICV_SME_RESOLVE(sme_impl);                                           \
     KLEIDICV_SVE2_RESOLVE(sve2_impl);                                         \
+    printf("kleidicv API:: %s,NEON backend. \n", __func__);                   \
     return neon_impl;                                                         \
   }                                                                           \
   extern "C" {                                                                \
EOF

    

After making the change, rebuild the benchmark:

    

        
        
cmake --build build-kleidicv-benchmark-SME --parallel

    

Extract Neon or SME backend data on a MacBook

After making the change and rebuilding, run the benchmark to display which backend each API uses:

    

        
        
./build-kleidicv-benchmark-SME/benchmark/kleidicv-benchmark

    

The output starts by printing the backends followed by the benchmark output:

    

        
        kleidicv API:: kleidicv_min_max_u8_resolver,SME backend.
kleidicv API:: kleidicv_min_max_s8_resolver,SME backend.
kleidicv API:: kleidicv_min_max_u16_resolver,SME backend.
kleidicv API:: kleidicv_min_max_s16_resolver,SME backend.
kleidicv API:: kleidicv_min_max_s32_resolver,SME backend.
kleidicv API:: kleidicv_min_max_f32_resolver,SME backend.
kleidicv API:: kleidicv_min_max_loc_u8_resolver,NEON backend.
kleidicv API:: kleidicv_saturating_absdiff_u8_resolver,NEON backend.
kleidicv API:: kleidicv_saturating_absdiff_s8_resolver,NEON backend.
kleidicv API:: kleidicv_saturating_absdiff_u16_resolver,NEON backend.
kleidicv API:: kleidicv_saturating_absdiff_s16_resolver,NEON backend.
kleidicv API:: kleidicv_saturating_absdiff_s32_resolver,NEON backend.
kleidicv API:: kleidicv_saturating_add_abs_with_threshold_s16_resolver,SME backend.
kleidicv API:: kleidicv_saturating_add_s8_resolver,NEON backend.
kleidicv API:: kleidicv_saturating_add_u8_resolver,NEON backend.
kleidicv API:: kleidicv_saturating_add_s16_resolver,NEON backend.
kleidicv API:: kleidicv_saturating_add_u16_resolver,NEON backend.
kleidicv API:: kleidicv_saturating_add_s32_resolver,NEON backend.
kleidicv API:: kleidicv_saturating_add_u32_resolver,NEON backend.
kleidicv API:: kleidicv_saturating_add_s64_resolver,NEON backend.
kleidicv API:: kleidicv_saturating_add_u64_resolver,NEON backend.
kleidicv API:: kleidicv_compare_equal_u8_resolver,NEON backend.
kleidicv API:: kleidicv_compare_greater_u8_resolver,NEON backend.
kleidicv API:: kleidicv_exp_f32_resolver,SME backend.
kleidicv API:: kleidicv_in_range_u8_resolver,NEON backend.
kleidicv API:: kleidicv_in_range_f32_resolver,NEON backend.
kleidicv API:: kleidicv_saturating_multiply_u8_resolver,NEON backend.
kleidicv API:: kleidicv_saturating_multiply_s8_resolver,NEON backend.
kleidicv API:: kleidicv_saturating_multiply_u16_resolver,NEON backend.
kleidicv API:: kleidicv_saturating_multiply_s16_resolver,NEON backend.
kleidicv API:: kleidicv_saturating_multiply_s32_resolver,NEON backend.
kleidicv API:: kleidicv_rotate_resolver,NEON backend.
kleidicv API:: kleidicv_scale_u8_resolver,NEON backend.
kleidicv API:: kleidicv_scale_f32_resolver,NEON backend.
kleidicv API:: kleidicv_saturating_sub_s8_resolver,NEON backend.
kleidicv API:: kleidicv_saturating_sub_u8_resolver,NEON backend.
kleidicv API:: kleidicv_saturating_sub_s16_resolver,NEON backend.
kleidicv API:: kleidicv_saturating_sub_u16_resolver,NEON backend.
kleidicv API:: kleidicv_saturating_sub_s32_resolver,NEON backend.
kleidicv API:: kleidicv_saturating_sub_u32_resolver,NEON backend.
kleidicv API:: kleidicv_saturating_sub_s64_resolver,NEON backend.
kleidicv API:: kleidicv_saturating_sub_u64_resolver,NEON backend.
kleidicv API:: kleidicv_sum_f32_resolver,SME backend.
kleidicv API:: kleidicv_threshold_binary_u8_resolver,SME backend.
kleidicv API:: kleidicv_transpose_resolver,NEON backend.
kleidicv API:: kleidicv_f32_to_s8_resolver,SME backend.
kleidicv API:: kleidicv_f32_to_u8_resolver,SME backend.
kleidicv API:: kleidicv_s8_to_f32_resolver,SME backend.
kleidicv API:: kleidicv_u8_to_f32_resolver,SME backend.
kleidicv API:: kleidicv_gray_to_rgb_u8_resolver,SME backend.
kleidicv API:: kleidicv_gray_to_rgba_u8_resolver,SME backend.
kleidicv API:: kleidicv_merge_resolver,NEON backend.
kleidicv API:: kleidicv_rgb_to_bgr_u8_resolver,SME backend.
kleidicv API:: kleidicv_rgba_to_bgra_u8_resolver,SME backend.
kleidicv API:: kleidicv_rgb_to_bgra_u8_resolver,SME backend.
kleidicv API:: kleidicv_rgb_to_rgba_u8_resolver,SME backend.
kleidicv API:: kleidicv_rgba_to_bgr_u8_resolver,SME backend.
kleidicv API:: kleidicv_rgba_to_rgb_u8_resolver,SME backend.
kleidicv API:: kleidicv_rgb_to_yuv420_p_stripe_u8_resolver,SME backend.
kleidicv API:: kleidicv_rgba_to_yuv420_p_stripe_u8_resolver,SME backend.
kleidicv API:: kleidicv_bgr_to_yuv420_p_stripe_u8_resolver,SME backend.
kleidicv API:: kleidicv_bgra_to_yuv420_p_stripe_u8_resolver,SME backend.
kleidicv API:: kleidicv_rgb_to_yuv420_sp_stripe_u8_resolver,SME backend.
kleidicv API:: kleidicv_rgba_to_yuv420_sp_stripe_u8_resolver,SME backend.
kleidicv API:: kleidicv_bgr_to_yuv420_sp_stripe_u8_resolver,SME backend.
kleidicv API:: kleidicv_bgra_to_yuv420_sp_stripe_u8_resolver,SME backend.
kleidicv API:: kleidicv_rgb_to_yuv_u8_resolver,SME backend.
kleidicv API:: kleidicv_bgr_to_yuv_u8_resolver,SME backend.
kleidicv API:: kleidicv_rgba_to_yuv_u8_resolver,SME backend.
kleidicv API:: kleidicv_bgra_to_yuv_u8_resolver,SME backend.
kleidicv API:: kleidicv_split_resolver,NEON backend.
kleidicv API:: kleidicv_yuv_p_to_rgb_stripe_u8_resolver,SME backend.
kleidicv API:: kleidicv_yuv_p_to_bgr_stripe_u8_resolver,SME backend.
kleidicv API:: kleidicv_yuv_p_to_rgba_stripe_u8_resolver,SME backend.
kleidicv API:: kleidicv_yuv_p_to_bgra_stripe_u8_resolver,SME backend.
kleidicv API:: kleidicv_yuv_sp_to_rgb_u8_resolver,SME backend.
kleidicv API:: kleidicv_yuv_sp_to_bgr_u8_resolver,SME backend.
kleidicv API:: kleidicv_yuv_sp_to_rgba_u8_resolver,SME backend.
kleidicv API:: kleidicv_yuv_sp_to_bgra_u8_resolver,SME backend.
kleidicv API:: kleidicv_yuv_to_rgb_u8_resolver,SME backend.
kleidicv API:: kleidicv_yuv_to_bgr_u8_resolver,SME backend.
kleidicv API:: kleidicv_yuv_to_bgra_u8_resolver,SME backend.
kleidicv API:: kleidicv_yuv_to_rgba_u8_resolver,SME backend.
kleidicv API:: kleidicv_blur_and_downsample_stripe_u8_resolver,SME backend.
kleidicv API:: kleidicv_gaussian_blur_fixed_stripe_u8_resolver,SME backend.
kleidicv API:: kleidicv_gaussian_blur_arbitrary_stripe_u8_resolver,NEON backend.
kleidicv API:: kleidicv_median_blur_sorting_network_stripe_s8_resolver,SME backend.
kleidicv API:: kleidicv_median_blur_sorting_network_stripe_u8_resolver,SME backend.
kleidicv API:: kleidicv_median_blur_sorting_network_stripe_u16_resolver,SME backend.
kleidicv API:: kleidicv_median_blur_sorting_network_stripe_s16_resolver,SME backend.
kleidicv API:: kleidicv_median_blur_sorting_network_stripe_u32_resolver,SME backend.
kleidicv API:: kleidicv_median_blur_sorting_network_stripe_s32_resolver,SME backend.
kleidicv API:: kleidicv_median_blur_sorting_network_stripe_f32_resolver,SME backend.
kleidicv API:: kleidicv_median_blur_small_hist_stripe_u8_resolver,NEON backend.
kleidicv API:: kleidicv_median_blur_large_hist_stripe_u8_resolver,NEON backend.
kleidicv API:: kleidicv_scharr_interleaved_stripe_s16_u8_resolver,SME backend.
kleidicv API:: kleidicv_separable_filter_2d_stripe_u8_resolver,SME backend.
kleidicv API:: kleidicv_separable_filter_2d_stripe_u16_resolver,SME backend.
kleidicv API:: kleidicv_separable_filter_2d_stripe_s16_resolver,SME backend.
kleidicv API:: kleidicv_sobel_3x3_horizontal_stripe_s16_u8_resolver,SME backend.
kleidicv API:: kleidicv_sobel_3x3_vertical_stripe_s16_u8_resolver,SME backend.
kleidicv API:: kleidicv_bitwise_and_resolver,NEON backend.
kleidicv API:: kleidicv_dilate_u8_resolver,SME backend.
kleidicv API:: kleidicv_erode_u8_resolver,SME backend.
kleidicv API:: kleidicv_resize_to_quarter_u8_resolver,SME backend.
kleidicv API:: kleidicv_resize_linear_stripe_u8_resolver,SME backend.
kleidicv API:: kleidicv_resize_linear_stripe_f32_resolver,SME backend.
kleidicv API:: kleidicv_remap_s16_u8_resolver,NEON backend.
kleidicv API:: kleidicv_remap_s16_u16_resolver,NEON backend.
kleidicv API:: kleidicv_remap_s16point5_u8_resolver,NEON backend.
kleidicv API:: kleidicv_remap_s16point5_u16_resolver,NEON backend.
kleidicv API:: kleidicv_remap_f32_u8_resolver,NEON backend.
kleidicv API:: kleidicv_remap_f32_u16_resolver,NEON backend.
kleidicv API:: kleidicv_warp_perspective_stripe_u8_resolver,NEON backend.

        
    

The benchmark results that follow the backend list are omitted here for brevity. In your own run, you will see detailed performance metrics for each operation at 1280x720 resolution. Look for lines showing the operation name, sample count, mean and median times, and standard deviation. These results help you compare the performance of different backends, and the resolver lines confirm whether each API uses the SME or NEON backend.
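
To summarize a long backend list, you can tally the resolver lines per backend. The following is a minimal sketch that assumes the line format produced by the print statements added to dispatch.h ("kleidicv API:: <resolver>,<BACKEND> backend."). The function name count_backends is hypothetical, and the log text is passed in as a string, for example after capturing the benchmark output to a file.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Tally how many resolver lines report each backend. Lines that do not match
// the "kleidicv API:: <resolver>,<BACKEND> backend." shape are ignored.
std::map<std::string, int> count_backends(const std::string& log) {
  const std::string prefix = "kleidicv API:: ";
  const std::string suffix = " backend.";
  std::map<std::string, int> counts;
  std::istringstream lines(log);
  std::string line;
  while (std::getline(lines, line)) {
    const std::size_t start = line.find(prefix);
    if (start == std::string::npos) continue;
    const std::size_t comma = line.find(',', start + prefix.size());
    if (comma == std::string::npos) continue;
    const std::size_t end = line.find(suffix, comma + 1);
    if (end == std::string::npos) continue;
    // The backend name sits between the comma and " backend.".
    ++counts[line.substr(comma + 1, end - comma - 1)];
  }
  return counts;
}

// Usage example with three lines in the format shown above.
void count_example() {
  const std::string log =
      "kleidicv API:: kleidicv_min_max_u8_resolver,SME backend. \n"
      "kleidicv API:: kleidicv_rotate_resolver,NEON backend. \n"
      "kleidicv API:: kleidicv_exp_f32_resolver,SME backend. \n";
  std::map<std::string, int> counts = count_backends(log);
  assert(counts["SME"] == 2);
  assert(counts["NEON"] == 1);
}
```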

Use lldb to check the SME backend implementation

To debug at the source level, rebuild with the build type changed from Release to Debug, as shown in the following example:

    

        
        
cmake -S $WORKSPACE/kleidicv \
      -B build-kleidicv-benchmark-SME \
      -DKLEIDICV_ENABLE_SME2=ON \
      -DKLEIDICV_LIMIT_SME2_TO_SELECTED_ALGORITHMS=OFF \
      -DKLEIDICV_BENCHMARK=ON \
      -DCMAKE_BUILD_TYPE=Debug
cmake --build build-kleidicv-benchmark-SME --parallel

    

Use the lldb debugger to set a breakpoint in the API test and verify that the SME backend implementation is invoked. Start lldb with the test binary:

    

        
        
lldb ./build-kleidicv-benchmark-SME/test/api/kleidicv-api-test

    

The interactions with the (lldb) command line are shown below. Start by entering the following commands in the lldb debugger:

    

        
        
target create "./build-kleidicv-benchmark-SME/test/api/kleidicv-api-test"
b saturating_add_abs_with_threshold
run

    

When the program stops at your breakpoint, enter:

    

        
        
bt

    

This command displays the stack trace, showing how the function was called.

Next, to view the assembly instructions (including SME streaming mode), enter:

    

        
        
disassemble --frame

    

After you finish inspecting the output, exit lldb by typing:

    

        
        
quit

    

Note: The complete session is shown below. Your file paths may differ, but the sequence of commands remains the same. Enter each command as shown and review the output at each step.

    

        
        
target create "./build-kleidicv-benchmark-SME/test/api/kleidicv-api-test"
Current executable set to '$HOME/kleidi/opencv/build-kleidicv-benchmark-SME/test/api/kleidicv-api-test' (arm64).
(lldb) b saturating_add_abs_with_threshold
Breakpoint 1: 2 locations.
(lldb) run
Process 82381 launched: '/Users/Shared/workspace/build-kleidicv-benchmark-SME-debug/test/api/kleidicv-api-test' (arm64)
Vector length is set to 16 bytes.
Seed is set to 3168213869.
[==========] Running 3703 tests from 141 test suites.
[----------] Global test environment set-up.
[----------] 9 tests from SaturatingAddAbsWithThresholdTest/0, where TypeParam = short
[ RUN      ] SaturatingAddAbsWithThresholdTest/0.TestPositive
Process 82381 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.2
    frame #0: 0x0000000100695554 kleidicv-api-test`kleidicv_error_t kleidicv::sme::saturating_add_abs_with_threshold<short>(src_a=0x0000600002796762, src_a_stride=46, src_b=0x00006000027967f2, src_b_stride=46, dst=0x0000600002796912, dst_stride=46, width=23, height=3, threshold=50) at add_abs_with_threshold_sme.cpp:15:47
   12  	                                  const T *src_b, size_t src_b_stride, T *dst,
   13  	                                  size_t dst_stride, size_t width,
   14  	                                  size_t height, T threshold) {
-> 15  	  return saturating_add_abs_with_threshold_sc(src_a, src_a_stride, src_b,
   16  	                                              src_b_stride, dst, dst_stride,
   17  	                                              width, height, threshold);
   18  	}
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.2
  * frame #0: 0x0000000100695554 kleidicv-api-test`kleidicv_error_t kleidicv::sme::saturating_add_abs_with_threshold<short>(src_a=0x0000600002796762, src_a_stride=46, src_b=0x00006000027967f2, src_b_stride=46, dst=0x0000600002796912, dst_stride=46, width=23, height=3, threshold=50) at add_abs_with_threshold_sme.cpp:15:47
    frame #1: 0x0000000100009930 kleidicv-api-test`SaturatingAddAbsWithThresholdTestBase<short>::call_api(this=0x000000016fdfe670) at test_add_abs_with_threshold.cpp:17:12
    frame #2: 0x00000001000090c8 kleidicv-api-test`OperationTest<short, 2ul, 1ul>::test(this=0x000000016fdfe670) at operation.h:90:11
    frame #3: 0x0000000100008870 kleidicv-api-test`SaturatingAddAbsWithThresholdTest_TestPositive_Test<short>::TestBody(this=0x000060000179e270) at test_add_abs_with_threshold.cpp:135:58
    frame #4: 0x00000001008417cc kleidicv-api-test`void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(object=0x000060000179e270, method=0x00000000000000010000000000000020, location="the test body") at gtest.cc:2599:10
    frame #5: 0x0000000100810908 kleidicv-api-test`void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(object=0x000060000179e270, method=0x00000000000000010000000000000020, location="the test body") at gtest.cc:2635:14
    frame #6: 0x0000000100810858 kleidicv-api-test`testing::Test::Run(this=0x000060000179e270) at gtest.cc:2674:5
    frame #7: 0x000000010081163c kleidicv-api-test`testing::TestInfo::Run(this=0x000000011fe04290) at gtest.cc:2853:11
    frame #8: 0x00000001008126bc kleidicv-api-test`testing::TestSuite::Run(this=0x000000011fe049d0) at gtest.cc:3012:30
    frame #9: 0x000000010081fdec kleidicv-api-test`testing::internal::UnitTestImpl::RunAllTests(this=0x000000011fe04780) at gtest.cc:5870:44
    frame #10: 0x0000000100845750 kleidicv-api-test`bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(object=0x000000011fe04780, method=(kleidicv-api-test`testing::internal::UnitTestImpl::RunAllTests() at gtest.cc:5748), location="auxiliary test code (environments or event listeners)") at gtest.cc:2599:10
    frame #11: 0x000000010081f804 kleidicv-api-test`bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(object=0x000000011fe04780, method=(kleidicv-api-test`testing::internal::UnitTestImpl::RunAllTests() at gtest.cc:5748), location="auxiliary test code (environments or event listeners)") at gtest.cc:2635:14
    frame #12: 0x000000010081f6fc kleidicv-api-test`testing::UnitTest::Run(this=0x00000001009c92f0) at gtest.cc:5444:10
    frame #13: 0x00000001004e8600 kleidicv-api-test`RUN_ALL_TESTS() at gtest.h:2293:73
    frame #14: 0x00000001004e83a8 kleidicv-api-test`main(argc=1, argv=0x000000016fdff3b0) at test_main.cpp:82:10
    frame #15: 0x000000019f492b98 dyld`start + 6076
(lldb) disassemble --frame
kleidicv-api-test`kleidicv::sme::saturating_add_abs_with_threshold<short>:
    0x100695510 <+0>:   sub    sp, sp, #0xa0
    0x100695514 <+4>:   stp    d15, d14, [sp, #0x50]
    0x100695518 <+8>:   stp    d13, d12, [sp, #0x60]
    0x10069551c <+12>:  stp    d11, d10, [sp, #0x70]
    0x100695520 <+16>:  stp    d9, d8, [sp, #0x80]
    0x100695524 <+20>:  stp    x29, x30, [sp, #0x90]
    0x100695528 <+24>:  smstart sm
    0x10069552c <+28>:  ldrsh  w8, [sp, #0xa0]
    0x100695530 <+32>:  str    x0, [sp, #0x48]
    0x100695534 <+36>:  str    x1, [sp, #0x40]
    0x100695538 <+40>:  str    x2, [sp, #0x38]
    0x10069553c <+44>:  str    x3, [sp, #0x30]
    0x100695540 <+48>:  str    x4, [sp, #0x28]
    0x100695544 <+52>:  str    x5, [sp, #0x20]
    0x100695548 <+56>:  str    x6, [sp, #0x18]
    0x10069554c <+60>:  str    x7, [sp, #0x10]
    0x100695550 <+64>:  strh   w8, [sp, #0xe]
->  0x100695554 <+68>:  ldr    x0, [sp, #0x48]
    0x100695558 <+72>:  ldr    x1, [sp, #0x40]
    0x10069555c <+76>:  ldr    x2, [sp, #0x38]
    0x100695560 <+80>:  ldr    x3, [sp, #0x30]
    0x100695564 <+84>:  ldr    x4, [sp, #0x28]
    0x100695568 <+88>:  ldr    x5, [sp, #0x20]
    0x10069556c <+92>:  ldr    x6, [sp, #0x18]
    0x100695570 <+96>:  ldr    x7, [sp, #0x10]
    0x100695574 <+100>: ldrh   w8, [sp, #0xe]
    0x100695578 <+104>: mov    x9, sp
    0x10069557c <+108>: strh   w8, [x9]
    0x100695580 <+112>: bl     0x10087b8d0    ; symbol stub for: kleidicv_error_t kleidicv::sme::saturating_add_abs_with_threshold_sc<short>(short const*, unsigned long, short const*, unsigned long, short*, unsigned long, unsigned long, unsigned long, short)
    0x100695584 <+116>: smstop sm
    0x100695588 <+120>: ldp    x29, x30, [sp, #0x90]
    0x10069558c <+124>: ldp    d9, d8, [sp, #0x80]
    0x100695590 <+128>: ldp    d11, d10, [sp, #0x70]
    0x100695594 <+132>: ldp    d13, d12, [sp, #0x60]
    0x100695598 <+136>: ldp    d15, d14, [sp, #0x50]
    0x10069559c <+140>: add    sp, sp, #0xa0
    0x1006955a0 <+144>: ret
(lldb) quit
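
In the disassembly above, the smstart sm and smstop sm instructions bracket the bl call to the streaming-compatible function, which shows that the call runs in SME streaming mode. As a rough way to check this pattern in saved lldb output, the following sketch scans the mnemonics in order. The function names are hypothetical, the parsing assumes the lldb line format shown above, and the check looks only at mnemonics, so it does not distinguish smstart sm from other smstart operands.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Extract the mnemonic from each lldb disassembly line of the form
// "    0x100695528 <+24>:  smstart sm". Lines without ">:" followed by
// an instruction are skipped.
std::vector<std::string> mnemonics(const std::string& disassembly) {
  std::vector<std::string> result;
  std::istringstream lines(disassembly);
  std::string line;
  while (std::getline(lines, line)) {
    const std::size_t marker = line.find(">:");
    if (marker == std::string::npos) continue;
    std::istringstream rest(line.substr(marker + 2));
    std::string mnemonic;
    if (rest >> mnemonic) result.push_back(mnemonic);
  }
  return result;
}

// True when a branch-with-link (bl) appears after an smstart and before the
// matching smstop. Heuristic only: operands are ignored.
bool calls_inside_streaming_region(const std::string& disassembly) {
  bool streaming = false;
  for (const std::string& m : mnemonics(disassembly)) {
    if (m == "smstart") {
      streaming = true;
    } else if (m == "smstop") {
      streaming = false;
    } else if (m == "bl" && streaming) {
      return true;
    }
  }
  return false;
}

// Usage example with a few lines taken from the session above.
void streaming_example() {
  const std::string text =
      "    0x100695528 <+24>:  smstart sm\n"
      "    0x100695580 <+112>: bl     0x10087b8d0\n"
      "    0x100695584 <+116>: smstop sm\n";
  assert(calls_inside_streaming_region(text));
}
```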

    

Summary

In this Learning Path, you tested the KleidiCV build and verified its functionality. You ran both the KleidiCV API tests and the OpenCV performance tests. You also explored how KleidiCV’s multiversion support works, enabling it to select the best available backend (SME2, SME, SVE2, or NEON) at runtime. Finally, you learned how to enable debug output and use the lldb debugger to confirm that the SME backend is being used and to inspect the assembly code.
