Have a look at the following C example that uses Intel SSE4.2 intrinsics.
On an x86_64 Linux development machine, create a file named calculation_sse.c
, populating it with the contents as shown below:
#include <xmmintrin.h>
#include <stdio.h>
int main() {
__m128 a = _mm_set_ps(16.0f, 9.0f, 4.0f, 1.0f);
__m128 b = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
__m128 cmp_result = _mm_cmpgt_ps(a, b);
float a_arr[4], b_arr[4], cmp_arr[4];
_mm_storeu_ps(a_arr, a);
_mm_storeu_ps(b_arr, b);
_mm_storeu_ps(cmp_arr, cmp_result);
for (int i = 0; i < 4; i++) {
if (cmp_arr[i] != 0.0f) {
printf("Element %d: %.2f is larger than %.2f\n", i, a_arr[i], b_arr[i]);
} else {
printf("Element %d: %.2f is not larger than %.2f\n", i, a_arr[i], b_arr[i]);
}
}
__m128 add_result = _mm_add_ps(a, b);
__m128 mul_result = _mm_mul_ps(add_result, b);
__m128 sqrt_result = _mm_sqrt_ps(mul_result);
float res[4];
_mm_storeu_ps(res, add_result);
printf("Addition Result: %f %f %f %f\n", res[0], res[1], res[2], res[3]);
_mm_storeu_ps(res, mul_result);
printf("Multiplication Result: %f %f %f %f\n", res[0], res[1], res[2], res[3]);
_mm_storeu_ps(res, sqrt_result);
printf("Square Root Result: %f %f %f %f\n", res[0], res[1], res[2], res[3]);
return 0;
}
The program does the following:
Compile the code on your Linux x86_64 system that supports SSE4.2:
gcc -O3 calculation_sse.c -o calculation_sse -msse4.2
Now run the program:
./calculation_sse
The output should look like the following:
Element 0: 1.00 is not larger than 1.00
Element 1: 4.00 is larger than 2.00
Element 2: 9.00 is larger than 3.00
Element 3: 16.00 is larger than 4.00
Addition Result: 2.00 6.00 12.00 20.00
Multiplication Result: 2.00 12.00 36.00 80.00
Square Root Result: 1.41 3.46 6.00 8.94
It is imperative that you run the code first on an Intel x86_64 reference platform, to make sure that you understand how it works and what kind of results you can expect.