Introduction
Simplify Arm migration with the Docker MCP Toolkit and Arm MCP Server
Set up Docker MCP Toolkit with Arm, GitHub, and Sequential Thinking servers
Examine x86 AVX2 intrinsics in the demo application
Automate x86 to Arm migration with GitHub Copilot
Validate the Arm64 migration and test containers
Next Steps
The demo application is a matrix multiplication benchmark written in C++ using AVX2 intrinsics for vectorized performance on x86 processors. Clone the repository:
git clone https://github.com/JoeStech/docker-blog-arm-migration
cd docker-blog-arm-migration
This example is intentionally optimized for x86 so that you can see how architecture-specific code appears in practice and how it can be adapted for Arm.
Open the Dockerfile. There are two areas that require updates for Arm compatibility.
Add Arm64 support in the base image: The centos:6 image was published for x86 architecture and does not provide a linux/arm64 variant. To run on Arm hardware, the base image must support Arm64.
Modern multi-architecture base images typically publish both linux/amd64 and linux/arm64 manifests. Updating the base image is the first step toward portability.
Update compiler flags: The -mavx2 flag enables AVX2 vector instructions on x86. Arm processors use different SIMD instruction sets (NEON or SVE), so this flag must be removed or replaced when compiling for Arm.
Here is the full Dockerfile for reference:
FROM centos:6
RUN yum install -y \
devtoolset-2-gcc \
devtoolset-2-gcc-c++ \
devtoolset-2-binutils \
make \
&& yum clean all
WORKDIR /app
COPY *.h *.cpp ./
RUN scl enable devtoolset-2 "g++ -O2 -mavx2 -o benchmark \
main.cpp \
matrix_operations.cpp \
-std=c++11"
CMD ["./benchmark"]
Open matrix_operations.cpp. At the top of the file:
#include <immintrin.h> // x86-only header
The <immintrin.h> header provides Intel SIMD intrinsics, including AVX and AVX2. On Arm systems, SIMD intrinsics are provided through <arm_neon.h> instead.
Inside the matrix multiplication routine, you will see AVX2 intrinsics such as:
// Inside the multiply function:
__m256d sum_vec = _mm256_setzero_pd();
__m256d a_vec = _mm256_loadu_pd(&data[i][k]);
sum_vec = _mm256_add_pd(sum_vec, _mm256_mul_pd(a_vec, b_vec));
// Horizontal reduction
__m128d sum_high = _mm256_extractf128_pd(sum_vec, 1);
__m128d sum_low = _mm256_castpd256_pd128(sum_vec);
These mm256* functions map directly to 256-bit AVX2 instructions.
To run this code on Arm, several adjustments are required:
SIMD header replacement: x86 uses #include <immintrin.h>. Arm uses <arm_neon.h> instead.
Intrinsic mapping: Each AVX2 intrinsic must be mapped to an Arm equivalent. For example:
_mm256_setzero_pd() creates a 256-bit zero vector of four doubles. Arm NEON uses 128-bit registers._mm256_loadu_pd() loads 4 doubles at once (NEON loads 2 with vld1q_f64)._mm256_add_pd() and _mm256_mul_pd() are 256-bit operations (NEON uses 128-bit equivalents)._mm256_extractf128_pd() extracts the high 128 bits (not needed on NEON).Vector width differences: AVX2 operates on 256-bit registers (four double-precision values). NEON operates on 128-bit registers (two double-precision values). This affects:
Horizontal reduction logic: The AVX2 pattern:
_mm256_extractf128_pd(...)
_mm256_castpd256_pd128(...)
is specific to x86 register structure. On Arm, reduction is implemented using NEON reduction or pairwise-add instructions instead.
On newer Arm platforms supporting SVE or SVE2 (for example Neoverse V1/V2 based platforms), wider vector lengths may be available. SVE uses a vector-length-agnostic (VLA) model, which differs from fixed-width AVX2 and NEON programming. The Arm MCP Server knowledge base can help determine the appropriate approach for your target platform.
You have:
Next, you’ll use GitHub Copilot with the Docker MCP Toolkit to automate the migration process.