Set up and build the example application

Optimize memory access behavior using Arm Performix and the Arm MCP Server

Set up the build environment

In this section, you’ll install the required system packages, clone the orbiting galaxies example repository, and build the workload binaries. You can also run a visualization to confirm the simulation is working before you profile it.

Use your remote Arm server for all build and run steps. This example uses an Amazon EC2 c7g.metal instance running Ubuntu 24.04 LTS.

Install Arm Performix

Install and configure Arm Performix using the Performix install guide on both your local machine and the remote Arm server.

Install the required system packages

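Run the following commands to install Git, CMake, a C/C++ build toolchain, and Python. If your Linux distribution doesn’t use apt, replace apt (and, where they differ, the package names) with your distribution’s equivalents.
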
sudo apt update
sudo apt install -y git cmake build-essential python3 python3-venv python3-pip

Enable the Arm SPE PMU driver if not already loaded

To check whether the driver is already loaded, run:

lsmod | grep arm_spe_pmu

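If the command prints output, the driver is already loaded and you can skip the two commands that follow. The KPTI step after them still applies to c7g.metal instances. If the command prints nothing, run the following commands to install the extra kernel modules package and load the driver. Loading the driver manually is required on Ubuntu 24.04 LTS on AWS, but might not be needed on other platforms.
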
sudo apt install -y linux-modules-extra-$(uname -r)
sudo modprobe arm_spe_pmu
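
If you want to repeat this check on several machines, you can combine the check and load steps above into one idempotent helper. The following is a minimal Bash sketch, not part of the Performix tooling. The function names module_listed and ensure_spe_driver are hypothetical. The sketch assumes an Ubuntu system where the module ships in linux-modules-extra-$(uname -r), as described above.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): load the Arm SPE PMU driver only when needed.
# Assumes Ubuntu, where arm_spe_pmu ships in linux-modules-extra-$(uname -r).

# Succeed if MODULE is the first column of a line in the given lsmod output.
module_listed() {
  local module="$1" listing="$2"
  awk -v m="$module" '$1 == m { found = 1 } END { exit !found }' <<< "$listing"
}

# Install and load arm_spe_pmu if lsmod doesn't list it yet.
ensure_spe_driver() {
  if module_listed arm_spe_pmu "$(lsmod)"; then
    echo "arm_spe_pmu is already loaded"
    return 0
  fi
  sudo apt install -y "linux-modules-extra-$(uname -r)"
  sudo modprobe arm_spe_pmu
  # Report success only if the module now shows up in lsmod.
  module_listed arm_spe_pmu "$(lsmod)"
}
```

To use it, save the functions to a file (spe_helpers.sh is an illustrative name), then run source spe_helpers.sh followed by ensure_spe_driver. The helper matches only on the first column of the lsmod output, so a line that mentions arm_spe_pmu in its "Used by" column, or a module with a similar name, isn't mistaken for the driver itself.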

If you’re using a c7g.metal instance, you also need to disable Kernel Page Table Isolation (KPTI), because the Linux SPE driver won’t initialize while KPTI is enabled.

On AWS, the quickest way to do this is to open /etc/default/grub.d/50-cloudimg-settings.cfg in an editor and add kpti=off to the GRUB_CMDLINE_LINUX_DEFAULT line.
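
If you’d rather not edit the file by hand, you can script the change. This is a minimal sketch that assumes the GRUB_CMDLINE_LINUX_DEFAULT value sits on one line and ends with a closing double quote, so check your file first. add_kpti_off is a hypothetical helper name. It reads the configuration on standard input and writes the edited version to standard output, which lets you inspect the result before overwriting anything.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): append kpti=off to GRUB_CMDLINE_LINUX_DEFAULT.
# Assumes the value is on one line and ends with a closing double quote.
# Reads a GRUB config on stdin and writes the edited config to stdout.
add_kpti_off() {
  # Only edit the GRUB_CMDLINE_LINUX_DEFAULT line, and leave it unchanged if it
  # already contains kpti=off so repeated runs don't duplicate the option.
  sed '/^GRUB_CMDLINE_LINUX_DEFAULT=/{/kpti=off/!s/"$/ kpti=off"/;}'
}
```

To apply it while keeping a backup, source the function and back up the file with sudo cp /etc/default/grub.d/50-cloudimg-settings.cfg ~/50-cloudimg-settings.cfg.bak. Then write the edited version with add_kpti_off < ~/50-cloudimg-settings.cfg.bak | sudo tee /etc/default/grub.d/50-cloudimg-settings.cfg > /dev/null.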

After editing the file, run:

sudo update-grub
sudo reboot
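
After the instance restarts, you can confirm that both changes took effect. lsmod only shows that the module is loaded, so this minimal Bash sketch also checks for an arm_spe_* entry under /sys/bus/event_source/devices. The driver registers its PMU there when it initializes successfully. The function names are hypothetical, and the optional directory argument exists only so that the check can be tested.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): check the SPE setup after rebooting.

# Succeed if PARAM appears as a whole, space-separated word in CMDLINE.
has_kernel_param() {
  local param="$1" cmdline="$2"
  [[ " $cmdline " == *" $param "* ]]
}

# Succeed if DIR contains an arm_spe_* entry (the arm_spe_pmu driver names its
# PMUs arm_spe_0, arm_spe_1, ...). DIR defaults to the perf event source directory.
spe_pmu_present() {
  local dir="${1:-/sys/bus/event_source/devices}" entry
  for entry in "$dir"/arm_spe_*; do
    # An unmatched glob stays literal, so check that the entry really exists.
    [[ -e "$entry" ]] && return 0
  done
  return 1
}
```

On the instance, has_kernel_param kpti=off "$(cat /proc/cmdline)" && spe_pmu_present && echo "SPE is ready" should print SPE is ready. A module loaded with modprobe isn’t guaranteed to reload after a reboot. If spe_pmu_present fails, run sudo modprobe arm_spe_pmu again and recheck.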

For a complete explanation of SPE, see Enable Arm SPE for Performix memory access analysis.

Build the sample application

After setting up the build environment, clone and build the sample application.

Clone the example repository

Clone the orbiting galaxies repository and create a local branch named my-work from the v1.0.3 release tag, so that you start from a known state:

git clone https://github.com/arm-education/Orbiting-Galaxy-Example.git
cd Orbiting-Galaxy-Example
git checkout -b my-work v1.0.3

Build with CMake

Build the project using CMake:

mkdir -p build
cd build
cmake ..
cmake --build . --parallel

This produces three binaries in build/:

- baseline: the unoptimized reference binary used for profiling
- users_solution: an editable copy of baseline for you to optimize manually
- optimized: an already optimized reference solution that shows the expected outcome
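
Before moving on, you can confirm that the build produced all three binaries. This is a minimal Bash sketch. check_binaries is a hypothetical helper, and the only assumption it makes is the three binary names listed above.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): confirm the build produced the expected binaries.
# The names match the list above; pass the build directory as the argument.
check_binaries() {
  local dir="${1:-.}" missing=0 bin
  for bin in baseline users_solution optimized; do
    if [[ -f "$dir/$bin" && -x "$dir/$bin" ]]; then
      echo "ok: $dir/$bin"
    else
      echo "missing: $dir/$bin" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run check_binaries . from the build directory. It prints one line per binary and returns a nonzero status if any binary is missing or not executable. Running file baseline is another quick way to confirm that you built an AArch64 executable.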

Set up a Python virtual environment and run visualization

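After building the application, return to the repository root. Then create a Python virtual environment and install the dependencies listed in scripts/requirements.txt:
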
cd ..
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install -r scripts/requirements.txt

Generate simulation frames and create the GIF:

cd build
./baseline --visualize
python3 ../scripts/visualize.py galaxy_baseline.bin

The script reads the simulation data from galaxy_baseline.bin and writes the animation to assets/galaxy_baseline.gif.
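
A truncated or empty GIF is an easy failure to miss if the visualization step is interrupted. This minimal Bash sketch uses a hypothetical helper named is_gif. It checks that a file is non-empty and starts with one of the two signatures defined by the GIF format, GIF87a or GIF89a.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): check that FILE is non-empty and begins with
# a GIF signature (GIF87a or GIF89a).
is_gif() {
  local file="$1" magic
  [[ -s "$file" ]] || return 1
  magic=$(head -c 6 "$file")
  [[ "$magic" == GIF87a || "$magic" == GIF89a ]]
}
```

For example, run is_gif assets/galaxy_baseline.gif && echo "GIF looks valid". If the script writes relative to a different directory than the one you run it from, adjust the path.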

Figure: Orbiting galaxies workload visualization. Animated orbiting galaxy simulation generated by the baseline workload, showing particle motion over time so you can verify that the simulation output looks correct before profiling.

Use --visualize only to understand the workload’s behavior. Don’t use visualization mode in profiling runs, because the extra file I/O changes the runtime characteristics you’re trying to measure.
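
If you want to see the effect of the I/O for yourself, a rough wall-clock comparison is enough. This minimal sketch assumes GNU date, which supports the %N nanoseconds format. It uses a hypothetical helper, time_ms, that discards the command’s standard output and prints the elapsed milliseconds. It isn’t a substitute for profiling with Performix.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print a command's wall-clock runtime in ms.
# Assumes GNU date, which supports the %N (nanoseconds) format.
time_ms() {
  local start end
  start=$(date +%s%N)
  "$@" > /dev/null  # discard the command's stdout so only the timing is printed
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}
```

From build/, compare time_ms ./baseline with time_ms ./baseline --visualize. A single wall-clock run is noisy, so repeat each command a few times before drawing conclusions.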

What you’ve accomplished and what’s next

You’ve installed the required packages and enabled Arm SPE on an Arm-based instance, cloned the orbiting galaxies example from GitHub, and built its binaries. You’ve also run a visualization to confirm that the simulation works as expected.

Next, you’ll profile memory access behavior using Arm Performix.
