Reading time: 10 min
Last updated: 6 Sep 2024
This guide is intended to get you up and running with this tool quickly, using the most common settings. For a thorough review of all options, refer to the official documentation.
The Streamline CLI tools are native command-line tools that are designed to run directly on an Arm server running Linux. The tools provide a software profiling methodology that gives you clear and actionable performance data. You can use this data to guide the optimization of the heavily used functions in your software.
Streamline CLI tools are supported on the following host operating systems, running on an Arm AArch64 host machine:
Streamline CLI tools are supported on the following Arm CPUs:
Use the Arm Sysreport utility to determine whether your system configuration supports hardware-assisted profiling. Follow the instructions in Get ready for performance analysis with Sysreport to discover how to download and run this utility.
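As a quick complement to Sysreport, the sketch below checks that the host is AArch64 and lists the CPU part numbers that the arm64 Linux kernel reports in the "CPU part" field of /proc/cpuinfo, so that you can compare them against the supported CPU list. The function names is_aarch64_host and cpu_parts are hypothetical and are not part of the Streamline CLI tools.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a quick host check (not part of the Streamline CLI tools).

# Succeeds only on a 64-bit Arm (AArch64) Linux host.
is_aarch64_host() {
  [ "$(uname -m)" = "aarch64" ]
}

# Prints each distinct "CPU part" value from /proc/cpuinfo (or from a file
# given as the first argument), one per line.
cpu_parts() {
  awk -F': *' '/^CPU part/ { print $2 }' "${1:-/proc/cpuinfo}" | sort -u
}

if is_aarch64_host; then
  echo "AArch64 host; CPU part number(s):"
  cpu_parts
else
  echo "not an AArch64 host: $(uname -m)" >&2
fi
```

For reference, the Linux kernel identifies Neoverse N1 as part 0xd0c, Neoverse V1 as 0xd40, and Neoverse N2 as 0xd49.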
The perf counters entry in the generated report indicates how many CPU counters are available. The perf sampling entry indicates whether the Statistical Profiling Extension (SPE) is available. You will achieve the best profiles on systems with at least 6 available CPU counters and SPE.
The Streamline CLI tools can be used on systems without any CPU counters, but can then only return a basic hot-spot profile based on time-based sampling. No top-down methodology metrics will be available.
The Streamline CLI tools can give top-down metrics on systems with as few as 3 available CPU counters. However, the effective sample rate for each metric is lower, because the counters must be time-sliced to capture all of the requested metrics. This means that you need to run your application for longer to get the same number of samples for each metric. Metrics that require more input counters than are available cannot be captured.
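To see why fewer counters mean longer runs, the sketch below uses a deliberately simplified model: it assumes that the events needed for your metrics are split into groups of at most the available counter count, and that the groups are scheduled round-robin for equal time. The event counts are hypothetical, and the real scheduling is handled by the tools and Linux perf, not by this arithmetic.

```shell
#!/usr/bin/env bash
# Simplified multiplexing model (assumption, not the tools' actual scheduler).

# Number of counter groups needed to cover EVENTS with COUNTERS counters
# (integer ceiling division).
timeslice_groups() {
  local events="$1" counters="$2"
  echo $(( (events + counters - 1) / counters ))
}

# Run time needed to collect as many samples per event as a run of
# BASE_SECONDS would collect if every event had its own counter.
required_runtime() {
  local base_seconds="$1" events="$2" counters="$3"
  echo $(( base_seconds * $(timeslice_groups "$events" "$counters") ))
}

# A single metric whose inputs need more counters than exist cannot be captured.
metric_fits() {
  local metric_inputs="$1" counters="$2"
  [ "$metric_inputs" -le "$counters" ]
}

# Hypothetical example: 12 events on a core with 3 versus 6 counters.
echo "3 counters: $(timeslice_groups 12 3) groups"   # 4 groups
echo "6 counters: $(timeslice_groups 12 6) groups"   # 2 groups
```

In this model, halving the number of counters roughly doubles the run time needed for the same per-metric sample count.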
The Streamline CLI tools can be used without SPE. Load operation data source metrics will not be available, and branch mispredict metrics might be less accurate.
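The counter and SPE rules above can be summarized as a small lookup, for example to annotate Sysreport results collected from several servers. The profile_capability function below is a hypothetical helper, not part of the tools. It takes the perf counters value and whether perf sampling reports SPE, and it treats 1 or 2 counters the same as 0, which is an assumption because this guide only describes the 0, 3, and 6 counter cases.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize the profile you can expect from the
# Sysreport "perf counters" value and SPE availability ("yes" or "no").
profile_capability() {
  local counters="$1" spe="$2"
  local level
  if [ "$counters" -ge 6 ]; then
    level="top-down (recommended)"
  elif [ "$counters" -ge 3 ]; then
    level="top-down (time-sliced, longer runs needed)"
  else
    # Assumption: 1-2 counters are treated like 0 (no top-down metrics).
    level="hot-spot only (time-based sampling)"
  fi
  if [ "$spe" = "yes" ]; then
    level="$level + SPE metrics"
  else
    level="$level (no SPE: no load data source metrics)"
  fi
  echo "$level"
}
```

For example, profile_capability 6 yes prints top-down (recommended) + SPE metrics.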
Before you can capture a software profile, you must build your application with debug information. This enables the profiler to map instruction addresses back to specific functions in your source code. For C and C++, you do this by passing the -g option to the compiler.
Arm recommends that you profile an optimized release build of your application, as this ensures you are profiling a realistic code workload. For C and C++, you do this by passing the -O2 or -O3 option to the compiler. However, Arm recommends that you disable invasive optimization techniques, such as link-time optimization (LTO), because they heavily restructure the code and make the profile difficult to understand.
If you are using the workflow_topdown_basic option, ensure that your application workload runs for at least 20 seconds, so that the core has time to capture all of the metrics needed. This minimum duration increases linearly as you add more metrics to capture.
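If you are unsure how long your workload runs, a small wrapper such as the hypothetical check_workload_duration below can time it and warn when it is shorter than the 20 second minimum. It is a sketch, not part of the tools, and ./myapp in the usage example is a placeholder for your own workload.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, then report whether it ran for at least
# MIN_SECONDS (default 20, the minimum suggested for workflow_topdown_basic).
check_workload_duration() {
  local min="${MIN_SECONDS:-20}"
  local start end elapsed
  start=$(date +%s)
  "$@" || { echo "error: workload exited with status $?" >&2; return 2; }
  end=$(date +%s)
  elapsed=$(( end - start ))
  if [ "$elapsed" -lt "$min" ]; then
    echo "warning: workload ran for ${elapsed}s; aim for at least ${min}s" >&2
    return 1
  fi
  echo "ok: workload ran for ${elapsed}s"
}
```

For example, run check_workload_duration ./myapp. Because the required duration grows with the number of metrics, set MIN_SECONDS to a higher value when you capture more of them.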
Download and extract the Streamline CLI tools on your Arm server:
wget https://artifacts.tools.arm.com/arm-performance-studio/2024.3/Arm_Streamline_CLI_Tools_9.2.2_linux_arm64.tgz
tar -xzf Arm_Streamline_CLI_Tools_9.2.2_linux_arm64.tgz
The sl-format.py Python script requires Python 3.8 or later, and depends on several third-party modules. We recommend creating a Python virtual environment containing these modules to run the tools. For example:
# From Bash
python3 -m venv sl-venv
source ./sl-venv/bin/activate
# From inside the virtual environment
python3 -m pip install -r ./streamline_cli_tools/bin/requirements.txt
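Because sl-format.py needs Python 3.8 or later, it can help to confirm the interpreter version inside the virtual environment before installing the requirements. The helper name python_ok is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if python3 is version 3.8 or later.
python_ok() {
  python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 8) else 1)'
}

if python_ok; then
  echo "OK: $(python3 --version)"
else
  echo "error: sl-format.py requires Python 3.8 or later" >&2
fi
```

After installing the requirements, python3 -m pip check reports any installed packages with missing or incompatible dependencies.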
The instructions in this guide assume that you have added the <install>/bin/ directory to your PATH environment variable, and that you run all Python commands from inside the virtual environment.
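One way to update PATH for the current shell session is sketched below. It assumes you extracted the archive in the current directory, so the tools are under ./streamline_cli_tools, the same location used for requirements.txt above; the variable name SL_BIN is illustrative.

```shell
#!/usr/bin/env bash
# Assumption: the archive was extracted into the current directory.
SL_BIN="$PWD/streamline_cli_tools/bin"

# Prepend the tools directory to PATH unless it is already there.
case ":$PATH:" in
  *":$SL_BIN:"*) ;;
  *) export PATH="$SL_BIN:$PATH" ;;
esac
```

To make the change permanent, add an equivalent export line, using the absolute path, to your shell startup file, such as ~/.bashrc.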
For best results, we provide a Linux kernel patch that modifies the behavior of Linux perf to improve support for capturing function-attributed top-down metrics on Arm systems. This patch provides two new capabilities:
Without the patch, you can still capture profiles. However, not all capture options are available, and capturing top-down metrics relies on high-frequency sampling. The following options are available:
With the patch applied, it is possible to collect the following profiles:
The following instructions show you how to install the patch on Amazon Linux 2023. You might need to adapt them slightly to other Linux distributions.
To apply the patch to the latest 6.7 kernel, you can use git:
git apply v6.7-combined.patch
or patch:
patch -p 1 -i v6.7-combined.patch
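Before modifying a kernel tree, you can check that the patch applies cleanly. Both git apply --check and patch --dry-run test a patch without changing any files. The check_patch_applies wrapper below is a hypothetical convenience that picks whichever one suits the current directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: test whether a patch applies to the current directory
# without modifying any files.
check_patch_applies() {
  local patch_file="$1"
  if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    # Inside a git checkout: validate with git apply.
    git apply --check "$patch_file"
  else
    # Plain source tree: let GNU patch simulate the change.
    patch -p1 --dry-run -i "$patch_file" >/dev/null
  fi
}
```

For example, from the top of the 6.7 kernel source tree, run check_patch_applies v6.7-combined.patch and apply the patch only if it succeeds.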
Follow these steps to integrate these patches into an RPM-based distribution’s kernel:
Install the RPM build tools:
sudo yum install rpm-build rpmdevtools
Remove any existing rpmbuild directory, or rename it first if you want to keep its contents:
rm -fr rpmbuild
Fetch the kernel sources:
yum download --source kernel
Install the source RPM:
rpm -i kernel-<VERSION>.src.rpm
Enter the rpmbuild directory that is created:
cd rpmbuild
Copy the patch into the correct location. Replace vX.Y with the version of the patch you are using, and replace the 9999 patch number with the next available patch number in the sequence:
cp vX.Y-combined.patch SOURCES/9999-strobing-patch.patch
Open the spec file in your preferred editor:
nano SPECS/kernel.spec
Search for the list of patches starting with Patch0001, and append the line for the new patch to the end of the list. Replace 9999 with the patch number used earlier:
Patch9999: 9999-strobing-patch.patch
Search for the list of patch apply steps starting with ApplyPatch, and append the line for the new patch to the end of the list. Replace 9999 with the patch number used earlier:
ApplyPatch 9999-strobing-patch.patch
Save the changes and exit the editor.
Install the build dependencies:
sudo dnf builddep SPECS/kernel.spec
Build the kernel and the other RPMs:
rpmbuild -ba SPECS/kernel.spec
Install the built packages:
sudo rpm -ivh --force RPMS/aarch64/*.rpm
Reboot the system:
sudo reboot
Validate that the patch has been applied correctly:
ls -l /sys/bus/event_source/devices/*/format/strobe_period
This should list at least one CPU PMU device supporting the strobing features, for example:
/sys/bus/event_source/devices/armv8_pmuv3_0/format/strobe_period
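If you rebuild the kernel regularly, the two kernel.spec edits described above can be scripted instead of made in an editor. The add_patch_to_spec function below is a hypothetical sketch: it assumes the spec lists patches as PatchNNNN: lines and applies them with ApplyPatch lines, as described in this guide, and it appends the new entries after the last existing line of each kind. Review the resulting spec file before building, because the last matching line might sit inside a conditional block.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add "PatchNUM: FILE" after the last PatchNNNN: line and
# "ApplyPatch FILE" after the last ApplyPatch line of an RPM spec file.
add_patch_to_spec() {
  local spec="$1" num="$2" file="$3"
  if awk -v num="$num" -v file="$file" '
      { lines[NR] = $0 }
      /^Patch[0-9]+:/ { last_patch = NR }
      /^[[:space:]]*ApplyPatch[[:space:]]/ { last_apply = NR }
      END {
        # Refuse to write anything if either list was not found.
        if (!last_patch || !last_apply) exit 1
        for (i = 1; i <= NR; i++) {
          print lines[i]
          if (i == last_patch) print "Patch" num ": " file
          if (i == last_apply) print "ApplyPatch " file
        }
      }' "$spec" > "$spec.new"; then
    mv "$spec.new" "$spec"
  else
    rm -f "$spec.new"
    echo "error: PatchNNNN: or ApplyPatch lines not found in $spec" >&2
    return 1
  fi
}
```

For example, from the rpmbuild directory: add_patch_to_spec SPECS/kernel.spec 9999 9999-strobing-patch.patch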
You are now ready to use Streamline CLI Tools. Refer to Profiling for Neoverse with Streamline CLI Tools to get started.