About this Install Guide

This guide is intended to get you up and running with this tool quickly with the most common settings. For a thorough review of all options, refer to the official documentation.

The Streamline CLI tools are native command-line tools that are designed to run directly on an Arm server running Linux. The tools provide a software profiling methodology that gives you clear and actionable performance data. You can use this data to guide the optimization of the heavily used functions in your software.

Before you begin

Use the Arm Sysreport utility to determine whether your system configuration supports hardware-assisted profiling. Follow the instructions in "Get ready for performance analysis with Sysreport" to download and run this utility.

The perf counters entry in the generated report indicates how many CPU counters are available. The perf sampling entry indicates whether the Statistical Profiling Extension (SPE) is available. You get the best profiles on systems with at least 6 available CPU counters and SPE.

The Streamline CLI tools can be used in systems without any CPU counters, but can only return a basic hot-spot profile based on time-based sampling. No top-down methodology metrics will be available.

The Streamline CLI tools can give top-down metrics in systems with as few as 3 available CPU counters. The effective sample rate for each metric will be lower, because you need to time-slice the counters to capture all of the requested metrics. This means that you need to run your application for longer to get the same number of samples for each metric. Metrics that require more input counters than are available cannot be captured.

The Streamline CLI tools can be used without SPE. Load operation data source metrics will not be available, and branch mispredict metrics might be less accurate.
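
As a quick cross-check alongside Sysreport, you can look at the perf PMU devices that the kernel exposes in sysfs. The sketch below is not part of Sysreport or the Streamline CLI tools. The helper name check_pmu_devices is hypothetical, and it assumes that the CPU PMU device name starts with armv8_ and that SPE appears as arm_spe_*, which is typical but can vary by platform and kernel. It does not report how many counters are available, so it does not replace the Sysreport output.

```shell
# Hypothetical helper: reports whether a CPU PMU and an SPE PMU are visible to perf.
# The optional argument overrides the sysfs directory to inspect.
check_pmu_devices() {
    dev_dir="${1:-/sys/bus/event_source/devices}"
    found_cpu_pmu=no
    found_spe=no
    for dev in "$dev_dir"/*; do
        case "$(basename "$dev")" in
            arm_spe_*) found_spe=yes ;;      # assumed SPE device naming
            armv8_*)   found_cpu_pmu=yes ;;  # assumed CPU PMU device naming
        esac
    done
    echo "cpu_pmu=$found_cpu_pmu spe=$found_spe"
}

check_pmu_devices
```

If the output shows spe=no, expect the SPE-related limitations described above.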

Building your application

Your application should be a release build that includes symbol information. Build your application with the -g option to include symbol information. Arm recommends that you disable link-time optimization (LTO) to make the profile easier to understand.
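
A minimal sketch of such a build with GCC or Clang might look like the following. The file names my_app.c and my_app are placeholders, -O2 is only an assumed release optimization level, and -fno-lto is the GCC and Clang option that disables link-time optimization. Adapt the flags to your own build system.

```shell
# Placeholder compiler and flags; adjust to match your project.
CC="${CC:-gcc}"
# -O2: assumed release optimization level
# -g: include symbol information
# -fno-lto: disable link-time optimization
CFLAGS="-O2 -g -fno-lto"

# my_app.c is a hypothetical source file; the build only runs if it exists.
if [ -f my_app.c ]; then
    "$CC" $CFLAGS -o my_app my_app.c
fi
```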

If you are using the workflow_topdown_basic option, ensure that your application workload runs for at least 20 seconds, to give the core time to capture all of the metrics needed. This time increases linearly as you add more metrics to capture.

Install Streamline CLI Tools

  1. Download and extract the Streamline CLI tools on your Arm server:

        wget https://artifacts.tools.arm.com/arm-performance-studio/2024.2/Arm_Streamline_CLI_Tools_9.2.0_linux_arm64.tgz
        tar -xzf Arm_Streamline_CLI_Tools_9.2.0_linux_arm64.tgz
    
  2. The sl-format.py Python script requires Python 3.8 or later, and depends on several third-party modules. We recommend creating a Python virtual environment containing these modules to run the tools. For example:

        # From Bash
        python3 -m venv sl-venv
        source ./sl-venv/bin/activate

        # From inside the virtual environment
        python3 -m pip install -r ./streamline_cli_tools/bin/requirements.txt
    
    Note

    The instructions in this guide assume you have added the <install>/bin/ directory to your PATH environment variable, and that you run all Python commands from inside the virtual environment.
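
The PATH setup described in the note could look like the sketch below. It assumes that you extracted the archive in the current directory and that this created ./streamline_cli_tools, as the requirements.txt path in step 2 suggests. The variable name STREAMLINE_BIN is only illustrative. The Python version check is an optional extra and uses the standard sys.version_info tuple.

```shell
# Assumed install location: the archive was extracted in the current directory.
STREAMLINE_BIN="$PWD/streamline_cli_tools/bin"
export PATH="$STREAMLINE_BIN:$PATH"

# sl-format.py requires Python 3.8 or later; warn if the active interpreter is older.
if command -v python3 >/dev/null 2>&1; then
    python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 8) else 1)' \
        || echo "python3 is older than 3.8" >&2
fi
```

The export only affects the current shell session. Add it to your shell startup file if you want it to persist.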

Applying the kernel patch

For best results, apply the Linux kernel patch that we provide. It modifies the behavior of Linux perf to improve support for capturing function-attributed top-down metrics on Arm systems, and provides two new capabilities:

  • It allows a new thread to inherit the perf counter group configuration of its parent.
  • It decouples the perf event-based sampling window size from the overall sample rate. This allows strobed mark-space sampling patterns where the tool can capture a small window without using a high sample rate.

You can capture profiles without the patch, but not all capture options are available, and capturing top-down metrics relies on high-frequency sampling. Without the patch, you can collect the following profiles:

  • System-wide profile with top-down metrics.
  • Single-threaded application profile with top-down metrics.
  • Multi-process/thread application profile without top-down metrics.

With the patch applied, it is possible to collect the following profiles:

  • System-wide profile with top-down metrics.
  • Single-threaded application profile with top-down metrics.
  • Multi-process/thread application profile with top-down metrics.

The following instructions show you how to install the patch on Amazon Linux 2023. You might need to adapt them slightly to other Linux distributions.

Manual application to the source tree

To apply the patch to the latest 6.7 kernel, you can use git:

    git apply v6.7-combined.patch

or patch:

    patch -p 1 -i v6.7-combined.patch
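
Before modifying the source tree, you might want to test whether the patch applies cleanly. The sketch below uses the --check option of git apply, which reports problems without changing any files. The helper name check_patch is hypothetical, and the sketch assumes that you run it from the top of the kernel source tree with the patch file in the current directory.

```shell
# Hypothetical helper: returns success only if the patch would apply cleanly.
check_patch() {
    patch_file="$1"
    if [ ! -f "$patch_file" ]; then
        echo "patch file not found: $patch_file" >&2
        return 1
    fi
    # --check tests whether the patch applies; no files are modified
    git apply --check "$patch_file"
}

if check_patch v6.7-combined.patch; then
    echo "patch applies cleanly"
fi
```

If you use GNU patch instead, its --dry-run option serves the same purpose.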
        
    

Manual application to an RPM-based distribution

Follow these steps to integrate these patches into an RPM-based distribution’s kernel:

  1. Remove any existing rpmbuild directory, renaming as appropriate:

        rm -fr rpmbuild
    
  2. Fetch the kernel sources:

        yum download --source kernel
    
  3. Install the source RPM:

        rpm -i kernel-<VERSION>.src.rpm
    
  4. Enter the rpmbuild directory that is created:

        cd rpmbuild
    
  5. Copy the patch into the correct location. Replace the 9999 patch number with the next available patch number in the sequence:

        cp vX.Y-combined.patch SOURCES/9999-strobing-patch.patch
    
  6. Open the spec file in your preferred editor:

        nano SPECS/kernel.spec
    
  7. Search for the list of patches starting with Patch0001, and append the line for the new patch to the end of the list. Replace 9999 with the patch number used earlier:

        Patch9999: 9999-strobing-patch.patch
    
  8. Search for the list of patch apply steps starting with ApplyPatch, and append the line for the new patch to the end of the list. Replace 9999 with the patch number used earlier:

        ApplyPatch 9999-strobing-patch.patch
    
  9. Save the changes and exit the editor.

  10. Build the kernel and other RPMs:

        rpmbuild -ba SPECS/kernel.spec
    
  11. Install the built packages:

        sudo rpm -ivh --force RPMS/aarch64/*.rpm
    
  12. Reboot the system:

        sudo reboot
    
  13. Validate that the patch has been applied correctly:

        ls -l /sys/bus/event_source/devices/*/format/strobe_period

    This should list at least one CPU PMU device supporting the strobing features, for example:

        /sys/bus/event_source/devices/armv8_pmuv3_0/format/strobe_period
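
The validation in step 13 can also be scripted, for example to gate an automated profiling job. The sketch below wraps the same sysfs check in a function that succeeds only if at least one PMU exposes strobe_period. The helper name check_strobe_support is hypothetical and is not part of the Streamline CLI tools.

```shell
# Hypothetical helper: succeeds if any PMU device exposes the strobe_period format field.
# The optional argument overrides the sysfs directory to inspect.
check_strobe_support() {
    dev_dir="${1:-/sys/bus/event_source/devices}"
    count=0
    for f in "$dev_dir"/*/format/strobe_period; do
        # Skip the unexpanded glob pattern when nothing matches
        [ -e "$f" ] || continue
        echo "strobing supported: $f"
        count=$((count + 1))
    done
    [ "$count" -gt 0 ]
}

check_strobe_support || echo "no PMU device exposes strobe_period"
```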
    
            
        
    
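
Step 5 asks for the next available patch number. One way to find it is to scan the spec file for existing patch entries, as in the sketch below. The helper name next_patch_number is hypothetical, and the sketch assumes that every patch is declared on its own line in the form PatchNNNN: at the start of the line, as in the Patch0001 list described in step 7. Check the result against the spec file before using it.

```shell
# Hypothetical helper: prints one more than the highest PatchNNNN number in a spec file.
next_patch_number() {
    spec="${1:-SPECS/kernel.spec}"
    # Extract the numeric part of each "PatchNNNN:" line, dropping leading zeros
    # so that the shell does not interpret the value as octal.
    last=$(sed -n 's/^Patch0*\([0-9][0-9]*\):.*/\1/p' "$spec" | sort -n | tail -n 1)
    echo $(( ${last:-0} + 1 ))
}

# Run from inside the rpmbuild directory
if [ -f SPECS/kernel.spec ]; then
    next_patch_number SPECS/kernel.spec
fi
```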

You are now ready to use Streamline CLI Tools. Refer to "Profiling for Neoverse with Streamline CLI Tools" to get started.

