Example 1: Sampling of CPython calculating Googolplex

Note

All the steps in the following sections are performed on a native ARM64 Windows on Arm machine.

You will use the CPython binaries targeting ARM64 that you built from sources in debug mode in the previous step, and then complete the following:

  • Pin the python_d.exe interactive console to an arbitrary CPU core.
  • Calculate a very large integer, Googolplex, to stress the CPython application and get a simple workload.
  • Run counting and sampling to obtain some simple event information.

Pin the new CPython process to CPU core 1

Use the Windows start command to launch python_d.exe (the CPython interactive console) pinned to CPU core number 1.

    start /affinity 2 python_d.exe
        
    
Note

The start command line switch /affinity <hexaffinity> applies the specified processor affinity mask (expressed as a hexadecimal number) to the new application. In this example the mask is 2, which is 0x02 in hexadecimal or 0b0010 in binary. Mask bits are indexed from 0 (zero), so the only bit set is bit 1, which selects CPU core 1.
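The relationship between core numbers and the /affinity mask can be sketched in a few lines of Python. This is only an illustration: the helper names affinity_mask and cores_in_mask are hypothetical and are not part of Windows or WindowsPerf.

```python
def affinity_mask(cores):
    """Return the hexadecimal mask string for a list of zero-indexed cores."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core indices must be non-negative")
        mask |= 1 << core  # set the bit that corresponds to this core
    return format(mask, "X")


def cores_in_mask(hex_mask):
    """Return the zero-indexed cores selected by a hexadecimal mask string."""
    mask = int(hex_mask, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


print(affinity_mask([1]))   # mask for core 1 only
print(cores_in_mask("2"))   # cores selected by mask 2
```

For core 1 the helper returns the mask 2, which is the value passed to start /affinity above. Selecting cores 0 to 3 together would give the mask F.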

This command will bring up CPython in interactive mode:

    Python 3.12.0a6+ (heads/main:1ff81c0cb6, Mar 14 2023, 16:26:50) [MSC v.1935 64 bit (ARM64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>>

        
    

You can use the Windows Task Manager to confirm that python_d.exe is running on CPU core 1. You will use the newly created CPython interactive window to execute example workloads.

In the example below, you will calculate the very large integer 10^(10^100), known as a Googolplex.

Executing computation intensive calculations with CPython

In the CPython interactive window, type the following expression for the Googolplex number 10^(10^100) and press Enter.

    10**10**100
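As a small illustration of why this expression keeps the interpreter busy, the sketch below checks that Python's ** operator is right-associative, so 10**10**100 is evaluated as 10**(10**100), and works out how many decimal digits the result would need. The helper name decimal_digits_of_power_of_ten is hypothetical and exists only for this example.

```python
# ** is right-associative: 10**2**3 is 10**(2**3), not (10**2)**3.
small = 10**2**3
print(small == 10**(2**3))   # True
print(small == (10**2)**3)   # False


def decimal_digits_of_power_of_ten(exponent):
    """Number of decimal digits in 10**exponent, for exponent >= 0."""
    return exponent + 1


googol = 10**100  # the exponent of a Googolplex; cheap to compute
digits = decimal_digits_of_power_of_ten(googol)
print(len(str(digits)))      # length of the digit count itself
```

A Googolplex would need 10^100 + 1 decimal digits, far more than any machine can store, so the calculation is not expected to finish. It serves only as a sustained integer-multiplication workload to sample.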
        
    

Sampling the CPython application running the Googolplex calculation on CPU core 1

You can now sample the Arm PMU event ld_spec, which counts speculatively executed load operations. You can specify the PDB file name and the process image name explicitly with --pdb_file python_d.pdb and --image_name python_d.exe. In this case wperf is able to deduce the image name (which is the same as the PE file name) and the PDB file from the PE file name.

You can stop sampling by pressing Ctrl+C in the wperf console.

    wperf sample -e ld_spec:100000 --pe_file python_d.exe -c 1
        
    

Wait a few seconds for the samples to arrive from the kernel driver and then press Ctrl+C to stop sampling. You should see:

    base address of 'python_d.exe': 0x7ff6e0a41270, runtime delta: 0x7ff5a0a40000
    sampling ....e.e.e.e.e.eCtrl-C received, quit counting... done!
    ======================== sample source: ld_spec, top 50 hot functions ========================
     75.39%       579  x_mul:python312_d.dll
      6.51%        50  v_isub:python312_d.dll
      5.60%        43  _Py_atomic_load_32bit_impl:python312_d.dll
      3.12%        24  v_iadd:python312_d.dll
      2.60%        20  PyErr_CheckSignals:python312_d.dll
      2.08%        16  unknown
      1.17%         9  x_add:python312_d.dll
      0.91%         7  _Py_atomic_load_64bit_impl:python312_d.dll
      0.52%         4  _Py_ThreadCanHandleSignals:python312_d.dll
      0.52%         4  _PyMem_DebugCheckAddress:python312_d.dll
      0.26%         2  read_size_t:python312_d.dll
      0.13%         1  _Py_DECREF_SPECIALIZED:python312_d.dll
      0.13%         1  k_mul:python312_d.dll
      0.13%         1  _PyErr_CheckSignalsTstate:python312_d.dll
      0.13%         1  write_size_t:python312_d.dll
      0.13%         1  _PyObject_Malloc:python312_d.dll
      0.13%         1  pymalloc_alloc:python312_d.dll
      0.13%         1  pymalloc_free:python312_d.dll
      0.13%         1  _PyObject_Init:python312_d.dll
      0.13%         1  _PyMem_DebugRawFree:python312_d.dll
      0.13%         1  _PyLong_New:python312_d.dll

        
    
Note

You can close the command line window running python_d.exe when you have finished sampling. Sampling also ends automatically when the sampled process finishes.

In the example above, you can see that almost all of the sampled code executed by CPython’s python_d.exe executable resides inside the python312_d.dll DLL, and that x_mul alone accounts for 75.39% of the samples.
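If you want to summarize such a listing per module, a short Python sketch like the one below could do it. It assumes the text layout shown in the output above (percentage, hit count, then symbol:module, or the single word unknown); this layout is inferred from the listing and is not a documented format. The function names parse_hot_functions and hits_per_module are hypothetical.

```python
import re
from collections import Counter

# Assumed line layout: "<percent>%  <hits>  <symbol>:<module>" or "... unknown".
LINE = re.compile(r"^\s*(\d+\.\d+)%\s+(\d+)\s+(\S+)\s*$")


def parse_hot_functions(text):
    """Return (symbol, module, hits) tuples for each hot-function line."""
    rows = []
    for line in text.splitlines():
        match = LINE.match(line)
        if not match:
            continue  # skip banners, progress output, and blank lines
        symbol, _, module = match.group(3).partition(":")
        rows.append((symbol, module or "unknown", int(match.group(2))))
    return rows


def hits_per_module(rows):
    """Sum the hit counts of all symbols that belong to the same module."""
    totals = Counter()
    for _symbol, module, hits in rows:
        totals[module] += hits
    return totals


sample = """\
 75.39%       579  x_mul:python312_d.dll
  6.51%        50  v_isub:python312_d.dll
  2.08%        16  unknown
"""
rows = parse_hot_functions(sample)
totals = hits_per_module(rows)
print(totals.most_common())
```

The sample string contains only three lines copied from the listing. Feeding the full output through the same functions would give the per-module totals for the whole run.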

Note that in the sampling output, ....e.e.e.e.e. is a progress printout where:

  • the character ‘.’ represents a sample payload (of 128 samples) received from the WindowsPerf kernel driver, and
  • the character ‘e’ represents an unsuccessful attempt to fetch the whole sample payload.
Note

You can also get the output of the wperf sample command in JSON format by adding the --json command line option. Use the -v (verbose) command line option to add more information about sampling.
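If you script your sampling runs, the options mentioned in this section can be assembled programmatically. The sketch below only builds the argument list and does not run wperf. The function name build_wperf_sample_command and its parameters are hypothetical, and it uses only the command line options that appear in this section.

```python
def build_wperf_sample_command(event_spec, pe_file, core,
                               pdb_file=None, image_name=None,
                               json_output=False, verbose=False):
    """Build the argument list for a 'wperf sample' invocation."""
    cmd = ["wperf", "sample", "-e", event_spec,
           "--pe_file", pe_file, "-c", str(core)]
    if pdb_file:
        cmd += ["--pdb_file", pdb_file]      # explicit PDB file name
    if image_name:
        cmd += ["--image_name", image_name]  # explicit process image name
    if json_output:
        cmd.append("--json")                 # request JSON output
    if verbose:
        cmd.append("-v")                     # request verbose output
    return cmd


command = build_wperf_sample_command("ld_spec:100000", "python_d.exe", 1,
                                     json_output=True)
print(" ".join(command))
```

On a machine with WindowsPerf installed, a list like this could be passed to subprocess.run from the Python standard library. The structure of the JSON output is not described here, so inspect it before relying on particular fields.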
